
I have a file with records (lines) having two types of field delimiters | and ! as given below:

Name|Age|Physics|Chemistry|Maths|English|Batch!Year!AdmisnNo!Grade!Score
Student1|81|65|70|80|88|EWS!2021!1001!A!75
Student2|72|63|60|50|75|EWS!2021!1002!A!85
Student3|72|63|60|50|75|EWS!2021!1002!A!85

How to merge Batch, Year and AdmisnNo fields as given below?

Note: for brevity I have shown only a small set of fields, whereas my real files have many more such related fields. The field in which I want to remove two or three ! marks is not necessarily the last one; it can be any field (for example the 6th or 7th) out of about 49 fields in total.

Name|Age|Physics|Chemistry|Maths|English|BatchYearAdmisnNo!Grade!Score
Student1|81|65|70|80|88|EWS20211001!A!75
Student2|72|63|60|50|75|EWS20211002!A!85
Student3|72|63|60|50|75|EWS20211002!A!85

I would prefer awk, but any reasonably standard command is welcome.

Chris Davies

6 Answers

$ while IFS= read -r line; do line="${line/\!/}"; printf '%s\n' "${line/\!/}"; done < in
Name|Age|Physics|Chemistry|Maths|English|BatchYearAdmisnNo!Grade!Score
Student1|81|65|70|80|88|EWS20211001!A!75
Student2|72|63|60|50|75|EWS20211002!A!85
Student3|72|63|60|50|75|EWS20211002!A!85
JdeHaan
$ awk -F '|' 'BEGIN { OFS = FS } { sub("!", "", $NF); sub("!", "", $NF) }; 1' file
Name|Age|Physics|Chemistry|Maths|English|BatchYearAdmisnNo!Grade!Score
Student1|81|65|70|80|88|EWS20211001!A!75
Student2|72|63|60|50|75|EWS20211002!A!85
Student3|72|63|60|50|75|EWS20211002!A!85

This uses awk to delete the first two ! characters from the last |-delimited field of the input.

To act on a field other than the last one, replace NF in the awk code with that field's number (for example, $7 instead of $NF).
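As a minimal sketch of that idea, the field number and the number of ! characters to remove can be passed in as awk variables. The function name merge_subfields and its two arguments are hypothetical, chosen only for illustration; the sketch assumes | is the only top-level delimiter and reads from standard input.

```shell
# Hypothetical helper (name and arguments are not from the answer above):
# remove the first "count" "!" characters from |-delimited field "field"
# of every line read from standard input.
merge_subfields() {
    awk -F '|' -v field="$1" -v count="$2" '
        BEGIN { OFS = FS }
        {
            for (i = 1; i <= count; i++)
                if (!sub(/!/, "", $field))   # stop once the field has no "!" left
                    break
            print
        }'
}

printf '%s\n' 'Student1|81|65|70|80|88|EWS!2021!1001!A!75' | merge_subfields 7 2
# prints Student1|81|65|70|80|88|EWS20211001!A!75
```

Calling it as merge_subfields 7 3 would also remove the ! before the grade, and a different first argument targets a different field.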


Assuming only the last field contains ! characters, using sed:

$ sed -e 's/!//' -e 's///' file
Name|Age|Physics|Chemistry|Maths|English|BatchYearAdmisnNo!Grade!Score
Student1|81|65|70|80|88|EWS20211001!A!75
Student2|72|63|60|50|75|EWS20211002!A!85
Student3|72|63|60|50|75|EWS20211002!A!85

This removes the first ! on each line. It then performs exactly the same substitution a second time, removing the second ! too.


Alternatively, reverse each line, remove the 3rd ! twice in a row, and then reverse the result again. Because the ! characters are now counted from the end of the line, other |-delimited fields may also contain ! characters.

$ rev file | sed -e 's/!//3' -e 's///3' | rev
Name|Age|Physics|Chemistry|Maths|English|BatchYearAdmisnNo!Grade!Score
Student1|81|65|70|80|88|EWS20211001!A!75
Student2|72|63|60|50|75|EWS20211002!A!85
Student3|72|63|60|50|75|EWS20211002!A!85
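
To see why counting from the end helps, here is a small trace of the same pipeline on a made-up line whose first field also contains a ! character. The input line is an assumption for illustration only; it is not from the question. It also assumes the rev utility is available.

```shell
# Made-up input: same layout as the question, plus a "!" inside field 1.
line='Stu!dent1|81|65|70|80|88|EWS!2021!1001!A!75'

# Stage 1: reverse the line so that the last field comes first.
printf '%s\n' "$line" | rev
# prints 57!A!1001!1202!SWE|88|08|07|56|18|1tned!utS

# Stage 2: delete the 3rd "!" twice; the empty pattern reuses the previous regex.
printf '%s\n' "$line" | rev | sed -e 's/!//3' -e 's///3'
# prints 57!A!10011202SWE|88|08|07|56|18|1tned!utS

# Stage 3: reverse again to restore the original order.
printf '%s\n' "$line" | rev | sed -e 's/!//3' -e 's///3' | rev
# prints Stu!dent1|81|65|70|80|88|EWS20211001!A!75
```

The ! in the first field survives because, after reversing, it is the 5th ! on the line and is never the 3rd.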
Kusalananda
  • Thanks @kusalananda. Is it possible to select a field counting from the first, say the 6th field with the | delimiter? Is it possible to remove either two or three ! marks? Please help; I appreciate your time. – Rama Krishna Majety May 08 '22 at 15:37
  • @RamaKrishnaMajety You appear to be changing the question. – Kusalananda May 08 '22 at 15:37
  • I have a list of such files from various sources, and they are quite unorganised; my question is to select a | delimited field which has subfields ! and merge them. If possible, kindly help me. – Rama Krishna Majety May 08 '22 at 15:39
  • @RamaKrishnaMajety This is not mentioned in the question. – Kusalananda May 08 '22 at 15:42
  • Thanks, I am running the awk one-liner. Thanks for the support. – Rama Krishna Majety May 08 '22 at 15:53
  • @RamaKrishnaMajety regarding "my question is to select a | delimited field which has subfields ! and merge them" - that is definitely not the question you asked. You asked about "a file with records (lines) having two types of field delimiters | and !" and how to merge three fields, not a file that has 1 type of delimiter (|) but where each field could have sub-fields delimited by !s and how to merge some of those subfields within a field - that's a very different situation and you should ask a new question about THAT if that's what you really have. – Ed Morton May 08 '22 at 16:58
  • What is the difference between s/\([^!]*\)!/\1/ and a simple s/!//? – Philippos May 09 '22 at 07:44
  • @Philippos There is absolutely no difference. I probably had too many other things to think about yesterday. Thanks, I will update my code. – Kusalananda May 09 '22 at 08:16

This task suits sed well, because no field splitting is needed. To merge the 7th field with the next one, delete the 7th separator (either | or !):

sed -E 's/\||!//7' file

Running the same substitution once more merges the new 7th field (the original 7th and 8th combined) with the field after it. Altogether:

sed -E -e 's/\||!//7' -e 's/\||!//7' file

Or, more briefly (as suggested by Philippos), use an empty pattern for the second substitution, since an empty pattern reuses the most recently used regular expression:

sed -E 's/\||!//7;s///7' file

The -E option selects extended regular expressions (ERE); it is used here for portability.

Output:

Name|Age|Physics|Chemistry|Maths|English|BatchYearAdmisnNo!Grade!Score
Student1|81|65|70|80|88|EWS20211001!A!75
Student2|72|63|60|50|75|EWS20211002!A!85
Student3|72|63|60|50|75|EWS20211002!A!85

Note that after the first substitution, what was the 8th separator has become the 7th, so 7 is used again. It is the same as piping the output of the first sed command into a second, identical one.

This approach is also convenient given the two different field separators you have here, and it can be adjusted to merge almost any neighbouring fields.
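
One way to make that adjustment reusable is to build the sed script from two numbers. The sketch below is only an illustration: the function name merge_fields and its arguments are hypothetical, and it uses the bracket expression [|!] rather than the alternation shown above, so that -E is not needed.

```shell
# Hypothetical helper: delete separator number "first" (counting both | and !)
# "count" times in a row, which merges count+1 neighbouring fields.
# The ( ... ) function body runs in a subshell, so the variables stay local.
merge_fields() (
    first=$1 count=$2
    script="s/[|!]//$first"
    i=1
    while [ "$i" -lt "$count" ]; do
        script="$script;s///$first"   # empty pattern: reuse the previous regex
        i=$((i + 1))
    done
    sed "$script"
)

printf '%s\n' 'Name|Age|Physics|Chemistry|Maths|English|Batch!Year!AdmisnNo!Grade!Score' |
    merge_fields 7 2
# prints Name|Age|Physics|Chemistry|Maths|English|BatchYearAdmisnNo!Grade!Score
```

With merge_fields 7 3 the Grade subfield would be merged as well.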

thanasisp
  • (1) I don't know that syntax, it seems like you expect \| as alternation symbol like | in extended regular expressions. To stay portable, you can use sed -E 's/\||!//7'. (2) Instead of repeating a pattern, better use an empty pattern to make visible that there is nothing new: sed -E 's/\||!//7;s///7' – Philippos May 09 '22 at 07:41
  • @Philippos I like (2), the second empty expr is evaluated to true again when the previous one was true. And yes, the man page also includes -E for portability for EREs but I never include it, I'll update. Thanks. – thanasisp May 09 '22 at 08:16

Using GNU awk for the 4th arg to split():

$ awk '{n=split($0,f,/[|!]/,s); s[7]=s[8]=""; for (i=1;i<=n;i++) printf "%s%s", f[i], s[i]; print ""}' file
Name|Age|Physics|Chemistry|Maths|English|BatchYearAdmisnNo!Grade!Score
Student1|81|65|70|80|88|EWS20211001!A!75
Student2|72|63|60|50|75|EWS20211002!A!85
Student3|72|63|60|50|75|EWS20211002!A!85
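
If GNU awk is not available, a similar effect can be sketched in POSIX awk by walking the line with match() and counting separators. This is a different technique from the split() call above, not an equivalent of it; the function name drop_separators and its argument format are hypothetical.

```shell
# Hypothetical helper: delete the separators (| or !) whose positions are
# listed in $1, e.g. "7 8", from every line read from standard input.
drop_separators() {
    awk -v drop="$1" '
        BEGIN {
            n = split(drop, d, " ")
            for (i = 1; i <= n; i++)
                skip[d[i]] = 1
        }
        {
            out = ""; rest = $0; k = 0
            while (match(rest, /[|!]/)) {
                k++                                   # k-th separator on this line
                out = out substr(rest, 1, RSTART - 1) # text before the separator
                if (!(k in skip))
                    out = out substr(rest, RSTART, 1) # keep this separator
                rest = substr(rest, RSTART + 1)
            }
            print out rest
        }'
}

printf '%s\n' 'Student1|81|65|70|80|88|EWS!2021!1001!A!75' | drop_separators '7 8'
# prints Student1|81|65|70|80|88|EWS20211001!A!75
```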
Ed Morton
  • This is a good demo of split()'s ability to return the array of separators. And it scales for field numbers and separators. – thanasisp May 09 '22 at 08:42

This worked for the example above and produced the desired output:

sed -e 's/!//1' -e 's/!//1' file.txt

Output:

Student1|81|65|70|80|88|EWS20211001!A!75
Student2|72|63|60|50|75|EWS20211002!A!85
Student3|72|63|60|50|75|EWS20211002!A!85

Here's just one possible perl solution:

perl -pe '@a = split /[|!]/; $_ = join "|", @a[0..5], join("!", join("", @a[6..8]), @a[9,10]);' file

It is written as a series of joins so that, if your use case gets more complicated (say you want to drop Maths), you just leave out that index (4):

perl -pe '@a = split /[|!]/; $_ = join "|", @a[0..3,5], join("!", join("", @a[6..8]), @a[9,10]);'

It seemed neater than concatenating fields and splicing the extra fields out of the array before joining back together.

Boyd