How to print a matching pattern using sed/awk? (I was able to do this with grep)

Question

I need to print only the 11 tag, using only awk or sed together with a while loop.

Order:479959,60=20130624-09:45:02.046|35=D|11=884|38=723|21=1|1=30532|10=085|59=0|114=Y|56=MBT|40=1|43=Y|100=MBTX|55=/GCQ3|49=11342|54=1|8=FIX.4.4|34=388|553=2453|9=205|52=20130624-09:45:02.046|
Order:24780,100=MBTX|43=Y|40=1|34=388|553=2453|52=2013062409:45:02.046|9=205|49=11342|54=1|8=FIX.4.4|55=/GCQ3|11=405|35=D|60=20130624-09:45:02.046|56=MBT|59=0|114=Y|10=085|21=1|38=470|1=30532|
Order:799794,55=/GCQ3|49=11342|54=1|8=FIX.4.4|34=388|553=2453|9=205|52=2013062409:45:02.046|40=1|43=Y|100=MBTX|38=350|21=1|1=30532|10=085|59=0|114=Y|56=MBT|60=20130624-09:45:02.046|35=D|11=216|
Order:72896,11=735|35=D|60=2013062409:45:02.046|56=MBT|59=0|114=Y|10=085|1=30532|38=17|21=1|100=MBTX|43=Y|40=1|553=2453|9=205|52=20130624-09:45:02.046|34=388|8=FIX.4.4|54=1|49=11342|55=/GCQ3|

The output should look like this:

Orderid-479959 38= 723 Clientid=884
Orderid-24780 38= 470 Clientid=405
Orderid-799794 38= 350 Clientid=216

@SonalAsija That grep pattern would also match 111=. Better use [|,]11=[^|]+ if you're doing it that way. — Kusalananda, Jan 03 '17 at 16:22
Your requirement, "print only the 11 tag" doesn't match the desired output. Please [edit] your question to make sure you clearly state what is actually required. — Chris Davies, Jan 05 '17 at 10:22

Kusalananda · Answer 1 · 2017-01-03T22:02:35.793

No need for a loop:

$ sed 's/^.*[,|]11=\([^|]*\).*$/client id = \1/' data.in
client id = 884
client id = 405
client id = 216
client id = 735

The editing script will look for the 11 tag (11= preceded by either | or ,), and replace the whole line with the text client id = followed by the number after the 11= (actually anything following the 11= up to a | or end of line).
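
To see why the [,|] in front of 11= matters, here is a small sketch run against a made-up line (not taken from the question's data) that contains a 111 tag after the 11 tag. The sample line and the variable names loose and strict are assumptions for illustration only.

```shell
# Hypothetical sample line: a 111 tag appears after the real 11 tag.
line='Order:1,11=884|35=D|111=999|38=723|'

# Without the [,|] guard, the greedy ^.* runs to the last "11=",
# which sits inside "111=", so the wrong value is captured.
loose=$(printf '%s\n' "$line" | sed 's/^.*11=\([^|]*\).*$/\1/')

# With the guard, "11=" must directly follow a "," or "|".
strict=$(printf '%s\n' "$line" | sed 's/^.*[,|]11=\([^|]*\).*$/\1/')

printf 'loose:  %s\nstrict: %s\n' "$loose" "$strict"
# loose:  999
# strict: 884
```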

UPDATE (after new problem spec.):

This is uglyscript.sh (requires GNU sed and GNU awk):

#!/bin/sh
tr ',|' '\n' |
awk -vRS="\n\n" '{ print | "sort -r"; close("sort -r") }' |
tr '\n' '|' |
sed 's/|Order/\nOrder/g' |
sed 's/^Order:\([^|]*\).*|\(38=[^|]*\).*|11=\([^|]*\).*$/Orderid-\1 \2 Clientid=\3/'
echo

1. The first tr turns all rows in the input data into one column. The original lines are separated by a blank line (two newlines) in its output.
2. The awk sorts each set of lines separately in reverse lexicographical order (so that "Order" comes first).
3. The second tr, together with the following sed, puts the lines back together again, but now the columns are in sorted order. The tr just replaces all newlines with a | character, while the sed breaks the lines wherever the string |Order is found.
4. The last sed is similar to my original solution, but just captures a few more things from the lines.
5. The echo at the end just ensures that there is a newline at the end of the output.

Steps 1 to 3 above are necessary since the columns are not sorted. The column containing 11= can come anywhere on the line, for example, which makes just running it through a single sed script very difficult.

The data, after step 3, looks like this:

Order:479959|9=205|8=FIX.4.4|60=20130624-09:45:02.046|59=0|56=MBT|55=/GCQ3|553=2453|54=1|52=20130624-09:45:02.046|49=11342|43=Y|40=1|38=723|35=D|34=388|21=1|1=30532|11=884|114=Y|10=085|100=MBTX
Order:24780|9=205|8=FIX.4.4|60=20130624-09:45:02.046|59=0|56=MBT|55=/GCQ3|553=2453|54=1|52=2013062409:45:02.046|49=11342|43=Y|40=1|38=470|35=D|34=388|21=1|1=30532|11=405|114=Y|10=085|100=MBTX
Order:799794|9=205|8=FIX.4.4|60=20130624-09:45:02.046|59=0|56=MBT|55=/GCQ3|553=2453|54=1|52=2013062409:45:02.046|49=11342|43=Y|40=1|38=350|35=D|34=388|21=1|1=30532|11=216|114=Y|10=085|100=MBTX
Order:72896|9=205|8=FIX.4.4|60=2013062409:45:02.046|59=0|56=MBT|55=/GCQ3|553=2453|54=1|52=20130624-09:45:02.046|49=11342|43=Y|40=1|38=17|35=D|34=388|21=1|1=30532|11=735|114=Y|10=085|100=MBTX|

Running it:

$ ./uglyscript.sh <data.in
Orderid-479959 38=723 Clientid=884
Orderid-24780 38=470 Clientid=405
Orderid-799794 38=350 Clientid=216
Orderid-72896 38=17 Clientid=735

@SonalAsija Not really: somecommand | sed '(the sed script)' >outputfile — Kusalananda, Jan 03 '17 at 16:23
@SonalAsija /client id = \1/ is the replacement in s/pattern/replacement/. It means that the pattern should be replaced by the text client id = followed by whatever is captured by the first group. The first (and only) group is the \( ... \) in the pattern. It captures the actual client ID, i.e. everything after 11= up to the following | or newline. — Kusalananda, Jan 03 '17 at 17:17
my professor told me to use only While loop and the output should be:- Orderid-479959 38= 723 Clientid=884, Orderid-24780 38= 470 Clientid=405,Orderid-799794 38= 350 Clientid=216 ... any help will be appreciated — Sonal, Jan 03 '17 at 20:27
@SonalAsija Imposing a ridiculous restriction like that only makes the problem more difficult. The while loop is implicit in the sed command (it is looping over the lines). I will modify my answer, but I will not add a while loop when none is needed. — Kusalananda, Jan 03 '17 at 20:48
Why? Awk was part of the original problem specification. Why do you want to make it more complicated? If your professor is giving you an assignment with restrictions like that, then I would ask him if he's ever seen a Unix system with only sed installed on it. — Kusalananda, Jan 03 '17 at 22:54
@SonalAsija I'll add that the reason I'm refusing to use a while loop is that when you start getting gigabyte datasets, looping over the rows with a shell while loop is going to be rather slow, to say the least. It's bad enough to start a sort process for each line. Just add a few thousand lines of data and you'll see the script grind slowly through it. If this was a professional problem, I'd get you to submit a bug report to the person who wrote the code that created the data. That code is broken. — Kusalananda, Jan 03 '17 at 23:16

score 0 · Accepted Answer · edited Apr 13 '17 at 12:36

A "clean" awk solution

If you're interested, here is a one-shot awk command with formatted output (although this looks like a job well suited for sed):

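awk -F'[|,]' '{
    # Fields are separated by | or , and may appear in any order.
    # The patterns are anchored so that e.g. 111= is not mistaken for 11=.
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^11=/)    split($i, a, "=")   # client ID
        if ($i ~ /^Order:/) split($i, b, ":")   # order ID
        if ($i ~ /^38=/)    split($i, c, "=")   # tag 38
    }
    printf "Orderid-%-10s 38= %-8s Clientid=%s\n", b[2], c[2], a[2]
}' < infile.txt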
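
A variation on the same idea, as a sketch only: instead of testing for specific tags, load every tag=value pair of a line into an awk array keyed by tag and look up the ones needed. The two sample lines are shortened, made-up versions of the question's data, and the names tag, eq and out are assumptions for illustration.

```shell
out=$(
  printf '%s\n' 'Order:72896,11=735|35=D|38=17|' \
                'Order:24780,100=MBTX|11=405|38=470|' |
  awk -F'[|,]' '{
      split("", tag)                       # clear the map for each line
      for (i = 2; i <= NF; i++) {
          eq = index($i, "=")              # position of the first "="
          if (eq) tag[substr($i, 1, eq - 1)] = substr($i, eq + 1)
      }
      sub(/^Order:/, "", $1)               # keep only the order number
      print "Orderid-" $1, "38=" tag["38"], "Clientid=" tag["11"]
  }'
)
printf '%s\n' "$out"
# Orderid-72896 38=17 Clientid=735
# Orderid-24780 38=470 Clientid=405
```

Because the lookup is by exact key, the order of the columns on the line does not matter and a tag such as 111 cannot be confused with 11.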

If you're set on not using awk, sed or tr, and absolutely want a shell while loop, please be advised, as already said in the comments, that this is very bad practice. There is an extensive explanation of why it is so bad here.

The "don't do it" solution

Now that we have made this little disclaimer, here is a way to achieve your output using only shell string manipulation within a while loop (in script form; the parameter expansions used are standard POSIX, so it works in bash and in any other POSIX shell):

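while IFS= read -r line
do
  # Client ID: strip up to "11=" (preceded by , or |), then from the next | onwards
  x=${line#*[|,]11=}
  x=${x%%|*}
  # Order ID: the text between the first ":" and the first ","
  y=${line#*:}
  y=${y%%,*}
  # Tag 38, extracted the same way as the client ID
  z=${line#*[|,]38=}
  z=${z%%|*}
  echo "Orderid-$y 38= $z Clientid=$x"
done < infile.txt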

It works for your particular example, but please don't do this in a "real life" situation. The basic idea in any shell is: "the fewer calls to external tools, the better". So ideally, if you can do the job in one call, as in my awk example, do it. Awk is loaded once and then the whole job is done in C, which is lightning fast compared to the shell.

How string manipulation in bash works in my answer

${string#pattern}: starts from the left-hand side of the string and deletes the shortest match for pattern. So if you use a pattern like *a, for example, everything up to and including the first "a" character will be removed from your string. If you use the same syntax but with two "#", the match becomes as greedy as possible and removes everything up to the last "a" character in your string. Example:
```
$ test="alakazam"; echo ${test#*a}; echo ${test##*a};
lakazam
m
```

${string%pattern}: works the same way, but from the right-hand side. To illustrate with the previous example:

$ test="alakazam"; echo ${test%a*}; echo ${test%%a*};
alakaz
             #no output here: the whole string is matched by pattern
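
Applied to one of the order lines, the two expansions chain together as sketched below. The line is a shortened copy of the first sample line from the question; the intermediate values are shown as comments. The same reasoning applies if the pattern is tightened to *[|,]11= so that it cannot match inside a tag such as 111=.

```shell
line='Order:479959,60=20130624-09:45:02.046|35=D|11=884|38=723|21=1|'

x=${line#*11=}   # remove shortest prefix ending in "11="   -> 884|38=723|21=1|
x=${x%%|*}       # remove longest suffix starting with "|"  -> 884

y=${line#*:}     # remove shortest prefix ending in ":"     -> 479959,60=2013...
y=${y%%,*}       # remove longest suffix starting with ","  -> 479959

echo "Orderid-$y Clientid=$x"
# Orderid-479959 Clientid=884
```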

Thanks... could you plz explain me x=${line##11=} x=${x%%|} how its working? — Sonal, Jan 04 '17 at 14:44
