Let's say I have a text file like:

R1 12 324 3453 36 457 4 7 8
R2 34 2342 2525 25 25 26 26 2 2
R3 23 2342 32 52 54 543 643 63
R4 25 234 2342 4 234242

I want to use awk to process these lines differently, like

awk '/R1/ { print "=>" $0} /R2/ { print "*" $0} '

and I also want to print all the remaining lines as they are, without duplicating the lines I've already processed. Basically, I need an /ELSE/ { print $0} at the end of my awk line.

Is there such a thing?

Ali

3 Answers

Simplified Approach with awk

awk '/R1/ {print "=>" $0;next} /R2/{print "*" $0;next} 1' text.file

[jaypal:~/Temp] cat text.file 
R1 12 324 3453 36 457 4 7 8
R2 34 2342 2525 25 25 26 26 2 2
R3 23 2342 32 52 54 543 643 63
R4 25 234 2342 4 234242

[jaypal:~/Temp] awk '/R1/ { print "=>" $0;next} /R2/{print "*" $0;next}1' text.file
=>R1 12 324 3453 36 457 4 7 8
*R2 34 2342 2525 25 25 26 26 2 2
R3 23 2342 32 52 54 543 643 63
R4 25 234 2342 4 234242
[jaypal:~/Temp] 

Breakdown of the Pattern {Action} Statements:

  • /R1/ { print "=>" $0;next} : For lines matching /R1/, awk prints the line prefixed with =>. next then tells awk to skip the remaining statements and move on to the next input line.

  • /R2/{print "*" $0;next} : For lines matching /R2/, awk prints the line prefixed with *. For such a line, the first pattern {action} statement is skipped because /R1/ does not match, so the second statement runs. next again stops further processing of the line, and awk moves on to the next one.

  • 1 prints all remaining lines. When a condition is supplied with no {action}, awk defaults to {print}. Here the condition is 1, which is always true. A line only reaches this point if neither of the first two statements matched it (that is, it contains neither R1 nor R2), so the default print action applies to exactly the remaining lines.
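
To see why next matters, here is a small sketch that runs the same rules with and without it. The input is fed inline with printf instead of text.file, and the variable names input, with_next and without_next are only for illustration.

```shell
# Sample input: one line per rule, plus one that matches neither.
input='R1 12 324
R2 34 2342
R3 23 2342'

# With next: a matched line is printed once and skips the remaining rules.
with_next=$(printf '%s\n' "$input" |
  awk '/R1/ {print "=>" $0; next} /R2/ {print "*" $0; next} 1')

# Without next: matched lines fall through to the final 1 and print twice.
without_next=$(printf '%s\n' "$input" |
  awk '/R1/ {print "=>" $0} /R2/ {print "*" $0} 1')

printf '%s\n---\n%s\n' "$with_next" "$without_next"
```

The first result has three lines; the second has five, because the R1 and R2 lines appear once with their prefix and once unchanged.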

jw013
jaypal singh

awk implements the usual conditionals (if, else if, else). For the formatting you want to do on a match, printf is a good fit instead of print, since it spells out the output format explicitly.

awk '{ if (/^R1/) { printf("=> %s\n", $0) } else if (/^R2/) { printf("* %s\n", $0) } else { print $0 } }'
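
For readability, the same chain can be laid out over several lines. This is only a sketch: the inline sample input and the variable name out are assumptions for illustration. Note that these printf formats put a space after the marker, unlike the => and * prefixes in the question.

```shell
# Feed three sample lines to awk and capture the result.
out=$(printf '%s\n' 'R1 12 324' 'R2 34 2342' 'R3 23 2342' |
  awk '{
    if (/^R1/)      { printf("=> %s\n", $0) }   # line starts with R1
    else if (/^R2/) { printf("* %s\n", $0) }    # line starts with R2
    else            { print $0 }                # everything else, unchanged
  }')

printf '%s\n' "$out"
```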
Chris Down

Chris Down already showed how you can get an else for regexps by using an explicit 'if' statement in a block. You can also get the same effect in other ways, although his solution is probably better.

One is to write a third regex that matches only text not matched by the others. In your case, it would look something like this:

awk '/^R1/ { print "=>" $0}
     /^R2/ { print "*" $0}
     /^[^R]/ || /^R[^12]/ { print $0 } '

Note that this uses anchored regexps: the ^ at the beginning of a regexp matches only at the beginning of a line. Your original patterns were not anchored, so they match R1 or R2 anywhere on a line, and matching is slightly slower because awk may have to scan the whole line instead of checking only its start. The third ("else") case matches a line that begins with a character that is not 'R' ([^R]), or that begins with an 'R' followed by a character that is not a '1' or '2' (R[^12]). The two different meanings of ^ (an anchor outside brackets, negation inside them) are somewhat confusing, but that choice was made a long time ago and won't be changed any time soon.
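
One gap in the complementary patterns is worth checking for your own data: both [^R] and R[^12] need at least one character to match, so an empty line (or a line consisting only of R) matches none of the three rules and is silently dropped. A quick sketch, where the inline input, the else: marker and the variable name out are assumptions for illustration:

```shell
# Four input lines: R1, R2, R3, and an empty line.
out=$(printf '%s\n' 'R1 12' 'R2 34' 'R3 23' '' |
  awk '/^R1/ { print "=>" $0 }
       /^R2/ { print "*" $0 }
       /^[^R]/ || /^R[^12]/ { print "else:" $0 }')

# Only three lines come back; the empty line matched no rule.
printf '%s\n' "$out"
```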

Complementary regexps really need to be anchored; otherwise the [^R] would match, e.g., the 1 following the R. For very simple regexps like yours this approach can be useful, but it becomes unmanageable as the regexps get more complex. Instead, you can use a state variable that is reset for each line, like this:

awk '{ handled = 0 }
     /^R1/ { print "=>" $0; handled = 1}
     /^R2/ { print "*" $0; handled = 1}
     { if (!handled) print $0 } '

This sets handled to zero for each new line, then to 1 if it matches either of the two regexps, and finally, if it is still zero, executes the print $0.
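
One practical difference from an approach that stops at the first match: with a flag, a line that matches several rules runs every matching action and still skips the default. A sketch using unanchored patterns and a made-up line that contains both R1 and R2 (the input and the variable name out are assumptions for illustration):

```shell
# The first line matches both rules; the second matches neither.
out=$(printf '%s\n' 'R1 and R2' 'R3 23' |
  awk '{ handled = 0 }                        # reset the flag for every line
       /R1/ { print "=>" $0; handled = 1 }
       /R2/ { print "*" $0; handled = 1 }
       { if (!handled) print $0 }')           # default only if nothing matched

printf '%s\n' "$out"
```

The first input line is printed twice, once per matching rule, but not a third time unchanged.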