When using awk /pattern/ { print "text"} /pattern/ {print ""} is there an ELSE pattern?

Question

Let's say I have text file like:

R1 12 324 3453 36 457 4 7 8
R2 34 2342 2525 25 25 26 26 2 2
R3 23 2342 32 52 54 543 643 63
R4 25 234 2342 4 234242

I want to use awk to process these lines differently, like

awk '/R1/ { print "=>" $0} /R2/ { print "*" $0} '

I also want to print all the remaining lines unchanged, without duplicating the lines I've already processed. Basically, I need an /ELSE/ { print $0} at the end of my awk program.

Is there such a thing?

score 27 · Accepted Answer · edited Aug 29 '12 at 21:56

Simplified Approach with awk

awk '/R1/ {print "=>" $0;next} /R2/{print "*" $0;next} 1' text.file

[jaypal:~/Temp] cat text.file 
R1 12 324 3453 36 457 4 7 8
R2 34 2342 2525 25 25 26 26 2 2
R3 23 2342 32 52 54 543 643 63
R4 25 234 2342 4 234242

[jaypal:~/Temp] awk '/R1/ { print "=>" $0;next} /R2/{print "*" $0;next}1' text.file
=>R1 12 324 3453 36 457 4 7 8
*R2 34 2342 2525 25 25 26 26 2 2
R3 23 2342 32 52 54 543 643 63
R4 25 234 2342 4 234242
[jaypal:~/Temp]

Breakdown of the pattern {action} statements:

/R1/ { print "=>" $0;next} : For lines matching /R1/, awk prints the line prefixed with =>. next then tells awk to skip the remaining statements and move on to the next input line.
/R2/{print "*" $0;next} : For lines matching /R2/, awk prints the line prefixed with *. For these lines the first pattern {action} statement is skipped, because /R1/ does not match them (assuming the line does not also contain R1), so the second statement runs. Again, next stops any further processing of the line and awk moves on to the next one.
1 : Prints all remaining lines. When a condition is supplied with no {action}, awk defaults to {print}. Here the condition is 1, which is always true. A line only reaches this point if the first and second pattern {action} statements did not match it (it contains neither R1 nor R2), so the default print action handles exactly the remaining lines.
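
To see what next buys you, here is a small sketch that runs the same program with and without it. The shortened sample lines and the variable names are only for illustration.

```shell
# Three shortened sample lines (illustrative data, not the full file).
input=$(printf '%s\n' 'R1 12 324' 'R2 34 2342' 'R3 23 2342')

# Without next: matched lines fall through to the trailing 1 and are
# printed a second time, unmodified.
without_next=$(printf '%s\n' "$input" |
    awk '/R1/ {print "=>" $0} /R2/ {print "*" $0} 1')

# With next: each matched line is printed once, and only unmatched
# lines reach the trailing 1.
with_next=$(printf '%s\n' "$input" |
    awk '/R1/ {print "=>" $0; next} /R2/ {print "*" $0; next} 1')

printf 'without next:\n%s\n\nwith next:\n%s\n' "$without_next" "$with_next"
```

The first run prints five lines (R1 and R2 each appear twice), the second prints three.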

Seems to marginally run the fastest out of all the solutions posted. — Chris Down, Nov 28 '11 at 21:50
I'm not sure syntactic sugar is the right term here... It's just syntax. — Daniel Hershcovich, Nov 30 '11 at 11:40

Chris Down · Answer 2 · 2011-11-28T19:57:59.510

7

awk implements the usual conditionals (if, else if, else). I'd also suggest using printf instead of print for the formatted output you want to produce on a match.

awk '{ if (/^R1/) { printf("=> %s\n", $0) } else if (/^R2/) { printf("* %s\n", $0) } else { print $0 } }'
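
For simple prefixing like this, print with string concatenation should give the same output as printf, since print appends the output record separator (a newline by default). A minimal sketch comparing the two on shortened sample lines, with illustrative variable names:

```shell
# Shortened sample lines (illustrative data).
input=$(printf '%s\n' 'R1 12 324' 'R2 34 2342' 'R3 23 2342')

# if / else if / else chain using printf with a format string.
with_printf=$(printf '%s\n' "$input" | awk '{
    if (/^R1/)      { printf("=> %s\n", $0) }
    else if (/^R2/) { printf("* %s\n", $0) }
    else            { print $0 }
}')

# The same chain using print and concatenation.
with_print=$(printf '%s\n' "$input" | awk '{
    if (/^R1/)      { print "=> " $0 }
    else if (/^R2/) { print "* " $0 }
    else            { print }
}')

# Report whether the two variants agree.
if [ "$with_printf" = "$with_print" ]; then
    echo "identical output"
else
    echo "outputs differ"
fi
```

printf becomes the better choice once you need field widths, padding, or output without a trailing newline.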

edited Nov 28 '11 at 19:57

answered Nov 28 '11 at 19:35


You don't really need if-then-else for this. – jaypal singh Nov 28 '11 at 21:44

While this works perfectly well, it is not idiomatic. The judicious use of next is an important tool in awk programming. – dmckee --- ex-moderator kitten Nov 28 '11 at 23:09

I don't understand the point of using printf here. Its only advantage (unless you're doing fancier formatting than concatenation) is that it doesn't add a newline, which is not relevant here. – Gilles 'SO- stop being evil' Nov 28 '11 at 23:43

That's a counterintuitive and surprising result. Unadorned print only has to output $0 whereas printf has to parse a format string. – jw013 Aug 29 '12 at 21:58

Alex Dupuy · Answer 3 · 2011-11-28T20:06:46.450

Chris Down already showed how you can get an else for regexps by using an explicit 'if' statement in a block. You can also get the same effect in some other ways, although his solution is probably better.

One is to write a third regex that only matches text not matched by the others. In your case, this would look something like this:

awk '/^R1/ { print "=>" $0}
     /^R2/ { print "*" $0}
     /^[^R]/ || /^R[^12]/ { print $0 } '

Note that this uses anchored regexps: the ^ at the start of a regexp matches only at the beginning of a line. Your original patterns were not anchored, which slows matching slightly, since awk has to scan the whole line for a match instead of giving up after the first characters. The third ("else") case matches a line that begins with a character other than 'R' ([^R]), or that begins with 'R' followed by a character other than '1' or '2' (R[^12]). The two meanings of ^ (anchor outside brackets, negation inside them) are somewhat confusing, but that mistake was made a long time ago and won't be changed any time soon.
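
As an aside that is not part of the original answer, the "else" case can also be written by negating the other two patterns with ! and &&, which avoids building the complement by hand. A sketch under that assumption, using shortened sample lines plus one empty line to show where the two versions differ:

```shell
# Shortened sample lines with an empty line in the middle (illustrative data).
input=$(printf '%s\n' 'R1 12 324' 'R2 34 2342' '' 'R3 23 2342')

# Hand-written complement: an empty line matches neither /^[^R]/ nor
# /^R[^12]/, so it is not printed.
complement=$(printf '%s\n' "$input" | awk '
    /^R1/ { print "=>" $0 }
    /^R2/ { print "*" $0 }
    /^[^R]/ || /^R[^12]/ { print $0 }')

# Negated patterns: anything that is neither an R1 nor an R2 line is
# printed, including the empty line.
negated=$(printf '%s\n' "$input" | awk '
    /^R1/ { print "=>" $0 }
    /^R2/ { print "*" $0 }
    !/^R1/ && !/^R2/ { print $0 }')

printf 'complement:\n%s\n\nnegated:\n%s\n' "$complement" "$negated"
```

The negated form still repeats every pattern, so it shares the maintenance cost of the complement approach when there are many rules.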

Complementary regexps really need to be anchored; otherwise [^R] would match any non-R character anywhere in the line, e.g. the 1 following the R. Note also that the third pattern above does not match an empty line or a line consisting of just R, so such lines would be silently dropped. For very simple regexps like yours, this approach can be useful, but as the regexps get more complex it becomes unmanageable. Instead, you can use a state variable that is reset for each line, like this:

awk '{ handled = 0 }
     /^R1/ { print "=>" $0; handled = 1}
     /^R2/ { print "*" $0; handled = 1}
     { if (!handled) print $0 } '

This sets handled to zero for each new line, then to 1 if it matches either of the two regexps, and finally, if it is still zero, executes the print $0.
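
One practical difference from next is that later rules still run for every line. The sketch below adds a hypothetical extra step (summing the field count of each line) to illustrate this; the fields variable and the shortened sample lines are assumptions for the example.

```shell
# Shortened sample lines (illustrative data), three fields each.
input=$(printf '%s\n' 'R1 12 324' 'R2 34 2342' 'R3 23 2342')

# handled flag: the counting rule runs for all three lines.
with_flag=$(printf '%s\n' "$input" | awk '
    { handled = 0 }
    /^R1/ { print "=>" $0; handled = 1 }
    /^R2/ { print "*" $0; handled = 1 }
    !handled { print $0 }
    { fields += NF }                      # hypothetical extra processing
    END { print "total fields: " fields }')

# next: matched lines never reach the counting rule.
with_next=$(printf '%s\n' "$input" | awk '
    /^R1/ { print "=>" $0; next }
    /^R2/ { print "*" $0; next }
    { print $0 }
    { fields += NF }                      # only sees unmatched lines
    END { print "total fields: " fields }')

printf 'handled flag:\n%s\n\nnext:\n%s\n' "$with_flag" "$with_next"
```

With the flag, the total covers all nine fields. With next, only the three fields of the R3 line are counted, so any shared processing would have to be placed before the matching rules.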

It should be noted that on large files both are less efficient than using conditionals (as shown here). rfile is just 10000 lines of the questioner's dataset repeated. — Chris Down, Nov 28 '11 at 19:51
if (!handled) Yuck! Use next to stop considering other actions. — dmckee --- ex-moderator kitten, Nov 28 '11 at 23:05
+1 for if (!handled). General, flexible, reusable solutions are good. What if the next person who has this question wants to do more processing after the printing? The answers with next don’t support that. — Scott - Слава Україні, Jul 01 '14 at 19:45
