awk doesn't match the line feed \n in gsub().
I want to change only the 'fruit' entry that follows 'strawberry' below.
A sed solution would be great too.

It ignores \n, as you can see in this script:

cat << 'EOF' > ~/src
apple
fruit
strawberry
fruit
orange
fruit
blackberry
EOF
cat << 'EOF' > ~/scp.sh
#!/bin/bash
cat ~/src | awk '{ gsub("strawberry\nfruit", "strawberry\nfruitIsRed"); print }' > ~/trg
EOF
sh ~/scp.sh
diff ~/src ~/trg
# no output: the files are identical
αғsнιη
logan46
  • cat ~/src | awk -v RS= '{ gsub("strawberry\nfruit", "strawberry\nfruitIsRed"); print }' > ~/trg – logan46 Mar 17 '21 at 18:02

1 Answer


Because awk's default RS (record separator) is a newline, awk reads the input one line at a time and strips the newline from each record, so no record ever contains a \n for gsub() to match. To match a newline you need to set RS to something else. One way is to set RS to the null string, which means records are separated by empty lines instead of by single newlines:

awk -v RS= '{ "do stuffs" }'

So this is not specific to gsub(); it depends on RS.
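
As a minimal sketch of the above applied to the sample data from the question: the scratch directory from mktemp and the file names under it are assumptions for illustration (the question uses ~/src and ~/trg), and the input contains no blank lines, so the whole file becomes a single record.

```shell
# Scratch directory is hypothetical; the question writes to ~/src and ~/trg.
dir=$(mktemp -d)
printf '%s\n' apple fruit strawberry fruit orange fruit blackberry > "$dir/src"

# RS= (empty) puts awk in paragraph mode: records are separated by blank
# lines, so this file is one record and "\n" is visible to gsub().
awk -v RS= '{ gsub(/strawberry\nfruit/, "strawberry\nfruitIsRed"); print }' "$dir/src" > "$dir/trg"

# diff exits non-zero when the files differ, hence the "|| true".
diff "$dir/src" "$dir/trg" || true
# 4c4
# < fruit
# ---
# > fruitIsRed
```

Only the "fruit" line directly after "strawberry" changes; the other "fruit" lines are left alone.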

αғsнιη
  • What does 'RS dependent' mean? Is it related to sed too? – logan46 Mar 17 '21 at 18:04
  • It means that whatever value RS takes, you cannot match that character (or regex) in your awk code, because it is stripped from each record; in gawk, RT gives you access to the text that matched RS. As for sed, some implementations don't recognize \n on the LHS (matching side) and some do; some also let you enter an actual newline. For more about newlines in sed, see https://unix.stackexchange.com/q/114943/72456 – αғsнιη Mar 17 '21 at 18:06
  • It is worth mentioning that RS= has side effects - I'd point to this answer for an example/brief explanation. Here, it is relevant to note that printf '%s\n%s\n' a b | awk -v RS= '/a\nb/' is fine while printf '%s\n\n%s\n' a b | awk -v RS= '/a\n\nb/' won't give the likely expected result. – fra-san Mar 17 '21 at 21:29
  • @fra-san that's what I mentioned here in advance https://unix.stackexchange.com/questions/639770/awk-dont-match-line-feed-n-on-gsub/639771#comment1199127_639771, RS= == RS='\n\n' – αғsнιη Mar 18 '21 at 00:36
  • @αғsнιη Oh, sorry, I didn't read carefully enough. But I also wanted to underline that RS= is "special" in that it separates records on sequences of 2 or more newlines (and also removes empty lines from the output), which may be surprising to the reader. Not really a relevant point, though. Never mind. – fra-san Mar 18 '21 at 10:30
  • RS= isn't quite == RS='\n\n'. It's close to RS='\n\n+' but not equivalent to that either (try printf '\nfoo\n\nbar\n' | awk -v RS= '{print NR, "<" $0 ">"}' vs printf '\nfoo\n\nbar\n' | awk -v RS='\n\n+' '{print NR, "<" $0 ">"}') as RS='' has rules about ignoring leading/trailing newlines, just like the default FS ignores leading/trailing blanks. Also RS="" will work in all awks while RS='\n\n+' will only work in an awk that accepts a multi-char regexp as the RS (e.g. gawk). If you're using gawk, you can alternatively consider using RS='^$' to read the whole input at once. – Ed Morton Mar 18 '21 at 21:05
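
The side effects of RS= discussed in these comments can be seen in a short sketch, followed by two line-by-line alternatives that leave RS alone and so avoid them. This is only one possible approach, not something proposed in the thread: the scratch directory, the file names, and the prev variable are hypothetical, and the sed command is an assumption about what a sed answer to the question might look like.

```shell
# Paragraph mode (RS=) ignores leading newlines and treats any run of blank
# lines as one separator, so "\n\n" can never appear inside a record.
printf '\nfoo\n\n\nbar\n' | awk -v RS= '{ print NR ": <" $0 ">" }'
# 1: <foo>
# 2: <bar>

# Sample data from the question, written to a hypothetical scratch directory.
dir=$(mktemp -d)
printf '%s\n' apple fruit strawberry fruit orange fruit blackberry > "$dir/src"

# awk with the default RS: remember the previous line (prev is a made-up
# variable name) and rewrite "fruit" only when it follows "strawberry".
awk 'prev == "strawberry" && $0 == "fruit" { $0 = "fruitIsRed" }
     { print; prev = $0 }' "$dir/src" > "$dir/trg_awk"

# sed: on a "strawberry" line, print it, load the next line (n), edit that.
sed '/^strawberry$/{n;s/^fruit$/fruitIsRed/;}' "$dir/src" > "$dir/trg_sed"

cmp "$dir/trg_awk" "$dir/trg_sed" && echo "awk and sed agree"
```

Because neither alternative changes the record separator, blank lines in the input pass through unchanged.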