2

I'm using awk '{ gsub(/BAR|WIBBLE/, "FOO"); print }' to replace text in data like:

SOMETHING [BAR, WIBBLE]
SOMETHING [BAR]

This gives the desired result of:

SOMETHING [FOO, FOO]
SOMETHING [FOO]

But now I've had to extend the list of words to replace, to something like:

awk '{ gsub(/BAR|WIBBLE|ME/, "FOO"); print }'

Which turns text like:

SOMETHING [ME, WIBBLE]

into:

SOFOOTHING [FOO, FOO]

How can I limit my replacement to just the text between the brackets (i.e. leave the SOMETHING alone)?

EDIT

I also need robustness across whatever text SOMETHING might be (e.g. SHE GAVE ME THAT shouldn't have ME replaced).

Chris

4 Answers

5

Must it be awk? This is much easier in languages where the replacement part of a substitution can be a function call. For example, in Perl:

perl -pe 'sub c{$s=shift;$s=~s/BAR|WIBBLE|ME/FOO/g;$s}s/\[.*?\]/c$&/ge' 
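
For comparison, the same idea can be sketched in plain awk by walking each line with match(), which sets RSTART and RLENGTH to the next bracketed span. This is a minimal sketch, not part of the original answer: the function name replace_in_brackets and the variable names are made up for illustration, and it assumes brackets are not nested.

```shell
# Hypothetical helper: replace BAR, WIBBLE or ME with FOO, but only inside [...].
replace_in_brackets() {
  awk '{
    out = ""     # processed part of the line
    rest = $0    # part still to be scanned
    # match() sets RSTART/RLENGTH to the next [...] span in rest
    while (match(rest, /\[[^]]*\]/)) {
      inside = substr(rest, RSTART, RLENGTH)
      gsub(/BAR|WIBBLE|ME/, "FOO", inside)   # substitute only within the span
      out = out substr(rest, 1, RSTART - 1) inside
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }' "$@"
}

printf '%s\n' 'SHE GAVE ME THAT' 'SOMETHING [ME, WIBBLE]' | replace_in_brackets
# prints:
# SHE GAVE ME THAT
# SOMETHING [FOO, FOO]
```

Like the Perl one-liner, this does not check word boundaries, so a word such as SOMMER inside the brackets would also be changed.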
manatwork
  • Thanks! That did it. @choroba and @Guru word boundary would have been great too, but I want to be robust in what text the SOMETHING should be. I'll update the question to make that clearer. – Chris Oct 24 '12 at 16:30
3

With GNU awk you can set RS to a regex that matches a bracketed group and then do the replacement on RT (the text that matched the record separator):

awk -v RS='\\[[^]]*\\]' '{ gsub(/\<(BAR|WIBBLE|ME)\>/, "FOO", RT); printf "%s%s", $0, RT }' infile

infile:

cat << EOF > infile
SHE GAVE ME THAT
SOMETHING [ME, WIBBLE, SOMMER]
EOF

output:

SHE GAVE ME THAT
SOMETHING [FOO, FOO, SOMMER]
Thor
2

Standard awk lacks backreferences in regular expression replacements (GNU awk's gensub is an exception), so it can't easily do replacements in context. Sed can do it:

sed -e 's/\(\[[^]]*\)BAR/\1FOO/' -e 's/\(\[[^]]*\)ME/\1FOO/'

If your sed supports alternations in regexps:

sed -e 's/\(\[[^]]*\)\(BAR\|ME\)/\1FOO/'

This handles only a single replacement inside each bracket pair, even with the g suffix: [^]]* matches the longest close-bracket-free sequence, and the match consumes the opening bracket, so no further match can start inside the same pair. To replace all of them, use an explicit loop; note that the loop terminates only if FOO does not itself contain BAR or ME.

sed -e ': a' -e 's/\(\[[^]]*\)BAR/\1FOO/' -e 't a' \
             -e 's/\(\[[^]]*\)ME/\1FOO/' -e 't a'
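
If your sed accepts -E for extended regexps (GNU and BSD sed both do), the loop can probably be collapsed into a single substitution with an alternation. A minimal sketch under that assumption; the function name replace_in_brackets_sed is made up for illustration:

```shell
# Hypothetical helper: loop one substitution until no listed word is left inside [...].
replace_in_brackets_sed() {
  # :a  defines a label; "ta" jumps back to it whenever the s command replaced something.
  # Each pass rewrites the last remaining match after an opening bracket.
  sed -E -e ':a' -e 's/(\[[^]]*)(BAR|WIBBLE|ME)/\1FOO/' -e 'ta' "$@"
}

printf '%s\n' 'SHE GAVE ME THAT' 'SOMETHING [ME, WIBBLE]' | replace_in_brackets_sed
# prints:
# SHE GAVE ME THAT
# SOMETHING [FOO, FOO]
```

The same caveat applies: the replacement must not contain any of the search words, or the loop never ends. There is also no word-boundary check, so SOMMER inside brackets would become SOMFOOR.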

If you need something more complicated, use perl.

-3
With GNU awk, the word-boundary operator is \y (in awk, \b is a backspace):

awk '{ gsub(/\y(BAR|WIBBLE|ME)\y/, "FOO"); print }'
Guru