I need to do something like that using sed?
qq ab xyz ab qq aa ab
Becomes:
qq ab xyz +ab+ +qq+ aa +ab+
If your input doesn't contain <, > or + characters, you could do:
sed '
s/[[:alnum:]]\{1,\}/<&>/g;:1
s/\(<\([^>]*\)>.*\)<\2>/\1+\2+/;t1
s/[<>]//g'
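As a quick check, here is a minimal sketch that runs the script above on the sample line from the question. The function name mark_repeats is only a wrapper made up for this illustration, and the input is assumed to contain no <, > or + characters.

```shell
# mark_repeats: hypothetical wrapper name, not part of the original answer.
# 1. wrap every alphanumeric word in <...>
# 2. loop: turn a <word> that has an identical <word> before it into +word+
# 3. strip the remaining < and > markers
mark_repeats() {
  sed '
s/[[:alnum:]]\{1,\}/<&>/g;:1
s/\(<\([^>]*\)>.*\)<\2>/\1+\2+/;t1
s/[<>]//g'
}

printf '%s\n' 'qq ab xyz ab qq aa ab' | mark_repeats
# prints: qq ab xyz +ab+ +qq+ aa +ab+
```

The first occurrence of each word keeps its <...> wrapper until the end, which is what lets the back-reference \2 find later copies.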
If it may, you can always escape them:
sed '
s/:/::/g;s/</:{/g;s/>/:}/g
s/[[:alnum:]]\{1,\}/<&>/g;:1
s/\(<\([^>]*\)>.*\)<\2>/\1+\2+/;t1
s/[<>]//g
s/:}/>/g;s/:{/</g;s/::/:/g'
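To see the escaping at work, here is a small sketch on a made-up input line that already contains <, > and : characters. The wrapper name mark_repeats_escaped and the sample line are assumptions chosen for this illustration only.

```shell
# mark_repeats_escaped: hypothetical wrapper name for the escaped variant.
# ":" becomes "::", "<" becomes ":{" and ">" becomes ":}" before the words
# are wrapped, and the last line of the script reverses that mapping.
mark_repeats_escaped() {
  sed '
s/:/::/g;s/</:{/g;s/>/:}/g
s/[[:alnum:]]\{1,\}/<&>/g;:1
s/\(<\([^>]*\)>.*\)<\2>/\1+\2+/;t1
s/[<>]//g
s/:}/>/g;s/:{/</g;s/::/:/g'
}

printf '%s\n' 'a <b> a : b' | mark_repeats_escaped
# prints: a <b> +a+ : +b+
```

The literal < and > around the first b survive because they are hidden as :{ and :} while the script uses < and > as its own markers.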
Those assume you want to do that independently on each line. If you want to do it on the whole file, you'd need to load the whole file in memory first (note that some sed implementations have size limitations there):
sed '
:2
$!{N;b2
}
s/:/::/g;s/</:{/g;s/>/:}/g
s/[[:alnum:]]\{1,\}/<&>/g;:1
s/\(<\([^>]*\)>.*\)<\2>/\1+\2+/;t1
s/[<>]//g
s/:}/>/g;s/:{/</g;s/::/:/g'
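A short sketch of the whole-file behaviour, using a made-up two-line input: words first seen on line 1 are marked when they come back on line 2. The wrapper name mark_repeats_file is hypothetical, and the example was traced with GNU sed semantics in mind (where . also matches the newlines held in the pattern space).

```shell
# mark_repeats_file: hypothetical wrapper name for the whole-file variant.
# The ":2 ... b2" loop appends every input line to the pattern space
# before any substitution runs.
mark_repeats_file() {
  sed '
:2
$!{N;b2
}
s/:/::/g;s/</:{/g;s/>/:}/g
s/[[:alnum:]]\{1,\}/<&>/g;:1
s/\(<\([^>]*\)>.*\)<\2>/\1+\2+/;t1
s/[<>]//g
s/:}/>/g;s/:{/</g;s/::/:/g'
}

printf '%s\n' 'qq ab' 'ab qq' | mark_repeats_file
# prints:
# qq ab
# +ab+ +qq+
```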
That's going to be pretty inefficient though, and would be a lot easier with perl:
perl -pe 's/\w+/$seen{$&}++ ? "+$&+" : $&/ge'
Line-based:
perl -pe 'my %seen;s/\w+/$seen{$&}++ ? "+$&+" : $&/ge'
Here is another approach: this uses a few seds:
an='[:alnum:]' esc=$(printf '\033\[')
sed "/[${an}]/!d;=;a\ }
s/.*/ & /;s/[^${an}]\{1,\}/ /g
s| \([${an}"']\{1,\}\) | \
s/\\([^+'"${an}"']\\)\\(\1\\)\\([^+'"${an}"']\\)/\\1+\\2+\\3/2|g
' <text |
sed '/^ /!N;s/\n */{/' |
sed -e 's/.*/ & /;s/+/ & /g' \
-f - \
-e "s/ //;s/ $//
s/+[^+ ]\{1,\}+/${esc}38;5;35m&${esc}0m/g
s/ + /+/g" text
Basically, the first two seds team up to write a script for the third. The first sed clears anything but alphanumeric characters from every line, and entirely skips any line without one. For each of the groups of characters that remain, it writes a substitution statement that the third will eventually (almost instantaneously) read and interpret as its script.
The second sed is necessary because the first writes its script by line, and each line can have several s/// statements. The first prints the line number for each line which contains an alphanumeric, but that number needs to be joined to the opening { of a command block for the third sed, and so the second does this.
Here is a sample of what the script looks like:
...
43{
s/\([^+[:alnum:]]\)\(n\)\([^+[:alnum:]]\)/\1+\2+\3/2
s/\([^+[:alnum:]]\)\(N\)\([^+[:alnum:]]\)/\1+\2+\3/2
s/\([^+[:alnum:]]\)\(G\)\([^+[:alnum:]]\)/\1+\2+\3/2
}
44{
s/\([^+[:alnum:]]\)\(b\)\([^+[:alnum:]]\)/\1+\2+\3/2
s/\([^+[:alnum:]]\)\(block\)\([^+[:alnum:]]\)/\1+\2+\3/2
}
45{
s/\([^+[:alnum:]]\)\(END\)\([^+[:alnum:]]\)/\1+\2+\3/2
s/\([^+[:alnum:]]\)\(SEDSCRIPT\)\([^+[:alnum:]]\)/\1+\2+\3/2
}
There is a trailing 2 for each s/// - this is because each substitution is directed at the second occurrence of each pattern; if a second occurrence does not exist, nothing is substituted. The above is the result of running it on another of my sed scripts - it seems pretty immune to special characters or similar.
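The effect of a numeric flag on s/// can be seen in isolation with a simplified pattern. This is only a sketch on made-up input, and the function name second_only is hypothetical; the generated script uses longer patterns but the same flag.

```shell
# second_only: replaces only the 2nd match of "ab" on each line,
# and leaves the line untouched when there is no 2nd match.
second_only() {
  sed 's/ab/+ab+/2'
}

printf '%s\n' 'ab cd ab' | second_only
# prints: ab cd +ab+

printf '%s\n' 'ab cd' | second_only
# prints: ab cd
```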
As I was writing I found it was easier to tell what was going on if I colored its selection - which is what...
s/+[^+ ]\{1,\}+/${esc}38;5;35m&${esc}0m/g
... that line does. You can comment it out or remove it if you ever use this and don't want or need it.
Here's the script it writes for your example data:
1{
s/\([^+[:alnum:]]\)\(qq\)\([^+[:alnum:]]\)/\1+\2+\3/2
s/\([^+[:alnum:]]\)\(ab\)\([^+[:alnum:]]\)/\1+\2+\3/2
s/\([^+[:alnum:]]\)\(xyz\)\([^+[:alnum:]]\)/\1+\2+\3/2
s/\([^+[:alnum:]]\)\(ab\)\([^+[:alnum:]]\)/\1+\2+\3/2
s/\([^+[:alnum:]]\)\(qq\)\([^+[:alnum:]]\)/\1+\2+\3/2
s/\([^+[:alnum:]]\)\(aa\)\([^+[:alnum:]]\)/\1+\2+\3/2
s/\([^+[:alnum:]]\)\(ab\)\([^+[:alnum:]]\)/\1+\2+\3/2
}
Here's what it prints:
qq ab xyz +ab+ +qq+ aa +ab+
And here is what it prints for part of my sed script from before:
s/\(\(.\)${bs}\2\)\{1,\}/${esc}38;5;35m&${+esc+}0m/g
s/\(_${bs}[^_]\)\{1,\}/${esc}38;5;75m&${+esc+}0m/g
s/.${bs}//g
s/\(\(${esc}\)0m\2[^m]*+m+[_ ]\{,+2+\}\)\{+2+\}/_/g
n; /./!N;G
qq is found on one line, should all the other qqs be replaced on the same line only or in the whole file?)? – Stéphane Chazelas Dec 03 '14 at 15:17