Using sed, I want to add a line check after the nth occurrence of a pattern
Input:
DCR
DCR
DCR
Output:
DCR
DCR
check
DCR
Is it possible using sed?
With GNU sed, you can replace the nth occurrence of a pattern within a line:
$ echo "foofoofoofoo" | sed 's/foo/&\nbar/2'
foofoo
barfoofoo
But for the nth line that contains the pattern, awk is easier:
awk -v n=2 -v patt=foo '{print} $0 ~ patt && ++count == n {print "bar"}' <<END
foo1
foo2
foo3
foo4
END
foo1
foo2
bar
foo3
foo4
With GNU sed:
sed -z 's/DCR/&\ncheck/2' <input >output
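A quick way to try the -z one-liner on the question's input. This assumes GNU sed, because -z is a GNU extension, and the function name insert_check_gnu is made up for illustration. With -z the whole file is read as a single record, so the 2 flag counts occurrences of DCR across lines, and check goes in right after the second DCR (if that DCR sits in the middle of a line, the line gets split there).

```shell
# Hypothetical helper: put a "check" line right after the 2nd DCR.
# Assumes GNU sed: -z makes NUL the record separator, so a text file
# (which has no NUL bytes) is read as one record and the 2 flag counts
# DCR occurrences across the whole file, not per line.
insert_check_gnu() {
    sed -z 's/DCR/&\ncheck/2'
}

printf 'DCR\nDCR\nDCR\n' | insert_check_gnu
# Expected output:
# DCR
# DCR
# check
# DCR
```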
For other versions of sed (without -z):
sed '/DCR/{p;s/.*/1/;H;g;/^\(\n1\)\{2\}$/s//check/p;d}' <input >output
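The one-liner above, spread out one command per line and commented. The function name is only for illustration. Putting each command on its own line (so no ; is needed before the closing }) is the layout older seds tend to accept, which may help with the "command garbled" error mentioned in the comments. Note that it counts matching lines, not occurrences of DCR.

```shell
# Hypothetical helper: the hold-space tally, one sed command per line.
#   p                         print the DCR line itself
#   s/.*/1/                   reduce it to a single "1"
#   H                         append a newline plus "1" to the tally in hold space
#   g                         copy the tally back into pattern space
#   /^\(\n1\)\{2\}$/s//check/p  exactly two entries: print "check"
#                             (the empty regex reuses the address regex)
#   d                         drop the pattern space (the DCR line was already printed)
insert_check_portable() {
    sed '/DCR/{
p
s/.*/1/
H
g
/^\(\n1\)\{2\}$/s//check/p
d
}'
}

printf 'DCR\nDCR\nDCR\n' | insert_check_portable
# Expected output:
# DCR
# DCR
# check
# DCR
```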
If there can be more than one occurrence of DCR in a line:
sed '
/DCR/{p
x # test whether the pattern has already been seen
/^\(\n\a\)\{2\}/!{ #+the appropriate number of times and, if so, skip
x #+the rest of the commands
s/DCR/\a/g # replace each DCR with the \a (BEL) character
s/^[^\a]*\|[^\a]*$//g # delete everything before the first \a and after the last
s/[^\a]\+/\n/g # replace each run of other characters in between with \n
H
g
/^\(\n\a\)\{2\}/s/.*/check/p} # print check once DCR has been seen twice
d}' <input >output
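For comparison, the same occurrence-counting behavior (check goes after the line on which the running count of DCR first reaches n) can be sketched in awk. The function name is hypothetical, and it assumes a POSIX awk such as gawk, mawk or nawk for -v and gsub(). gsub() returns the number of replacements it made, so replacing DCR with itself is a cheap way to count occurrences per line.

```shell
# Hypothetical helper: add "check" after the line where the running count
# of DCR occurrences (not matching lines) first reaches n ($1).
insert_check_occurrences() {
    awk -v n="$1" '
        { print }
        !done {
            # gsub() returns how many matches it replaced; replacing each
            # match with "&" (the match itself) leaves the line unchanged.
            count += gsub(/DCR/, "&")
            if (count >= n) { print "check"; done = 1 }
        }
    '
}

printf 'DCR DCR\nDCR\n' | insert_check_occurrences 2
# Expected output:
# DCR DCR
# check
# DCR
```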
…(-z); the second one works OK only if there's one pattern per line (try it with a file where the first two occurrences of the pattern are on the same line and the third is on another line). It is unclear, though, whether the OP wants to count lines matching the pattern or just occurrences of the pattern...
– don_crissti Apr 21 '15 at 17:46
… '/DCR/{p;s/DCR/\a/g;H;g;s/\n\?[^\n\a]*/\n/g;/^\(\n\a\)\{2\}\n\?$/s/.*/check/p;d}'
where \a is the \x07 (BEL) character (it can be any character that is sure not to occur in the text).
– Costas Apr 21 '15 at 18:39
sed:command garbled: /DCR/{p;s/.*/1/;H;g;/^\(\n1\)\{2\}$/s//check/p;d}
– Menon Apr 22 '15 at 11:26
…add ; after the last d. What version of sed do you use? (sed --version)
– Costas Apr 22 '15 at 11:36
…newlines (\n) instead of ;. Another solution is to use a script file, if your version supports the -f option. In any case, try man sed.
– Costas Apr 22 '15 at 11:53
sed command garbled is at least a Solaris error message. GNU sed doesn't write any errors like that.
– mikeserv Jun 29 '15 at 07:09
You can do this with sed on a stack...
sed '/match$/N
s/\n/&INSERT&/3;t
$n;N;P;D'
That would insert INSERT following every 3rd non-sequential occurrence of match in input. It is the most efficient way I know to do it with sed because it does not attempt to store all lines that occur between different matches, nor does it necessitate buffer swaps or back-ref comparisons; instead it simply increments sed's only means of counting at all: its line number, via its line cycle.
There is some added overhead, of course (with each match, pattern space gets a little bigger), but it is still the same stream, and there is no back-tracking. It's just first-in, first-out, which, I think, is a method very well suited to sed. In fact, rather than going back to check for a match, sed can advance further ahead for each match. I'm a little proud of it, and don't know why I never thought of it before.
The version above, though, would squeeze repeats to some extent because it only works one line behind input. The solution to that is to advance still further, which requires only a little additional complexity in the form of a branch (b) to a :label, a short-circuit loop inside the N;P;D loop to keep it current.
It works like this:
seq 100000| sed -ne':n
s/\n/&\tCheck&\t/5p;t
N;/0000$/bn' -eD
...which, for me, prints...
49995
49996
49997
49998
49999
Check
50000
99995
99996
99997
99998
99999
Check
100000
You see, in order to maintain the count, it increments its line buffer for each occurrence of match and tacks another line onto its sliding window on pattern space. That way, all that is needed to verify that the nth match has been found is to attempt to substitute away the nth newline character in pattern space with s///. If that can be done, we've encountered n matches so far, and t (test) can branch us out of the current iteration and clear the increment entirely.
In the example above, the buffer is incremented once for every pattern space that ends with the string 0000. When 5 of those are found, sed prints the current pattern space (and its whole buffer) and clears the counter.
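A stripped-down sketch of that test, not mikeserv's actual script (the function name nth_newline_demo and the fallback text are made up for illustration): $!N is run twice to pull up to three lines into pattern space, and s/\n/&X&/2 succeeds only when a second newline exists. The replacement reuses the matched newline through &, so no \n escape is needed in it, and a bare t skips the fallback substitution whenever the 2nd newline was found.

```shell
# Hypothetical demo of "count with newlines, test with t".
#   $!N, $!N        pull up to three lines into pattern space
#   s/\n/&X&/2      succeeds only if there is a 2nd newline; & is that newline
#   t               on success, jump to the end of the script (autoprint)
#   s/$/ .../       otherwise, tag the pattern space instead
nth_newline_demo() {
    sed '
$!N
$!N
s/\n/&X&/2
t
s/$/ (fewer than two newlines)/
'
}

printf 'a\nb\nc\n' | nth_newline_demo
# Expected output:
# a
# b
# X
# c

printf 'a\nb\n' | nth_newline_demo
# Expected output:
# a
# b (fewer than two newlines)
```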
For your case:
printf DCR\\n| tee - - - - - |
sed -e:n -e's/\n/&\tCheck&\t/2;t
$n; N;/DCR$/bn' -eP\;D
DCR
DCR
Check
DCR
DCR
DCR
Check
DCR
Now, if you wanted to mark only the nth occurrence, it's also easy:
printf DCR\\n |
tee - - - - - - - - -|
sed -e:n -e's/\n/&\tCheck&\t/3;te
$n; N;/DCR$/bn' -e'P;D;:e
n;be'
...if you really look at it, it might occur to you that we only barely scratched the surface here...
DCR
DCR
DCR
Check
DCR
DCR
DCR
DCR
DCR
DCR
DCR
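The scripts in this answer build on sed's N;P;D sliding window. Stripped of the counting, the window on its own just passes input through unchanged, which is why lines can be accumulated and edited inside it without losing any. A minimal sketch, with a hypothetical function name:

```shell
# Hypothetical helper: the bare N;P;D sliding window.
#   $!N  append the next line, except on the last line (many seds quit
#        without printing when N runs out of input)
#   P    print up to the first newline
#   D    delete up to the first newline and restart the cycle on the rest
#        without reading new input (with no newline it acts like d)
sliding_window() {
    sed '$!N;P;D'
}

printf '1\n2\n3\n' | sliding_window
# Expected output:
# 1
# 2
# 3
```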
…t with no label will branch to the end of the script. I tried it with three seds from the Heirloom Toolchest, and they worked.
– cuonglm Jun 29 '15 at 13:34
…branching to the end of the script. That is what it does, of course. The end of the script is not the end of the file: for each line cycle, sed reads its script all the way through (well, normally). It can be done otherwise, like I did in that !! answer where the script is read from start to finish in tandem with the infile. Anyway, when you branch to the end of the script, you branch out to the next line cycle to try the script again.
– mikeserv Jun 29 '15 at 14:45
…-e part associated with the t command. My bad!
– cuonglm Jun 29 '15 at 15:50
sed concatenates all of its scripts into a single one before it ever gets started, so by the time it starts executing there is only ever the one.
– mikeserv Jun 29 '15 at 16:13
…sed way is cool and sometimes hard for me. I always learn something new from your sed answers.
– cuonglm Jun 29 '15 at 18:08
I don't have a direct answer in sed. In awk, on the other hand, it is easy:
printf 'DCR\nDCR\nDCR\n' |\
awk 'BEGIN {t=0}; { print }; /DCR/ { t++; if ( t==2) { print "check" } }'
awk 'BEGIN {t=0}; { print }; /DCR/ { t++; if ( t==2) { print "check" } }' file > newfile
But I am getting this error:
awk: syntax error near line 1
awk: bailing out near line 1
– Menon Apr 22 '15 at 11:31
sed is not well suited for this task, but of course you can still do it. Here is one way (using GNU sed's T command) that saves a string of n characters in the hold space and uses it to count the number of DCR occurrences:
n=2
((yes | head -n$n | tr -d \\n; echo); cat infile) |
sed '
1 {h;d} # save counting string
/DCR/ { #
x; s/.//; x # n--
T chk # if n=0 goto "chk"
}
P;D
:chk # insert check
i\check
:a; N; ba # print rest of file
'
As noted by glenn, awk is much cleaner. Here is a golfed version with similar countdown logic; it prints check right after the nth DCR line, and only once:
<infile awk '{ print } /DCR/ && --n == 0 { print "check" }' n=2
sed '2 a\
check
' file
This appends a line containing check after line 2 and prints the whole file to standard output.
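The 2 here is a line number, not a count of DCR matches, so it only gives the desired output when the first two lines are the DCR lines. A quick demonstration (the function name is hypothetical) with input whose first line is not DCR:

```shell
# Hypothetical demo: "2 a\" appends its text after input line 2,
# whatever that line contains.
append_after_line2() {
    sed '2 a\
check
'
}

printf 'x\nDCR\nDCR\nDCR\n' | append_after_line2
# Expected output:
# x
# DCR
# check
# DCR
# DCR
```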
An awk solution is a lot easier to read for this kind of task. Here is just a correction to steviethecat's solution (the ; separators don't work with some older awks, such as the one that gave the error above, so replace them with newlines):
printf 'DCR\nDCR\nDCR\n' | awk 'BEGIN {t=0}
{ print }
/DCR/ { t++; if ( t==2) { print "check" } }'
sed is Turing complete, so it is possible. But something else, like awk or perl, might be better suited to this task. – muru Apr 21 '15 at 15:05