4

Using sed, I want to add a line check after the nth occurrence of a pattern.

Input:

DCR
DCR
DCR

Output:

DCR
DCR
check
DCR

Is it possible using sed?

Menon

7 Answers

5

With GNU sed, you can replace the nth occurrence of a pattern within a line:

$ echo "foofoofoofoo" | sed 's/foo/&\nbar/2'
foofoo
barfoofoo

But for the nth line that contains the pattern, awk is easier:

awk -v n=2 -v patt=foo '{print} $0 ~ patt && ++count == n {print "bar"}' <<END
foo1
foo2
foo3
foo4
END
foo1
foo2
bar
foo3
foo4
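The same awk one-liner can be wrapped in a shell function so that the count, the pattern and the inserted text are all arguments. The name insert_after_nth and its third TEXT argument are hypothetical; the awk logic is the one-liner above.

```shell
# insert_after_nth N REGEX TEXT  (hypothetical helper name)
# Copies stdin to stdout and prints TEXT right after the Nth line matching REGEX.
insert_after_nth() {
  awk -v n="$1" -v patt="$2" -v text="$3" '
    { print }                                 # always echo the current line
    $0 ~ patt && ++count == n { print text }  # after the Nth matching line, emit TEXT
  '
}

printf 'DCR\nDCR\nDCR\n' | insert_after_nth 2 DCR check
```

Lines that do not match REGEX pass through unchanged and are not counted.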
glenn jackman
4

With GNU sed:

sed -z 's/DCR/&\ncheck/2' <input >output
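To see why -z matters: without it, each input line is a separate record, so the 2 flag of s/// can only count occurrences within a single line. A quick comparison, assuming GNU sed for both -z and \n in the replacement:

```shell
# One DCR per line: without -z there is never a 2nd occurrence in a record, so nothing changes.
printf 'DCR\nDCR\nDCR\n' | sed 's/DCR/&\ncheck/2'

# With -z the whole input (which contains no NUL bytes) is one record, so /2 counts across lines.
printf 'DCR\nDCR\nDCR\n' | sed -z 's/DCR/&\ncheck/2'
```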

For older versions without -z:

sed '/DCR/{p;s/.*/1/;H;g;/^\(\n1\)\{2\}$/s//check/p;d}' <input >output
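Following the suggestion later in the comments to separate commands with newlines instead of ;, here is the same one-liner laid out one command per line, which avoids needing a ; before }. It is wrapped in a shell function whose name, check_after_second_dcr, is hypothetical. This is a sketch and has not been verified on Solaris sed.

```shell
# check_after_second_dcr (hypothetical name): print "check" after the 2nd DCR line.
# Each DCR line appends "\n1" to the hold space; when the copy in pattern space is
# exactly two such entries, the empty regex // reuses that test and s//check/p prints "check".
check_after_second_dcr() {
  sed '/DCR/{
p
s/.*/1/
H
g
/^\(\n1\)\{2\}$/s//check/p
d
}'
}

printf 'DCR\nDCR\nDCR\n' | check_after_second_dcr
```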

If DCR can occur more than once on a line:

sed '
/DCR/{p
      x                               # test whether the pattern has already been met
      /^\(\n\a\)\{2\}/!{              #+the required number of times and, if so, skip
        x                             #+the rest of the commands
        s/DCR/\a/g                    # replace each DCR with the \a (BEL) character
        s/^[^\a]*\|[^\a]*$//g         # delete everything before the first and after the last \a
        s/[^\a]\+/\n/g                # replace each run of other text between them with \n
        H
        g
        /^\(\n\a\)\{2\}/s/.*/check/p} # print check once the pattern has been met twice
      d}' <input >output
Costas
  • The first one is OK (+1 for using -z); the second one works only if there's one pattern per line (try it with a file where the first two occurrences of the pattern are on the same line and the third is on another line). It is unclear, though, whether the OP wants to count lines matching the pattern or just occurrences of the pattern... – don_crissti Apr 21 '15 at 17:46
  • @don_crissti That is rather different from what the OP asked, but if you want it: '/DCR/{p;s/DCR/\a/g;H;g;s/\n\?[^\n\a]*/\n/g;/^\(\n\a\)\{2\}\n\?$/s/.*/check/p;d}' where \a is the \x07 character (it can be any character that is sure not to occur in the text). – Costas Apr 21 '15 at 18:39
  • @Costas I am getting the following error for the second command: sed:command garbled: /DCR/{p;s/.*/1/;H;g;/^\(\n1\)\{2\}$/s//check/p;d} – Menon Apr 22 '15 at 11:26
  • @Menon Try adding ; after the last d. What version of sed do you use? (sed --version) – Costas Apr 22 '15 at 11:36
  • @Costas Now there is no error, but there is no change in the output. The --version option does not work in the Unix environment I use, but it is a pretty old version. – Menon Apr 22 '15 at 11:48
  • @Menon Try separating the commands in the script with newlines instead of ;. Another solution is to use a script file, if your version supports the -f option. In any case, check man sed. – Costas Apr 22 '15 at 11:53
  • sed: command garbled is a Solaris error message, at least. GNU sed doesn't write any errors like that. – mikeserv Jun 29 '15 at 07:09
3

You can do this with sed on a stack...

sed '/match$/N
     s/\n/&INSERT&/3;t
     $n;N;P;D'

That would insert INSERT following every 3rd non-sequential occurrence of match in the input. It is the most efficient way I know to do this with sed: it does not try to store all of the lines that occur between matches, and it needs neither buffer swaps nor back-reference comparisons. Instead, it simply increments sed's only real means of counting at all, its line number, advanced via its line cycle.
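A small run of that script on made-up sample input (the lines x, y, z and w are just filler) shows INSERT landing right after the third line that ends in match:

```shell
# Every line ending in "match" pulls one more line into pattern space, so the
# 3rd newline only exists once three matches have been seen.
printf 'x\nmatch\ny\nmatch\nz\nmatch\nw\n' |
sed '/match$/N
     s/\n/&INSERT&/3;t
     $n;N;P;D'
```

This should print x, match, y, match, z, match, INSERT and w, one per line. The replacement uses & (the matched newline) on both sides of INSERT, so no GNU-only \n escape is needed in it.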

There is some added overhead, of course: with each match, the pattern space gets a little bigger. But it is still the same stream, and there is no backtracking. It's just first in, first out, which I think is a method very well suited to sed. In fact, rather than going back to check for a match, sed can advance further ahead for each match. I'm a little proud of it, and don't know why I never thought of it before.

The version above, though, squeezes repeats to some extent, because it only works one line behind the input. The solution to that is to advance still further, which requires only a little additional complexity: a :label branch that short-circuits a loop inside the N;P;D loop to keep it current.

It works like this:

seq 100000| sed -ne':n
            s/\n/&\tCheck&\t/5p;t
            N;/0000$/bn'  -eD

...which, for me, prints...

49995
49996
49997
49998
49999
    Check
    50000
99995
99996
99997
99998
99999
    Check
    100000

To maintain the count, it grows its line buffer for each occurrence of match, tacking another line onto its sliding window in pattern space. That way, all that is needed to verify that the nth match has been found is to attempt to substitute the nth \newline character in pattern space with s///n. If that succeeds, we've encountered n matches so far, and t can branch us out of the current cycle, which clears the accumulated window entirely.

In the example above, the buffer grows by one line for every pattern space that ends with the string 0000. When 5 of those have been found, sed prints the current pattern space (its whole buffer) and clears the counter.
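The same loop at a smaller scale may make the counting easier to follow. Here the window grows by one line for every line ending in 0, so the third such line (30) is the first time a 3rd newline is present when the substitution is tried. GNU sed is assumed, for \t in the replacement.

```shell
# -n: only the p flag on the successful substitution prints anything.
seq 30 | sed -ne ':n
            s/\n/&\tCheck&\t/3p;t
            N;/0$/bn' -eD
```

This should print 27, 28 and 29, followed by Check and 30, each of the last two indented by a tab.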

For your case:

printf DCR\\n| tee - - - - - |
sed -e:n -e's/\n/&\tCheck&\t/2;t
     $n;    N;/DCR$/bn' -eP\;D

DCR
DCR
    Check
    DCR
DCR
DCR
    Check
    DCR

Now, if you wanted to mark only the nth occurrence, it's also easy:

printf DCR\\n        |
tee - - - - - - - - -|
sed -e:n -e's/\n/&\tCheck&\t/3;te
     $n;  N;/DCR$/bn' -e'P;D;:e
     n;be'

...if you really look at it, it might occur to you that we only barely scratched the surface here...

DCR
DCR
DCR
    Check
    DCR
DCR
DCR
DCR
DCR
DCR
DCR
mikeserv
  • Does this work with POSIX sed? The spec says t with no label branches to the end of the script. I tried it with three seds from the Heirloom Toolchest, and they worked. – cuonglm Jun 29 '15 at 13:34
  • @cuonglm - yes, it works. I tested it with those as well. But maybe you've misunderstood the statement about branching to the end of the script. That is what it does, of course, but the end of the script is not the end of the file: for each line cycle, sed reads its script all the way through, normally at least. It can be done otherwise, like I did in that !! answer, where the script is read from start to finish in tandem with the infile. Anyway, when you branch to the end of the script, you branch out to the next line cycle to try the script again. – mikeserv Jun 29 '15 at 14:45
  • Well, I really did misunderstand the spec, but not the way you thought. I thought end of script meant the end of the -e part containing the t command. My bad! – cuonglm Jun 29 '15 at 15:50
  • @cuonglm - oh yeah, that makes sense. But sed concatenates all of its scripts into a single one before it ever gets started, so by the time it starts executing there only ever is the one. – mikeserv Jun 29 '15 at 16:13
  • @cuonglm - you didn't like this answer or something? I only found out I could do this yesterday. This is a cool answer. – mikeserv Jun 29 '15 at 18:04
  • No, I really like this answer; I'm just trying to figure out every part of it myself. Thinking the sed way is cool, and sometimes it's hard for me. I always learn something new from your sed answers. – cuonglm Jun 29 '15 at 18:08
  • @cuonglm - well, I always do too. That's why I do them. It's why I do any of it. – mikeserv Jun 29 '15 at 18:37
2

I don't have a direct answer in sed. In awk, on the other hand, it is easy:

echo -e "DCR\nDCR\nDCR" |\
awk 'BEGIN {t=0}; { print }; /DCR/ { t++; if ( t==2) { print "check" } }'
  • I used this command in a bash script: awk 'BEGIN {t=0}; { print }; /DCR/ { t++; if ( t==2) { print "check" } }' file > newfile But I am getting an error: awk: syntax error near line 1 awk: bailing out near line 1 – Menon Apr 22 '15 at 11:31
2

GNU sed

sed is not well suited for this task, but of course you can still do it. Here is one way that saves a string of length n in the hold space and uses it to count the DCR occurrences:

n=2

( (yes | head -n"$n" | tr -d '\n'; echo); cat infile ) |
sed '
  1 {h;d}            # save counting string
  /DCR/ {            #
    x; s/.//; x      # n--
    T chk            # nothing left to remove (n DCRs seen): goto chk
  }
  P;D 
  :chk               # insert check
  i\check
  :a; N; ba          # print rest of file
'
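Here is the same counting idea wrapped in a shell function, with n passed as an argument. The function name insert_check is hypothetical, and the counter line is built with a plain while loop instead of yes | head (a substitution of technique, not the answer's own). GNU sed is still required for T and for N printing the pattern space at end of input. As in the script above, check is inserted just before the (n+1)th DCR, so nothing is added if the input contains only n of them.

```shell
# insert_check N (hypothetical name): read stdin, insert "check" before the (N+1)th DCR line.
insert_check() {
  (
    # Counter line: N "y" characters, followed by the real input.
    i=0
    while [ "$i" -lt "$1" ]; do printf y; i=$((i + 1)); done
    echo
    cat
  ) |
  # Line 1 (the counter) goes to hold space; each DCR removes one character.
  # T jumps to chk once nothing is left to remove; after check is printed,
  # :a/N/ba slurps the rest, and GNU N prints it when input runs out.
  sed '
    1{h;d}
    /DCR/{
      x;s/.//;x
      T chk
    }
    P;D
    :chk
    i\
check
    :a
    N
    ba
  '
}

printf 'DCR\nDCR\nDCR\n' | insert_check 2
```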

awk

As noted by glenn, awk is much cleaner. Here is a golfed version with similar logic:

<infile awk '{ print } /DCR/ && !--n { print "check" }' n=2
Thor
0
sed '2 a\
check
' file

This appends a line containing check after line 2 and prints the whole file to standard output.
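Because 2 is a line address rather than a count of DCR matches, the placement depends only on line numbers. With a made-up leading header line, for example, check lands after the first DCR rather than the second:

```shell
# The a command fires on input line 2, whatever that line contains.
printf 'header\nDCR\nDCR\nDCR\n' | sed '2a\
check
'
```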

fd0
-1

An awk solution is a lot easier to read for this kind of task. Here is just a correction to steviethecat's solution: some older awk implementations do not accept the ; separators, so replace them with newlines:

echo -e "DCR\nDCR\nDCR" | awk 'BEGIN {t=0}

{ print }

/DCR/ { t++; if ( t==2) { print "check" } }'
  • Welcome to U&L.SE. Please explain why the correction is needed, and you may get an upvote. – eyoung100 Jun 29 '15 at 02:20
  • The one steviethecat posted has a problem; a user tried it and got an error. – Roger Freeman Jun 29 '15 at 04:19
  • Update your answer by clicking Edit. Don't tell us in a comment; explain what error the user got, and then tell us how your code fixes it. You've done half of that, but posting an untested code blob is discouraged. – eyoung100 Jun 29 '15 at 04:26