How to print the text after the first occurrence of a "start pattern"-"stop pattern" pair?

Question

I have a file that contains a bunch of certificates:

I want to drop the certificate that contains A and keep only the B, C ... n certificates.

It's very similar to this question, and I was hoping for a portable way to do this, preferably with sed, though awk works too if it can't be done with sed.

Is there a way to make sed print until a particular value EXCEPT for the first occurrence?

Welcome to the site. The answer to your question depends somewhat on the circumstances. If the file in question is indeed a well-structured certificate file as shown in your example, the solution is pretty straightforward. If not: can the "start" pattern (here: BEGIN CERTIFICATE) occur without a corresponding "end" pattern (END CERTIFICATE)? If so, what would you want the solution to do? — AdminBee, Mar 28 '22 at 15:26
@StéphaneChazelas Sounds like a good answer to me, why not post it (perhaps along with some explanation for the sed novice)? ;) — AdminBee, Mar 28 '22 at 16:13

score 4 · Answer 1 · answered Mar 28 '22 at 15:28

If your input is a well-formed certificate file as shown in your example, the easiest way that comes to my mind is to do it with awk:

awk '$0=="-----BEGIN CERTIFICATE-----" {n++} n>1' test.cert

This increments a counter variable n every time the current line ($0) exactly matches the "start pattern". The current line is printed whenever the "seemingly stray" boolean expression n>1 is true, i.e. from the second occurrence of the start pattern onwards. awk treats uninitialized variables as zero (or the empty string, depending on the context), so there is no need to initialize n to 0 explicitly in a BEGIN block.
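A quick way to check the behaviour is to run the command on a small stand-in file. The sketch below assumes a POSIX shell with mktemp available; the three certificates with bodies A, B and C are made-up placeholders, not real certificate data.

```shell
# Build a hypothetical three-certificate file (bodies A, B, C are placeholders).
tmp=$(mktemp) || exit 1
printf '%s\n' \
  '-----BEGIN CERTIFICATE-----' A '-----END CERTIFICATE-----' \
  '-----BEGIN CERTIFICATE-----' B '-----END CERTIFICATE-----' \
  '-----BEGIN CERTIFICATE-----' C '-----END CERTIFICATE-----' > "$tmp"

# n counts the BEGIN lines seen so far; print only from the second BEGIN on.
out=$(awk '$0=="-----BEGIN CERTIFICATE-----" {n++} n>1' "$tmp")
rm -f -- "$tmp"
printf '%s\n' "$out"
```

Only the B and C certificates should be printed, each with its BEGIN and END lines.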

Things get more complicated if your input document can be corrupted, i.e. contain start patterns that are not correctly matched by end patterns and vice versa.
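One way to cope with such input is to track the state explicitly instead of counting BEGIN lines. The following is only a sketch under one assumed policy: anything (including a stray END line) before the first BEGIN is ignored, and every line after the END that closes the first BEGIN is printed. The sample input is hypothetical and mktemp is assumed to be available.

```shell
# Hypothetical damaged input: leading noise and a stray END before the first BEGIN.
tmp=$(mktemp) || exit 1
printf '%s\n' 'some noise' '-----END CERTIFICATE-----' \
  '-----BEGIN CERTIFICATE-----' A '-----END CERTIFICATE-----' \
  '-----BEGIN CERTIFICATE-----' B '-----END CERTIFICATE-----' > "$tmp"

# state 0: before the first BEGIN; 1: inside the first block; 2: after it.
out=$(awk '
  state == 2 { print; next }
  state == 0 && $0 == "-----BEGIN CERTIFICATE-----" { state = 1; next }
  state == 1 && $0 == "-----END CERTIFICATE-----"   { state = 2 }
' "$tmp")
rm -f -- "$tmp"
printf '%s\n' "$out"
```

Unlike an approach that simply skips everything up to the first END line, the stray END here does not end the skipping; only an END seen after the first BEGIN does.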

Yes, it will always be well formed as in the example, as it comes straight from openssl, i.e. openssl s_client -showcerts -connect example.com:443 < /dev/null 2>&1 > certificates.cert — dogman, Mar 28 '22 at 15:32
@dogman In that case the proposed answer should do the trick. — AdminBee, Mar 28 '22 at 15:32

score 1 · Accepted Answer · answered Mar 28 '22 at 16:20

@AdminBee's answer is the right way of doing that, but if you're sure that the section to omit is the first one, you can do it with sed as well:

sed -n '/^-*END CERTIFICATE-*$/!d;:a n;p;ba' file

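Or, in a portable multi-line form (POSIX sed does not allow another command after a label on the same line, so :a and ba each end a line):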

sed -n '
    /^-*END CERTIFICATE-*$/!d;:a
    n;p;ba
' file

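This sed script deletes (d) every line before the first -----END CERTIFICATE----- line; the END line itself is not printed either, because of -n. It then enters a loop that reads the next line (n), prints it (p) and branches back to :a (ba) until the input runs out.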
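To see what the portable form does with output shaped like that of openssl s_client -showcerts, here is a run on a small made-up sample. The " 0 s:CN = ..." lines only roughly imitate openssl's subject lines and are assumptions, as is the availability of mktemp. The run suggests one difference from the awk answer: text that follows the first END line, such as the subject line of the second certificate, is kept.

```shell
# Hypothetical sample loosely imitating openssl s_client -showcerts output.
tmp=$(mktemp) || exit 1
printf '%s\n' ' 0 s:CN = A' \
  '-----BEGIN CERTIFICATE-----' A '-----END CERTIFICATE-----' \
  ' 1 s:CN = B' \
  '-----BEGIN CERTIFICATE-----' B '-----END CERTIFICATE-----' \
  '---' > "$tmp"

# Drop everything up to and including the first END line, print the rest.
out=$(sed -n '
    /^-*END CERTIFICATE-*$/!d;:a
    n;p;ba
' "$tmp")
rm -f -- "$tmp"
printf '%s\n' "$out"
```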

DanieleGrassini

Ah, that is more succinct, and possibly portable too. – dogman Mar 30 '22 at 17:18

As it was my preference to do this with sed, and I can be sure the first certificate is the one I want to omit, I'm going to mark this one as the answer. – dogman Mar 30 '22 at 17:32

guest_7 · Answer 3 · answered Mar 29 '22 at 16:18

Adjust the regexes /BEGIN/ and /END/ as required.

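awk '
/BEGIN/,/END/{
  # on the END line that closes the first BEGIN..END range, set f and skip it
  if ( /END/ && !f++ ) next
}
# print every line once f has been set
f
' file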
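For example, the patterns can be anchored to whole marker lines, so that other lines that merely contain BEGIN or END cannot open or close a range. This is a sketch on hypothetical input (mktemp assumed available); it also includes a stray END line before the first BEGIN, which the range never sees because a range can only start at a BEGIN line.

```shell
# Hypothetical input: junk and a stray END before the first certificate.
tmp=$(mktemp) || exit 1
printf '%s\n' 'junk' '-----END CERTIFICATE-----' \
  '-----BEGIN CERTIFICATE-----' A '-----END CERTIFICATE-----' \
  '-----BEGIN CERTIFICATE-----' B '-----END CERTIFICATE-----' > "$tmp"

out=$(awk '
/^-+BEGIN CERTIFICATE-+$/, /^-+END CERTIFICATE-+$/ {
  # first END that closes a BEGIN..END range: set f and skip that line
  if ( /^-+END CERTIFICATE-+$/ && !f++ ) next
}
f    # from then on, print every line
' "$tmp")
rm -f -- "$tmp"
printf '%s\n' "$out"
```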

Perl in slurp mode (-0777):

perl -0777 -pe '
  my($b,$e) = map { quotemeta s/$/ CERTIFICATE/r } qw(BEGIN END);
  my($B,$E) = map { qr{^-+ $_ -+\n}mx } ($b,$e);
  my $re = qr{$B (?s:.*?) $E}mx;
  substr($_,0,$+[0],"") if m{$re}m;
' file

GNU sed:

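sed -ne '
  # trailing newline (added by G below) = past the first cert: jump to 2
  s/\n//;t2
  # before the first BEGIN: drop lines one at a time
  /BEGIN/!{$!N;D;}
  # first certificate: read silently through its END line, then one more
  :1;n;/END/!b1
  n
  # print current line, append next, add a newline marker, restart cycle
  :2;$!{N;P;G;D;}
  p
' file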
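If GNU extensions are to be avoided, a similar effect can probably be had with plain POSIX sed by placing every label at the end of its own line. This is an alternative sketch, not part of the original answer: it discards lines until the first BEGIN, reads silently through the matching END, and then prints every remaining line. The sample input is hypothetical and mktemp is assumed to be available.

```shell
# Hypothetical input with a stray END before the first BEGIN.
tmp=$(mktemp) || exit 1
printf '%s\n' '-----END CERTIFICATE-----' \
  '-----BEGIN CERTIFICATE-----' A '-----END CERTIFICATE-----' \
  '-----BEGIN CERTIFICATE-----' B '-----END CERTIFICATE-----' > "$tmp"

# 1: skip the first BEGIN..END block; 2: print everything after it.
# n at end of input ends the script, and -n suppresses the final autoprint.
out=$(sed -n '
/BEGIN/!d
:1
n
/END/!b1
:2
n
p
b2
' "$tmp")
rm -f -- "$tmp"
printf '%s\n' "$out"
```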

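The csplit utility can also be used here (the -z option and the '{*}' repeat count are GNU extensions). First split the input file after each END line, then delete chunks from the front up to and including the first one that contains a complete BEGIN...END block: normally just the first chunk, or the first two if an END line appears before the first BEGIN.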

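# split after every END line into xx00, xx01, ... (-z drops empty chunks)
csplit -sz file '/END/+1' '{*}'
for f in xx*; do
  # exit status 1 (GNU sed Q) if the chunk ends inside a BEGIN..END block,
  # i.e. it holds the first complete certificate
  sed -n '/BEGIN/,/END/!d;$Q1' "$f"
  status=$?; rm -- "$f"
  [ "$status" -eq 1 ] && break
done
# print the remaining chunks in order
printf '%s\n' xx* | xargs -r cat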
