How to select first occurrence between two patterns including just the first?

Question

This is similar to, but not exactly like, "How to select first occurrence between two patterns including them". Given this input file:

something P1 something
content1
content2
something P1 something
content3
content4

I need just this output:

something P1 something
content1
content2

sed -n '/something P1 something/,/something P1 something/p' input | head -n -1 — jesse_b, Jan 04 '20 at 18:40
@RonJohn Is it guaranteed that at least two lines matching the pattern will appear? — Torin, Jan 04 '20 at 18:49
@Torin yes, I think so. Still, an answer that doesn't rely on head -n -1 would be useful, just in case. — RonJohn, Jan 04 '20 at 18:50
@RonJohn I'm not questioning your posted example, I'm questioning your requirements that lead to the posted output given the posted input. You said you want to get content between 2 patterns. If your input file only had the first block but not the second should that first block be output or not? If the answer is it should then you do NOT want to get the text between 2 patterns and instead just want to get the text under a header. That's a different problem with a different solution but which can also be applied to the "text between 2 patterns" problem. — Ed Morton, Jan 04 '20 at 19:17
Some other missing requirements - do you want a regexp match or a string match? Do you want a whole line match or a partial line match? Do you want to include partial word matches? Hopefully that helps clarify why I'm asking for your requirements, if not - oh well, I tried, good luck. — Ed Morton, Jan 04 '20 at 19:18
@EdMorton you could have said that in the first place. Partial matching on P1 is what I really need, and Torin's answer (which exactly answers my question) was easily modified to only match strings with P1 in them. — RonJohn, Jan 04 '20 at 19:24
@RonJohn to be clear, Torin's answer finds lines that match regexps with P1 in them, while mine finds lines that match strings with P1 in them, so you can just pick whichever satisfies your requirements. — Ed Morton, Jan 05 '20 at 20:12
@EdMorton yes, picking the most suitable answer is how SE works. — RonJohn, Jan 05 '20 at 21:22
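The question raised in these comments (what happens if only one `something P1 something` block is present) can be checked directly. The sketch below is illustrative: the helper names `via_sed_head` and `via_awk` and the file names are made up, and `head -n -1` assumes GNU coreutils. With two headers both commands print the first block; with only one header, the `sed | head -n -1` pipeline drops the last content line, while the awk command from the accepted answer prints the whole block.

```shell
#!/bin/sh
# Illustrative sample files (names are hypothetical).
printf '%s\n' 'something P1 something' content1 content2 \
              'something P1 something' content3 content4 > two_blocks.txt
printf '%s\n' 'something P1 something' content1 content2 > one_block.txt

# Comment approach: print the P1..P1 range, then drop its last line.
# "head -n -1" (negative count) is a GNU coreutils extension.
via_sed_head() {
    sed -n '/something P1 something/,/something P1 something/p' "$1" | head -n -1
}

# Accepted-answer approach: count header lines, exit at the second one.
via_awk() {
    awk '/^something P1 something$/{if(++i>1)exit} i' "$1"
}

for f in two_blocks.txt one_block.txt; do
    echo "== $f: sed | head -n -1"; via_sed_head "$f"
    echo "== $f: awk";              via_awk "$f"
done
```

On `one_block.txt` the first command prints only the header and `content1`.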

Torin · Accepted Answer · score 4 · 2020-01-04T19:04:24.943

An awk solution:

 awk '/^something P1 something$/{if(++i>1)exit} i' input_file

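This prints the first line matching /^something P1 something$/ and every following line up to, but not including, the next line matching that pattern, or up to the end of the file if there is no second match. Because awk exits at the second match, the rest of the file is not read.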
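Spread over several lines with comments, the same awk program may be easier to follow. The wrapper name `first_block` is hypothetical, and it reads standard input instead of a named file.

```shell
#!/bin/sh
# first_block: hypothetical wrapper around the awk one-liner above.
# Prints the first header line and everything up to, but not including,
# the next header line (or to end of input). Lines before the first
# header are skipped because i is still 0 there.
first_block() {
    awk '
        /^something P1 something$/ {    # a header line
            if (++i > 1)                # is this the second header?
                exit                    # yes: stop without printing it
        }
        i                               # i is nonzero inside block 1: print
    '
}

printf '%s\n' 'something P1 something' content1 content2 \
              'something P1 something' content3 content4 | first_block
```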

edited Jan 04 '20 at 19:04 · answered Jan 04 '20 at 18:59 by Torin

score 3 · Answer 2 · answered Jan 04 '20 at 19:00


This is what I suspect you really want:

To print the first block:

$ awk '$0=="something P1 something"{c++} c==1' file
something P1 something
content1
content2

or to print the 2nd:

$ awk '$0=="something P1 something"{c++} c==2' file
something P1 something
content3
content4

and so on. Without a clear statement of requirements it's just a guess though.
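The counter makes it easy to pick any block by number. Below is a sketch that passes the block number with `-v` and stops reading once that block is over; the function name `nth_block`, the variable `n`, and the early `exit` are additions for illustration, not part of the answer.

```shell
#!/bin/sh
# nth_block N: print block N from stdin, where each block starts at a line
# that is exactly "something P1 something" (string comparison, not a regex).
nth_block() {
    awk -v n="$1" '
        $0 == "something P1 something" { c++ }  # entering a new block
        c > n  { exit }                         # block N is over: stop reading
        c == n                                  # inside block N: print the line
    '
}

printf '%s\n' 'something P1 something' content1 content2 \
              'something P1 something' content3 content4 | nth_block 2
```

With `nth_block 2` this prints the header followed by `content3` and `content4`; asking for a block that does not exist prints nothing.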

answered Jan 04 '20 at 19:00 by Ed Morton


My question clearly specifies that the output must contain content1 and content2, so it's the first block I want. Still, knowing how to get the second block is useful too (though outside the scope of the question). – RonJohn Jan 04 '20 at 19:07
Right so I showed you the idiomatic solution that does what you specifically asked for and can be used for other purposes in future. – Ed Morton Jan 04 '20 at 19:09

score 2 · Answer 3 · 2020-01-05T19:26:55.743

awk

A general solution for printing the ith pattern block in awk is:

awk -v i=1 -v pat='something P1 something' '$0~pat{i--}i==0'

Explanation:

-v i=1        # sets the pattern block to print (1 in this case).
-v pat='...'  # sets the regex pattern that will be tested.

$0~pat        # tests if the input line matches the pattern
{i--}         # If the pattern was found, reduce the count.
i==0          # If the count has reduced to 0, print the block lines.
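A quick way to watch the countdown is to print `i` in front of every line instead of filtering. This is a debugging sketch, not part of the answer, and the function name `trace` is made up: the one-liner above prints exactly the lines whose prefix is 0.

```shell
#!/bin/sh
# trace N: show the value of i after each line is read, starting from i=N.
# The block selected by '$0~pat{i--}i==0' is the run of lines prefixed 0.
trace() {
    awk -v i="$1" -v pat='something P1 something' '$0~pat{i--} {print i ": " $0}'
}

printf '%s\n' 'something P1 something' content1 content2 \
              'something P1 something' content3 content4 | trace 1
```

With `trace 1` the first three lines are prefixed `0` and the last three `-1`; with `trace 2` the prefixes are `1` and then `0`, so the second block is the one selected.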

If the pattern that matters is only P1, then use:

awk -v i=1 -v pat='P1' '$0~pat{i--}i==0'

For faster execution, exit as soon as the block has ended:

awk -v i=1 -v pat='P1' '$0~pat{i--}i==0;i<0{exit}'

If you want a literal string comparison against the whole line (not a regex match), use:

awk -v i=1 -v pat='something P1 something' '$0 == pat {i--}; i==0; i<0{exit}'
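The `==` comparison only matches when the whole line equals `pat`. For a literal match anywhere in the line (no regex interpretation, closer to the partial match on P1 asked for in the comments), awk's `index()` can take the place of `==`. A sketch, with the function name `literal_block` made up for illustration. It passes the string through the environment and reads it with `ENVIRON`, because `-v` assignments process backslash escapes such as `\t`.

```shell
#!/bin/sh
# literal_block N STRING: print block N from stdin, where a block starts at
# any line that contains STRING literally (index() does no regex matching).
# STRING goes through the environment so backslashes in it are kept as-is.
literal_block() {
    pat="$2" awk -v i="$1" '
        index($0, ENVIRON["pat"]) { i-- }  # header line: count down
        i == 0                             # inside the requested block: print
        i < 0 { exit }                     # past it: stop reading
    '
}

printf '%s\n' 'something P1 something' content1 content2 \
              'something P1 something' content3 content4 | literal_block 1 P1
```

With a regex (`$0 ~ pat`), a string like `a.b` would also match a line such as `axb`; with `index()` it does not.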

sed

To print from the first line matching the pattern up to, but not including, the next matching line, you can use GNU sed:

sed -n '/something P1 something/!b;b2;:1;{/something P1 something/q;:2;p;n;b1}'

Any lines before the first something P1 something are skipped.
The script quits as soon as the second match is found, so the rest of the input is not read.
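To check the first point, the sketch below runs the GNU sed command above, unchanged, on a sample file with two lines before the first header (the file name `with_preamble.txt` is made up). It assumes GNU sed, which accepts `;` after labels such as `b2` and `:1`.

```shell
#!/bin/sh
# Hypothetical sample: two preamble lines before the first header.
printf '%s\n' preamble1 preamble2 'something P1 something' content1 content2 \
              'something P1 something' content3 content4 > with_preamble.txt

# GNU sed command from the answer: "!b" skips non-header lines until the
# first header, "p;n" prints the block, and "q" quits at the next header.
sed -n '/something P1 something/!b;b2;:1;{/something P1 something/q;:2;p;n;b1}' with_preamble.txt
```

The preamble lines do not appear in the output; only the header, `content1` and `content2` do.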

Because the start and end patterns are the same, the command can be shortened: an empty regex (//) reuses the last regex that was applied:

sed -n -e '/something P1 something/!b;b2;:1;{//q;:2;p;n;b1}'

And to make it more portable, use:

sed -n -e '/something P1 something/!{b' -e '};b2' -e ':1' -e '{//q;:2' -e 'p;n;b1' -e '}'

Rakesh Sharma · Answer 4 · 2020-01-05T04:42:51.867

 $ sed -ne '
     /P1/!d
     :loop
        p;n
     //!bloop
     q
 ' file

Results:

something P1 something
content1
content2

Using GNU sed with the non-POSIX Q command:

$ sed -e '
   /P1/,/P1/!d
   //!{$q;b;}
   G;/\n./Q;s/\n.*//;h
' file
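The non-obvious part of this script is that the hold space doubles as a "header already seen" flag. Below is the same GNU sed program with explanatory comments (the comments are this sketch's reading of the script, not the author's), wrapped in a hypothetical function `first_block_gnu` that reads standard input.

```shell
#!/bin/sh
# first_block_gnu: hypothetical wrapper; same commands as the answer above.
first_block_gnu() {
    sed -e '
      # Outside every P1...P1 range (e.g. before the first header): delete.
      /P1/,/P1/!d
      # Body line (no P1): on the last line quit, printing it first;
      # otherwise branch to the end of the script so it is auto-printed.
      //!{$q;b;}
      # Header line: append the hold space. For the first header the hold
      # space is empty, so the line ends in a bare newline and /\n./ fails.
      # For the second header it holds the first one, so Q quits unprinted.
      G;/\n./Q
      # First header: strip the appended newline and save the line as the flag.
      s/\n.*//;h
    '
}

printf '%s\n' 'something P1 something' content1 content2 \
              'something P1 something' content3 content4 | first_block_gnu
```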

With only POSIX constructs, we can do this:

 $ sed -ne '
      /P1/,/P1/!d
      //!{
        p;$q;d
      }
      G;/\n./q;s/\n.*//p;h
 ' file

With Perl:

$ perl -lne '
    next unless $e = /P1/ ... /P1/;
    $e =~ /E/ ? last : print;
' file

Yet another:

$ perl -0777 -pe '$_ = /^(.*?P1(?s:.*?\n))(?=.*?P1)/m ? $1 : $,' file
