Print lines between (and excluding) two patterns

Question

I'm going to submit a form using cURL, where some of the content comes from another file, selected using sed.

If param1 is the line matching a pattern in another file, selected using sed, the command below works fine:

curl -d param1="$(sed -n '/matchpattern/p' file.txt)" -d param2=value2 http://example.com/submit

Now, to the problem: I want to show only the text between two matching patterns, excluding the matching lines themselves.

Let's say file.txt contains:

Bla bla bla
firstmatch
It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout.
secondmatch
The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English.

Currently, lots of "between two matching patterns" sed commands don't remove the firstmatch and secondmatch lines.

I want the result to become:

It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout.

http://stackoverflow.com/questions/17988756/how-to-select-lines-between-two-marker-patterns-which-may-occur-multiple-times-w — Ciro Santilli OurBigBook.com, Jul 13 '15 at 09:42

score 17 · Answer 1 · answered Jul 26 '11 at 05:07

Here is one way you could do it:

sed '1,/firstmatch/d;/secondmatch/,$d'

Explained: From the first line to the line matching firstmatch, delete. From the line matching secondmatch to the last line, delete.
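As a quick check, here is a minimal sketch that runs this command against the sample file.txt from the question. The scratch directory and the variable names (workdir, between) are made up for illustration; the curl line is left as a comment so nothing is actually submitted.

```shell
#!/bin/sh
# Recreate the question's sample file.txt in a scratch directory
# (workdir and between are illustrative names, not from the original post).
workdir=$(mktemp -d)
cat > "$workdir/file.txt" <<'EOF'
Bla bla bla
firstmatch
It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout.
secondmatch
The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English.
EOF

# Delete line 1 through the firstmatch line, then the secondmatch line through the last line.
between=$(sed '1,/firstmatch/d;/secondmatch/,$d' "$workdir/file.txt")
printf '%s\n' "$between"

# The captured text could then be dropped into the question's curl call, e.g.:
#   curl -d param1="$between" -d param2=value2 http://example.com/submit

rm -rf "$workdir"
```

With the sample input, only the "It is a long established fact ..." line should remain.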

Jukka Matilainen

319

don_crissti · Answer 2 · 2018-03-16T12:00:45.107

9

The other sed solution will fail if firstmatch occurs on the 1st line¹.

Keep it simple: use a single range and an empty² regex.
Either print everything in that range except the range ends (with auto-printing disabled)³:

sed -n '/firstmatch/,/secondmatch/{//!p;}' infile

or, shorter, delete everything not in that range and also delete the range ends:

sed '/firstmatch/,/secondmatch/!d;//d' infile
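To compare the two variants side by side, here is a small sketch that feeds both the same input. The sample text and variable names are invented for illustration, and the result depends on how the sed in use treats the empty regex, so treat it as an experiment rather than a guarantee.

```shell
#!/bin/sh
# Shortened stand-in for the question's file.txt (illustrative contents).
sample=$(printf '%s\n' 'Bla bla bla' 'firstmatch' 'wanted line 1' 'wanted line 2' 'secondmatch' 'trailing text')

# Variant 1: auto-print off; print range lines that do not match the range ends.
printed=$(printf '%s\n' "$sample" | sed -n '/firstmatch/,/secondmatch/{//!p;}')

# Variant 2: delete everything outside the range, then delete the range ends.
deleted=$(printf '%s\n' "$sample" | sed '/firstmatch/,/secondmatch/!d;//d')

printf 'variant 1:\n%s\n' "$printed"
printf 'variant 2:\n%s\n' "$deleted"
```

With GNU sed, both variants should print just the two "wanted" lines.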

¹ The reason is that if the second address is a regexp, checking for the ending match starts with the line following the line that matched the first address. Therefore /firstmatch/ is never evaluated for the 1st line of the input: sed simply deletes that line, since it matches the line number in 1,/RE/, and moves on to the 2nd line, where it checks whether the line matches /firstmatch/.

² When a regex is empty (i.e. //), sed behaves as if the last regex used in the last command applied (either as an address or as part of a substitute command) was specified.

³ The ;} syntax is for modern sed implementations; with older ones, use either a newline instead of the semicolon or separate expressions, e.g. sed -n -e '/firstmatch/,/secondmatch/{//!p' -e '}' infile
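The first-line case from footnote 1 can be reproduced with a short sketch. The input below is hypothetical (firstmatch placed on line 1). The second command is not from the answer: it is the same regex range with both patterns spelled out instead of the empty regex, as one possible fallback if the // behaviour is in doubt.

```shell
#!/bin/sh
# Hypothetical input where firstmatch sits on line 1.
input=$(printf '%s\n' 'firstmatch' 'wanted text' 'secondmatch' 'trailing text')

# 1,/firstmatch/ starts on line 1 and looks for its end from line 2 onward;
# no later line matches, so the range runs to end of input and every line is deleted.
line_range=$(printf '%s\n' "$input" | sed '1,/firstmatch/d;/secondmatch/,$d')

# Regex range with both patterns written out explicitly instead of //.
explicit=$(printf '%s\n' "$input" | sed -n '/firstmatch/,/secondmatch/{/firstmatch/d;/secondmatch/d;p;}')

printf 'line-number range: [%s]\n' "$line_range"
printf 'regex range:       [%s]\n' "$explicit"
```

The line-number form prints nothing for this input, while the regex range prints "wanted text".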

edited Mar 16 '18 at 12:00

answered Mar 14 '18 at 11:44

don_crissti

82,805

Can you explain what // is doing (inside the {…})? – G-Man Says 'Reinstate Monica' Mar 15 '18 at 00:50
Thanks, but you fell into my trap. I know that // means the last regular expression used; from everything that I’ve read, that should be /secondmatch/. I’ve verified through testing that your command works, and so I’ve concluded that it is functioning as /firstmatch|secondmatch/ (which you have confirmed), but I can’t find any documentation (not even the POSIX document that you linked to or the GNU sed manual) that describes this behavior. … (Cont’d) – G-Man Says 'Reinstate Monica' Mar 15 '18 at 16:43
(Cont’d) … Entertaining experiments: (I) In sed: (1) If I do /first/,4, then // acts like /first/. (2) If I do 2,/second/, then // gets a “no previous regular expression” error. (I find this a blatant failure to follow the specified behavior.) (3) Adding --posix doesn’t change either of the above. (II) In other programs: (4) In vi, after /first/,/second/, // acts like /second/ (and the other forms are also rational implementations of the documented rule). … (Cont’d) – G-Man Says 'Reinstate Monica' Mar 15 '18 at 16:43
(Cont’d) … (5) awk seems to have no notion of “the last RE used”; // refers to the non-character before or after any character. (I invite you to try echo -- | awk '{ gsub(//, "cha"); print }'.) – G-Man Says 'Reinstate Monica' Mar 15 '18 at 16:43
So, you read “the last REGEX used in the last command” as “the last REGEX(s) used in the last command” and so you (correctly) guessed that it meant /first|second/. Lucky you. I mention the other programs to demonstrate that this is not some system-wide regex convention. Whoever added it to sed didn’t bother to add it to vim, where it would have made just about as much sense. :-) – G-Man Says 'Reinstate Monica' Mar 15 '18 at 17:13
Fascinating insight; thanks for sharing. So now we know that, while the seemingly innocuous specification sentence that you quoted is correct in a cryptic, legalese sort of way, your interpretation of it is not. On line 2 (location of the firstmatch), // is equivalent to /firstmatch/. On lines 3 and 4 (location of the secondmatch), // is equivalent to /secondmatch/. Good thing you decided to “keep it simple”; it looks like we both learned something today. :-) – G-Man Says 'Reinstate Monica' Mar 16 '18 at 04:30
I can confirm that Busybox sed only removes /secondmatch/, and still prints /firstmatch/, using either method. Busybox v1.31.1 – dan Nov 03 '21 at 13:35

score 5 · Answer 3 · answered Jul 26 '11 at 10:43

In awk:

awk '
  $1 == "secondmatch" {print_me = 0}
  print_me {print}
  $1 == "firstmatch" {print_me = 1}
'
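A minimal sketch of this awk approach on a shortened, made-up version of the sample input follows. Note that `$1 == "..."` compares the first whitespace-separated field exactly, so unlike the sed patterns it only triggers when the marker is the first word on its line.

```shell
#!/bin/sh
# Shortened stand-in for the question's file.txt (illustrative contents).
sample=$(printf '%s\n' 'Bla bla bla' 'firstmatch' 'wanted text' 'secondmatch' 'trailing text')

between=$(printf '%s\n' "$sample" | awk '
  $1 == "secondmatch" {print_me = 0}   # clear the flag before the end marker can be printed
  print_me {print}                      # print only while the flag is set
  $1 == "firstmatch" {print_me = 1}    # set the flag after the start marker has been skipped
')
printf '%s\n' "$between"
```

The rule order matters: the flag is cleared before the print rule and set after it, so neither marker line is printed.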

glenn jackman

85,964

Some notes on speed here: http://unix.stackexchange.com/a/194662/16920 – Léo Léopold Hertz 준영 Sep 11 '15 at 14:14
What about the speeds? – glenn jackman Sep 11 '15 at 15:45
I think sed is faster than awk here. – Léo Léopold Hertz 준영 Sep 11 '15 at 16:08
