12

Using a regexp, how can I remove all the lines before the first line that contains a match? For example, how can I change this:

lost
load
linux
loan
linux

into this:

linux
loan
linux

I tried:

echo "lost
load
linux
loan
linux" | sed -e 's/.*^li.*$//g'

but it returns this, not changing anything:

lost
load
linux
loan
linux

I'd like to make it work so that it won't output anything when there's no match.

stacko
  • 791

4 Answers

24

One way, POSIXly:

$ echo "lost
load
linux
loan
linux" | sed -e/linux/\{ -e:1 -en\;b1 -e\} -ed

or shorter:

sed -n '/linux/,$p'

or even shorter:

sed '/linux/,$!d'

For readers who wonder why I prefer the longer version over the shorter ones: once the first match is found, the longer version only performs I/O over the rest of the file, whereas an address range can hurt performance when its second address is a regex, because sed keeps trying to match that regex more often than necessary.

Consider:

$ time seq 1000000 | sed -ne '/^1$/{' -e:1 -en\;b1 -e\}
=====
JOB sed -e '/^1$/,$d'
87%    cpu
0.11s real
0.10s user
0.00s sys

with:

$ time seq 1000000 | sed -e '/^1$/,/1000000/d'
=====
JOB sed -e '/^1$/,/1000000/d'
96%    cpu
0.24s real
0.23s user
0.00s sys

You can see the difference between the two versions. With a complex regex, the difference will be big.
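To make the loop easier to follow, here is a sketch of the same program as the first one-liner, written as a multi-line sed script with comments. The variable names (input, no_match, script, with_match, without_match) are made up for this illustration, and it assumes a POSIX shell and sed.

```shell
# Sample input from the question, plus a variant with no matching line.
input='lost
load
linux
loan
linux'
no_match='lost
load
loan'

# Same commands as: sed -e/linux/\{ -e:1 -en\;b1 -e\} -ed
# (script is a hypothetical variable name used only in this sketch)
script='
# Entered on the first line that matches; later cycles never start.
/linux/{
:1
# n prints the current line (auto-print) and loads the next one.
# When input runs out, sed prints the last line and quits.
n
# Branch back to label 1, so the d below is never reached again.
b1
}
# Reached only by lines before the first match: delete them.
d
'

with_match=$(printf '%s\n' "$input" | sed -e "$script")
without_match=$(printf '%s\n' "$no_match" | sed -e "$script")

printf '%s\n' "$with_match"
```

With the sample input this should print the three lines starting at the first linux. For the input without a match, every line falls through to d, so without_match stays empty.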

cuonglm
  • 153,898
  • @cuonglm What is the stuff in the curly brackets doing? And the "-ed"? Could you explain? I thought the "-e"s were for chaining commands, but what they're linking here (:1, ;b1) don't look like commands? – stacko Jan 24 '16 at 12:26
  • @stacko: Chain all the commands together and you have sed -e '/linux/{:1;n;b1};d'. As you can see, inside the curly braces is just a loop that executes the n command until the end of the file. I broke it into several pieces for POSIX compliance. – cuonglm Jan 24 '16 at 15:22
  • @dave_thompson_085 - you should have just answered with that - it's a good answer. I did! I think I might delete it though... I didn't see your comment before, and it looks like cuonglm folded it in anyway - and it's not like I offer much else besides here... – mikeserv Jan 24 '16 at 16:18
  • @mikeserv: Good point, updated the answer. – cuonglm Jan 24 '16 at 16:19
  • @cuonglm Does the "b1" mean "go to the label 1(:1)"? The "n" is for printing pattern space, and the "d" is for deleting, right? I'm a novice and it doesn't make sense to me; where does it say it'll repeat the command n till the end of file? Does using the curly brackets make it work like that without specifying it? And what does the "d" delete? – stacko Jan 24 '16 at 18:13
  • @stacko: Yes. Inside the curly braces, you enter loop 1, execute the n command, then branch back to label 1, then execute the n command again, and so on. The d command deletes all lines before we match the pattern and enter the loop. – cuonglm Jan 24 '16 at 18:19
  • @cuonglm How can the d delete all before entering the loop when it's placed at the end of the code? And the d command usually deletes only the matching part, right? e.g. sed '/linux/d' Why can it delete all in this case? Sorry if this is a stupid question. – stacko Jan 24 '16 at 18:31
  • @stacko - it's not stupid, but the match is inverted with ! and so all lines which do not occur between the first match in input and the last line are deleted. The n isn't for printing pattern space (though that does happen by default); n overwrites pattern space with the next input line, if any. – mikeserv Jan 24 '16 at 19:46
  • @stacko: If there's no match, you never enter the loop, so the lines are deleted. You don't have to associate the delete command with a pattern; you can simply use sed d to delete all lines. – cuonglm Jan 25 '16 at 01:34
2

This is easy to do clearly in awk:

echo "lost
load
linux
loan
linux" | awk '
    /^li/ { found = 1 }
    found { print }'

Here found is a variable, with an arbitrarily chosen, self-explanatory name.  It gets set when the program encounters an input line that matches the regexp.  (Variables initially default to null, which is functionally equivalent to 0 or FALSE.)  So input lines are printed once the ^li pattern has been matched, and not before.  The third line of the input (the first linux line) is printed because the conditional print statement comes after the statement that looks for the pattern and sets the flag.  If you want to start printing with the fourth line (the line after the first linux line), just reverse the order of the two statements.

If no input line matches the regexp, the flag never gets set, and nothing is printed.
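As a rough illustration of the last two points, the sketch below runs the program with the statements in both orders and once on input that has no match. The shell variable names (input, inclusive, exclusive, no_match) are made up for this example.

```shell
# Sample input from the question.
input='lost
load
linux
loan
linux'

# Flag set first, then tested: output starts with the first matching line.
inclusive=$(printf '%s\n' "$input" | awk '/^li/ { found = 1 } found { print }')

# Flag tested first, then set: output starts on the line after the first match.
exclusive=$(printf '%s\n' "$input" | awk 'found { print } /^li/ { found = 1 }')

# No line matches ^li here, so found is never set and awk prints nothing.
no_match=$(printf '%s\n' lost load loan | awk '/^li/ { found = 1 } found { print }')

printf 'inclusive:\n%s\n' "$inclusive"
printf 'exclusive:\n%s\n' "$exclusive"
```

The inclusive form should give linux, loan, linux; the exclusive form should give only loan and linux; no_match stays empty.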

As I said, the name of the flag variable is arbitrary; you can use something shorter (e.g., f) if you want.  And { print } is the default action, so you can leave it out.  So, if you don't care about clarity, you can shorten the above to

echo "lost
load
linux
loan
linux" | awk '/^li/{f=1}f'
  • 1
    awk can also have two-part patterns where the action occurs for each group of lines starting with one that matches the left part until one that matches the right part, so awk '/linux/,0' prints lines starting with a match for linux and stopping only at EOF because 0 is false. – dave_thompson_085 Jan 24 '16 at 07:04
1

Two other awk solutions:

They both just set a found flag when seeing the first regex match and print when that flag is set.

echo "lost
load
linux
loan
linux" | awk 'BEGIN {found = 0} {if (found || $0 ~ /linux/) {found = 1; print}}'

This one is a little longer but doesn't set the found flag again.

echo "lost
load
linux
loan
linux" | awk 'BEGIN {found = 0} {if (found) {print} else if ($0 ~ /linux/) {found = 1; print}}'
dosentmatter
  • 508
1

You can use ex in batch mode to directly edit the file. (If you want to see what the output file would be before actually changing the file, replace the x by %p.)

printf '%s\n' 'a' 'linux' '.' '1,/linux/-1d' '$d' 'x' | ex -s file
  1. a, linux, . appends a linux line to the end of the file.
  2. 1,/linux/-1d deletes the lines in the interval [first line of the file, line just before first linux];
  3. $d deletes the line artificially inserted in step 1.
  4. x writes the changes and quits.

The more direct approach (see 1st version in edit history) would leave the file untouched if there were no match. This one empties the file, as required (that is the reason for the queer step 1).

$ cat file1
lost
load
linux
loan
linux
$ printf '%s\n' a linux . 1,/linux/-1d '$d' x | ex -s file1
$ cat file1
linux
loan
linux
$ cat file2
lost
load
loan
$ printf '%s\n' a linux . 1,/linux/-1d '$d' x | ex -s file2
$ cat file2  #file2 is empty
Quasímodo
  • 18,865