How can I delete everything until a pattern and everything after another pattern from a line?

Question

In the following file:

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut eu metus id lectus vestibulum ultrices. Maecenas rhoncus.

I want to delete everything before consectetuer and everything after elit.

My desired output:

consectetuer adipiscing elit.

How can I do this?

The command can be sed. It can also be perl, or even pure bash. — muru, Nov 15 '15 at 22:01
@manuel If one of these answers solved your issue, please take a moment and accept it by clicking on the check mark to the left. That will mark the question as answered and is the way thanks are expressed on the Stack Exchange sites. — terdon, Nov 16 '15 at 21:50

MikeV · Accepted Answer · 2015-11-18T02:12:31.380

45

I'd use sed

sed 's/^.*\(consectetuer.*elit\.\).*$/\1/' file

Decoding the sed s/find/replace/ syntax:

s/^.* - substitute, starting at the beginning of the line (^), followed by anything (.*) up to...
\( - start a capture group
consectetuer.*elit\. - match the first word, then everything (.*) up to the last word you want to match (in this case, including the trailing, escaped dot)
\) - end the capture group
.*$ - match everything else (.*) to the end of the line ($)
/ - end the find section of the substitute
\1 - replace with the contents of the capture group between the \( and the \) above
/ - end the replace
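
One thing the breakdown above does not cover is what happens to lines that contain neither pattern: a plain s/// leaves them unchanged and sed still prints them. A minimal sketch, assuming you only want the lines where the substitution actually happened, combines -n with the p flag. The variable names and the second input line are made up for the demonstration.

```shell
# Sample input: the line from the question plus a made-up line without the patterns.
input='Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut eu metus id lectus vestibulum ultrices. Maecenas rhoncus.
a line without either pattern'

# Plain substitution: the non-matching line is printed unchanged.
all_lines=$(printf '%s\n' "$input" | sed 's/^.*\(consectetuer.*elit\.\).*$/\1/')

# -n suppresses automatic printing; the trailing p prints only lines
# on which the substitution succeeded.
only_matches=$(printf '%s\n' "$input" | sed -n 's/^.*\(consectetuer.*elit\.\).*$/\1/p')

printf '%s\n' "$all_lines"
printf '%s\n' "$only_matches"
```

The first printf shows two lines (the extracted text and the untouched second line); the second shows only the extracted text.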

edited Nov 18 '15 at 02:12

answered Nov 15 '15 at 22:03

MikeV

Good answer, but you don't need the ^ or $ since sed will try and find the longest match. Also you may have missed the dot after elit, you could insert \. if necessary. – asoundmove Nov 17 '15 at 00:31
@asoundmove Good catch on the trailing dot on "elit." -- you have quite a sharp eye! I've updated my answer to include the escaped dot in the pattern. You're also correct that the ^ and $ aren't necessary -- I left them in because the questioner noted (originally) that he was a bit of a beginner, and they may be helpful in other contexts. – MikeV Nov 18 '15 at 02:17
I've always copy-pasted sed solutions and hacked them to fit my needs but thanks to this answer I feel like I actually understand it now. Great answer – Tyler Feb 04 '20 at 20:39
This doesn't seem to work for multiline files. If my "elit" word is on another line, it doesn't work well. – рüффп Oct 06 '20 at 14:30

score 11 · Answer 2 · edited Nov 16 '15 at 21:51

If every line contains both start and end pattern then the easiest way to do this is with grep. Instead of deleting the beginning and ending of each line you can simply output the contents between both patterns. The -o option in GNU grep outputs only the matches:

grep -o 'consectetuer.*elit' file

Note: as mentioned, this only works if every line in the file can be parsed this way. Then again, that covers most typical use cases.
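
To illustrate the caveat, here is a small sketch of how grep -o treats lines that lack one of the patterns or have them in the wrong order: they simply produce no output. The sample lines after the first one are made up for the demonstration.

```shell
# First line is taken from the question (shortened); the other two are invented.
input='Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut eu metus.
this line has consectetuer but no closing word
elit comes first here, consectetuer second'

# -o prints only the matched part of each matching line;
# lines without a match are skipped silently.
matches=$(printf '%s\n' "$input" | grep -o 'consectetuer.*elit')

printf '%s\n' "$matches"
```

Only the first line yields output, so a file where some lines cannot be parsed this way will quietly lose those lines.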

林果皞 · Answer 3 · 2021-12-21T07:57:44.703

I'm not sure why this question's title was edited from "from file" to "from a line", since the OP doesn't rule out matches spanning multiple lines, even though the example is a single line. In any case, it might be helpful to provide a multi-line solution here.

This works across lines, provided both from1 and to2 exist in the file:

from1=consectetuer; to2=elit; a="$(cat file)"; a="$(echo "${a#*"$from1"}")"; echo "$from1${a%%"$to2"*}$to2"

Examples:

[xiaobai@xiaobai tmp]$ cat file
1
abc consectetuer lsl
home
def elit dd
2 consectetuer ABC elit
[xiaobai@xiaobai tmp]$ from1=consectetuer; to2=elit; a="$(cat file)"; a="$(echo "${a#*"$from1"}")"; echo "$from1${a%%"$to2"*}$to2"
consectetuer lsl
home
def elit
[xiaobai@xiaobai tmp]$

reference: Shell Parameter Expansion
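
For readers unfamiliar with these expansions, the one-liner can be unpacked step by step. This is only a sketch: the variable names text, rest, middle and result are made up here, and the case guard is an added assumption to handle input where the patterns are missing or out of order.

```shell
# Same content as the example file above, held in a variable.
text='1
abc consectetuer lsl
home
def elit dd
2 consectetuer ABC elit'
from1=consectetuer
to2=elit

case $text in
  *"$from1"*"$to2"*)
    # ${var#*pattern}: drop the shortest prefix that ends with the first "consectetuer".
    rest=${text#*"$from1"}
    # ${var%%pattern*}: drop the longest suffix, i.e. everything from the first "elit" on.
    middle=${rest%%"$to2"*}
    # Put the two delimiters back around what is left.
    result="$from1$middle$to2"
    ;;
  *)
    # Patterns missing or in the wrong order: nothing to extract.
    result=''
    ;;
esac

printf '%s\n' "$result"
```

Note that only the first from1 ... to2 span is extracted; the later occurrence on the last line is discarded along with the rest of the suffix.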

score 1 · Answer 4 · answered Nov 15 '15 at 22:17

Two for loops in AWK:

$ awk '{for(i=1;i<=NF;i++) {if ($i == "consectetuer") beginning=i; if ($i == "elit.") ending=i}; for (j=beginning;j<=ending;j++) printf "%s ", $j; printf "\n"}' file.txt
consectetuer adipiscing elit.

AWK's gsub:

$ awk '{gsub(/^.*consectetuer/,"consectetuer"); gsub(/elit.*$/,"elit.");print}' file.txt
consectetuer adipiscing elit.
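
A third option, not part of the answer above, is awk's match() function, which sets RSTART and RLENGTH so the matched span can be cut out with substr(). A minimal sketch, assuming the input arrives on standard input and that lines without a match should print nothing:

```shell
line='Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut eu metus id lectus vestibulum ultrices. Maecenas rhoncus.'

# match() returns non-zero when the regex is found and records where it
# starts (RSTART) and how long it is (RLENGTH); substr() extracts that span.
result=$(printf '%s\n' "$line" |
  awk 'match($0, /consectetuer.*elit\./) { print substr($0, RSTART, RLENGTH) }')

printf '%s\n' "$result"
```

Unlike the field loop, this does not depend on the patterns being whole whitespace-separated words.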

score 1 · Answer 5 · edited Apr 13 '17 at 12:37

A Perl way. This is essentially the same as MikeV's sed answer:

perl -pe 's/.*(consectetuer.*elit).*/$1/' file

The -p means "print every line after applying the script given with -e". The s/foo/bar/ is the substitution operator; it will replace foo with bar. The parentheses capture a pattern and let us use it in the replacement. The first captured pattern is $1, the second $2 and so on.

So, the command will match everything up to consectetuer (.*consectetuer), then everything until elit (.*elit) and then everything else until the end of the line (.*) and will replace that with the captured pattern.
