
I have a file with the following content:

zdk
aaa
b12
cdn
dke
kdn

Input 1: aaa and cdn

Output 1:

aaa
b12
cdn

Input 2: zdk and dke

Output 2:

zdk
aaa
b12
cdn
dke

I could use the commands below to achieve this:

grep -a aaa -A2 file # Output 1
grep -a zdk -A4 file # Output 2

But I don't know the exact position of the end pattern in the file (the file has 20,000 rows).

How can I achieve this?

don_crissti
RBB
  • I think the other question is too specific to be a duplicate of this one. Most of the answers here won't work there, as there are different requirements (by specifying extra newline characters). – kenorb Oct 18 '15 at 11:45
  • I think here the start and end patterns could be anywhere in the file (not necessarily tied to newlines), despite the specific example given, whereas the other question asks for whole lines. – kenorb Oct 18 '15 at 12:11
  • @kenorb it's still the same idea. The main trick is to use one of the tools that can do /foo/,/bar/ to define patterns. The specifics of the pattern (being at the end for example) are secondary. – terdon Oct 18 '15 at 15:51
  • This answer might also be applicable: https://stackoverflow.com/a/48022994/2026975 – imriss Dec 29 '17 at 16:54

3 Answers


grep won't help you here. This is a job better accomplished with sed using range expressions:

$ sed -n '/aaa/,/cdn/p' file
aaa
b12
cdn
$ sed -n '/zdk/,/dke/p' file
zdk
aaa
b12
cdn
dke

sed -n suppresses the automatic printing, so lines are printed only when explicitly requested. Here, the p command prints every line inside the range /aaa/,/cdn/, that is, from a line matching aaa through the next line matching cdn.
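To show how the range behaves beyond the question's sample, here is a small sketch with made-up input (the extra lines are hypothetical): the range switches on again at every later start match, and if no end match follows, sed prints through to the end of the file.

```shell
#!/bin/sh
# Sample input: the question's file plus a second "aaa" line that is
# never followed by "cdn" (these extra lines are made up for illustration).
sample=$(mktemp)
printf '%s\n' zdk aaa b12 cdn dke aaa kdn > "$sample"

# The range turns on at the first "aaa" and off after the next "cdn".
# It turns on again at the second "aaa"; because no "cdn" follows,
# everything up to the end of the file is printed.
sed -n '/aaa/,/cdn/p' "$sample"
# prints: aaa b12 cdn aaa kdn (one per line)

rm -f "$sample"
```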

These range expressions are also available in awk, where you can say:

awk '/zdk/,/dke/' file
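The awk range pattern is shorthand for a flag that is switched on and off. Writing that flag out explicitly gives more control, for example to leave out the marker lines themselves. A minimal sketch, assuming simple regex markers; the helper name between_exclusive is hypothetical:

```shell
#!/bin/sh
# between_exclusive START END FILE: hypothetical helper name.
# Prints only the lines strictly between a line matching START and the
# next line matching END; the marker lines themselves are skipped.
between_exclusive() {
    awk -v start="$1" -v end="$2" '
        $0 ~ end   { inside = 0 }   # end marker: switch off before printing
        inside                      # pattern with no action: print the line
        $0 ~ start { inside = 1 }   # start marker: switch on for the next line
    ' "$3"
}

sample=$(mktemp)
printf '%s\n' zdk aaa b12 cdn dke kdn > "$sample"
between_exclusive aaa cdn "$sample"   # prints: b12
between_exclusive zdk dke "$sample"   # prints: aaa, b12, cdn (one per line)
rm -f "$sample"
```

The order of the three rules is what excludes the markers: the end test runs before the print rule, and the start test runs after it.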

Of course, all these conditions can be tightened to a stricter regex, like sed -n '/^aaa$/,/^cdn$/p' file, to check that the lines consist of exactly aaa and cdn and nothing else.
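If the markers contain regex metacharacters (as with the XML markers mentioned in the comments below), comparing whole lines as plain strings in awk avoids escaping altogether. A sketch with the hypothetical helper name exact_range, assuming the markers contain no backslashes:

```shell
#!/bin/sh
# exact_range START END FILE: hypothetical helper name.
# Similar in spirit to sed -n '/^START$/,/^END$/p', but compares whole
# lines as plain strings, so characters such as . * [ need no escaping.
# Assumes the markers contain no backslashes (awk -v processes escapes).
exact_range() {
    awk -v start="$1" -v end="$2" '$0 == start, $0 == end' "$3"
}

sample=$(mktemp)
printf '%s\n' zdk aaa b12 cdn dke kdn > "$sample"
exact_range aaa cdn "$sample"   # prints: aaa, b12, cdn (one per line)
exact_range aa cdn "$sample"    # prints nothing: no line is exactly "aa"
rm -f "$sample"
```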

fedorqui
  • Wow, I had this same problem and didn't ask. I used two instances of grep -n and compared the line numbers for my solution. – Kip K Oct 16 '15 at 22:09
  • @KipK range expressions are such a useful tool for these kinds of problems :) You can also use awk and some flags, but the basic case is quite straightforward with this. – fedorqui Oct 16 '15 at 22:11
  • @Fedorqui, I'm having a problem when the search pattern contains XML. For example, I need to search starting with <ns0:abcd xmlns="">, and I get the result. But it is showing also. I have tried sed -n '/<ns0:abcd xmlns="">/,/<ns0:abcd>/p' file – RBB Oct 17 '15 at 12:31
  • @Spike maybe you should edit your question to show this sample file. For me, sed -n '/^<ns0:abcd xmlns="">$/,/dke/p' file works fine. Note I added ^ and $ to match the string exactly. Note you can also check my answer in How to select lines between two marker patterns which may occur multiple times with awk/sed. – fedorqui Oct 17 '15 at 23:10
  • Is it the shortest match? – Mukul Anand Oct 09 '19 at 09:03
  • @MukulAnand please provide an example so we can cross check – fedorqui Oct 09 '19 at 11:06
  • Is there any way to not print the start and end patterns themselves (other than using head and tail)? – Marki Dec 30 '19 at 08:21
  • @Marki see https://stackoverflow.com/q/38972736/1983854 – fedorqui Dec 30 '19 at 09:45
  • Unfortunately it doesn't seem easy to use non-greedy matching of the end range? – olejorgenb Oct 12 '20 at 09:25
  • I need to get something like this in AWK: cat ~/.aws/config | sed -n '/'"$AWS_PROFILE"'/,/^\[/p'. It should print from the starting pattern up to the next line that starts with an unknown word in square brackets. AWK does not print it as expected: cat ~/.aws/config | awk '/\['"$AWS_PROFILE"'\]/,/^\[/' – t7e Jun 05 '22 at 00:11
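The approach Kip K describes in the comments (two grep -n calls, then comparing line numbers) can be sketched as follows. This is an illustration, not Kip K's actual script; lines_between is a hypothetical name, and unlike the sed range it prints only the first matching block.

```shell
#!/bin/sh
# lines_between START END FILE: hypothetical helper name.
# Finds the first line matching START, then the first line matching END
# after it, and prints that span with sed. Returns 1 if either is missing.
lines_between() {
    start=$(grep -n -e "$1" "$3" | head -n 1 | cut -d: -f1)
    [ -n "$start" ] || return 1
    # Search for END only in the part of the file after the start line.
    offset=$(tail -n "+$((start + 1))" "$3" | grep -n -e "$2" | head -n 1 | cut -d: -f1)
    [ -n "$offset" ] || return 1
    end=$((start + offset))
    sed -n "${start},${end}p" "$3"
}

sample=$(mktemp)
printf '%s\n' zdk aaa b12 cdn dke kdn > "$sample"
lines_between aaa cdn "$sample"   # prints: aaa, b12, cdn (one per line)
lines_between zdk dke "$sample"   # prints: zdk, aaa, b12, cdn, dke
rm -f "$sample"
```

Reading the file three times is harmless at 20,000 rows, but the single-pass sed or awk range above is simpler.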

It can be done with sed, using the w command to write each range to its own file:

sed -n '
    /^aaa$/,/^cdn$/w output1
    /^zdk$/,/^dke$/w output2
    ' file
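A quick way to try this answer is to recreate the question's sample file in a scratch directory and inspect the two files that the w commands create. The directory and file names below are illustrative:

```shell
#!/bin/sh
# Recreate the question's sample file in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' zdk aaa b12 cdn dke kdn > file

# Each range writes its lines to its own file; -n keeps stdout quiet.
sed -n '
    /^aaa$/,/^cdn$/w output1
    /^zdk$/,/^dke$/w output2
    ' file

cat output1   # aaa, b12, cdn (one per line)
cat output2   # zdk, aaa, b12, cdn, dke (one per line)
```

Because w takes the rest of the line as the file name, each w command sits on its own line of the script.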
Costas

Here is a grep command:

grep -o "aaa.*cdn" <(paste -sd_ file) | tr '_' '\n'

You can achieve a multiline match in grep, but you need grep's Perl-compatible regular expressions (-P, usually together with -z), which are not supported on every platform (e.g. OS X). So as a workaround, we replace newlines with the _ character and, after grep, change them back.
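The _ separator misfires if the file already contains underscores, since tr turns those into newlines too. One possible variation, sketched with the hypothetical helper name range_by_joining, joins lines with the control character \001 (assumed absent from the file) and uses a plain pipe instead of <(...), so it also runs under POSIX sh:

```shell
#!/bin/sh
# range_by_joining START END FILE: hypothetical helper name.
# Joins all lines with a separator byte, matches START.*END on that single
# line, then splits the match back into lines. Assumes \001 never occurs
# in FILE. As with the original, .* is greedy: it runs to the LAST END.
range_by_joining() {
    sep=$(printf '\001')
    paste -sd "$sep" "$3" | grep -a -o -e "$1.*$2" | tr "$sep" '\n'
}

sample=$(mktemp)
printf '%s\n' zdk aaa b_12 cdn dke kdn > "$sample"
range_by_joining aaa cdn "$sample"   # prints: aaa, b_12, cdn (one per line)
rm -f "$sample"
```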

Alternatively, you can use pcregrep, which supports multiline patterns (-M).

Or use ex:

ex +"/aaa/,/cdn/p" -scq! file
kenorb