grep skip n lines of file and only search after

Question

I have a huge log file and want to grep the first occurrence of a pattern, and then find another pattern right after this occurrence.

For example:

123
XXY
214
ABC
182
558
ABC
856
ABC

In my example, I would like to find 182 and then find the next occurrence of ABC.

The first occurrence is simple:

grep -n -m1 "182" /var/log/file

This outputs:

5:182

How do I find the next occurrence of ABC?

My idea was to tell grep to skip the first n lines (in the above example n=5), based on the line number of 182. But how do I do that?

Is it a requirement that grep is used? I don't think this can be done with grep but it would be easy with awk or sed (alone or in combination with grep). — Hauke Laging, Jan 10 '15 at 14:45
@HaukeLaging grep is not required. I am not so familiar yet with sed or awk. If you have a good solution, let me hear it! :) @don_crissti only the first line should be printed. I don't care about the other occurrences. — Kolja, Jan 10 '15 at 15:23

mikeserv · Accepted Answer · 2015-06-10T02:22:22.003

11

With sed you can use a range and quit input at a single completion:

sed '/^182$/p;//,/^ABC$/!d;/^ABC$/!d;q'

Similarly w/ GNU grep you can split the input between two greps:

{ grep -nxF -m1 182; grep -nxF -m1 ABC; } <<\IN
123
XXY
214
ABC
182
558
ABC
856
ABC
IN

... which prints...

5:182
2:ABC

... to signify that the first grep found a fixed-string (-F), whole-line (-x) match for 182 five lines from the start of its read, and the second found the same kind of match for ABC two lines from the start of its read, that is, two lines after the first grep quit reading at line 5 (so line 7 of the input).

From man grep:

-m NUM, --max-count=NUM
          Stop  reading  a  file  after  NUM  matching
          lines.   If the input is standard input from
          a regular file, and NUM matching  lines  are
          output, grep ensures that the standard input
          is  positioned  to  just  after   the   last
          matching  line before exiting, regardless of
          the  presence  of  trailing  context  lines.
          This  enables  a calling process to resume a
          search.

I used a here-document for the sake of reproducible demonstration, but you should probably do:

{ grep ...; grep ...; } </path/to/log.file

It will also work with other shell compound-command constructs like:

for p in 182 ABC; do grep -nxFm1 "$p"; done </path/to/log.file
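
The two relative line numbers (5 and 2) can be added up to get the absolute one. A minimal sketch, assuming GNU grep and a regular, seekable file; find_after is a hypothetical helper name, not part of the original answer:

```shell
# Hypothetical helper: print "LINENO:TEXT" for the first line equal to $2
# that comes after the first line equal to $1 in file $3.
# Assumes GNU grep (which repositions stdin after -m) and a regular file.
find_after() {
    {
        first=$(grep -nxF -m1 -- "$1") || return 1
        second=$(grep -nxF -m1 -- "$2") || return 1
    } < "$3"
    # The second grep counts from where the first one stopped,
    # so the absolute line number is the sum of both counts.
    echo "$(( ${first%%:*} + ${second%%:*} )):${second#*:}"
}
```

Called as find_after 182 ABC /path/to/log.file. Both greps inherit the same open file, which is why the second one resumes where the first one left off; fed from a pipe this would usually not work.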

edited Jun 10 '15 at 02:22

answered Jan 10 '15 at 22:14

mikeserv

58,310

+1 Saw that in the man page. That's what I tried, only with a pipe between the grep's instead of a ;... no-go – Xen2050 Jan 10 '15 at 22:29
@Xen2050 - the pipe won't work, usually - an lseekable file is usually what you want when sharing input. – mikeserv Jan 10 '15 at 22:32
Impressive answer but I don't support your statement about pipelines. The here document which the two greps share is effectively a pipeline for them. Something else: I tried without printing the marker line but sed '//,/^ABC$/!d;/^ABC$/!d;q' throws a strange error. What does // do? – Hauke Laging Jan 11 '15 at 00:02
@HaukeLaging - the here-document (in most shells) is not a pipe - it is a real tmp file created by the shell that the shell deletes before doing any writes - while maintaining the descriptor. It is still lseekable. Pipes, generally, are not lseekable. I'll look at the sed thing - just wrote it out real fast. – mikeserv Jan 11 '15 at 00:29
@HaukeLaging - Oh, so the sed thing works - you just left out the reference. In sed you can refer to the last /address/ again with an empty // address. So /^182$/command;//,/next_address/ just does /^182$/command;/^182$/,/next_address/. Your error was probably no previous regular expression if you were using a GNU sed. The pipe lseek thing, by the way, can be manipulated via indirection through the /dev/fd/[num] links on linux systems - but if you're not very careful to handle buffers well (like with dd) that's usually a losing battle. – mikeserv Jan 11 '15 at 00:40
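
If the marker line is not wanted in the output, one way to sidestep the empty // address altogether is to spell both expressions out. A rough sketch, not the answerer's command: sed_after is a made-up name, and the two patterns are passed as basic regular expressions that must not contain a / character.

```shell
# Hypothetical helper: print the line number and text of the first line
# matching regex $2 that follows the first line matching regex $1 in file $3.
sed_after() {
    sed -n "/$1/,/$2/{
        /$2/{
            =
            p
            q
        }
    }" "$3"
}
```

For example, sed_after '^182$' '^ABC$' /path/to/log.file prints the line number on one line and the matching text on the next, then quits without reading the rest of the file.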

jimmij · Answer 2 · 2015-01-10T14:55:58.297

2

Use grep with Perl-compatible regular expressions (pcregrep):

pcregrep -Mo '182(.|\n)*?\KABC'

The -M option allows the pattern to match across more than one line, and \K excludes everything matched up to that point from the output. You can remove \K if you want the whole region as the result.

edited Jan 10 '15 at 14:55

answered Jan 10 '15 at 14:49

jimmij

47,140

Hauke Laging · Answer 3 · 2015-01-10T23:18:32.653

2

> awk '/^182$/ { startline=1; }; startline == 0 { next; }; /^ABC$/ { print "line " NR ": " $0; exit; }' file
line 7: ABC

edited Jan 10 '15 at 23:18

answered Jan 10 '15 at 19:51

Hauke Laging

90,279

That gives the first ABC anywhere; this question wants the first ABC after the first 182. Most direct is a flag like awk '/^182$/{z=1;next} z&&/^ABC$/{print NR":"$0;exit}' file -- or you can write at least one explicit getline() loop which is usually clumsier, or be clever(?) using a range almost like @JRFerguson's perl: awk '!x&&/^182$/,/^ABC$/ {x=NR":"$0} END{print x}' file – dave_thompson_085 Jan 10 '15 at 23:11
@dave_thompson_085 Indeed. Right idea but terribly coded (mixed up two ideas during writing). Embarrassingly I even tried but didn't wonder at the output. – Hauke Laging Jan 10 '15 at 23:16
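
Following the flag idea from the comment above, here is a small sketch with the two strings passed in as awk variables and compared literally rather than as regexes. awk_after is a made-up helper name; note that awk -v interprets backslash escapes in the values, so this assumes plain strings.

```shell
# Hypothetical helper: print "LINENO:TEXT" for the first line equal to $2
# that comes after the first line equal to $1 in file $3.
awk_after() {
    awk -v start="$1" -v target="$2" '
        # Remember that the first start line has been seen, then move on.
        !seen && $0 == start { seen = 1; next }
        # Once the flag is set, report the first target line and stop reading.
        seen && $0 == target { print NR ":" $0; exit }
    ' "$3"
}
```

awk_after 182 ABC /path/to/log.file stops at the first hit, so on a huge log it only reads as far as it has to.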

score 1 · Answer 4 · answered Jan 10 '15 at 15:36

A Perl variation you could use is:

perl -nle 'm/182/../ABC/ and print' file

...which prints lines in the matching range.

If your file contains more than one matching range, you can limit the output to the first range by changing the / delimiter to ?, since m?...? matches only once. Perl 5.22 and later require the explicit m with this delimiter:

perl -nle 'm?182?..m?ABC? and print' file

score 1 · Answer 5 · answered Mar 13 '17 at 13:29

1

Another variant is this:

grep -n -A99999 "182" /var/log/file|grep -n -m1 "ABC"

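The -A n option prints n lines of context after each match; 99999 is just there to make sure we don't miss anything. Bigger files need a bigger number (check with "wc -l"). Note that the second -n numbers lines relative to the first grep's output, so for the example the result is 3:7-ABC, where 7 is the line number in the original file.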

answered Mar 13 '17 at 13:29

Fabbe

11

score 0 · Answer 6 · edited May 23 '17 at 12:40

Sticking with just grep and adding tail & cut, you could...

grep for the line number of the first match of 182:

grep -m 1 -n 182 /var/log/file |cut -f1 -d:

Use that to grep for all the ABC's from the first matching line onward, using tail's -n +K to start output at the K'th line (so the 182 line itself is included). All together:

tail -n +$(grep -m 1 -n 182 /var/log/file |cut -f1 -d:) /var/log/file | grep ABC

Or add -m 1 again to find only the first matching ABC:

tail -n +$(grep -m 1 -n 182 /var/log/file|cut -f1 -d:) /var/log/file|grep -m 1 ABC
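
The pipeline above loses the original line number, and because tail -n +K starts at line K, the 182 line itself is searched too. A sketch that starts one line later and adds the two counts back together; tail_after is a hypothetical name, and -m is a GNU/BSD grep extension, as in the commands above:

```shell
# Hypothetical helper: print "LINENO:TEXT" for the first line matching regex $2
# after the first line matching regex $1 in file $3.
tail_after() {
    # Line number of the first match of $1 (empty if there is none).
    n=$(grep -m 1 -n -- "$1" "$3" | cut -f1 -d:)
    [ -n "$n" ] || return 1
    # Search only the lines after line n; grep numbers them starting from 1.
    hit=$(tail -n "+$((n + 1))" "$3" | grep -m 1 -n -- "$2") || return 1
    # Convert the relative line number back to an absolute one.
    echo "$(( n + ${hit%%:*} )):${hit#*:}"
}
```

tail_after 182 ABC /var/log/file reads the file twice, which is the price for sticking with grep, tail and cut.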

References:
man pages
https://stackoverflow.com/questions/6958841/use-grep-to-report-back-only-line-numbers

score 0 · Answer 7 · answered Mar 13 '17 at 14:25

The range operator , can be put to use here:

< yourfile \
sed -e '
   /182/,/ABC/!d
   //!d;=;/ABC/q
' | sed -e 'N;s/\n/:/'

The range operator .. in tandem with the match-only-once operator m?? can be put to use here in Perl:

perl -lne 'm?182? .. m?ABC? and print "$.:$_" if /182/ || /ABC/' yourfile
