
I'm working with huge log files that accumulate over days; I can't truncate or rotate them, but I need to parse new entries hourly.

I've been using grep to grab entries containing a specific string, counting how many I get, and tossing the first N, where N is the number of entries I've already ingested on all prior loops. Of course, this means inefficiently grepping the whole file every loop. I'm relatively Unix-naive, but I feel like there must be a more efficient way to do this. I don't think tail would work, because I won't know how many new lines have been written since the last parsing. This post talks about skipping, but it uses a search string to determine how many lines to skip, whereas I'd be looking to supply the skip count as an argument. This one speaks to skipping a specified number of characters on each line, but I'd be looking to skip a specified number of lines.

Any suggestions?
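For reference, my current hourly loop looks roughly like this (just a sketch; app.log, ERROR, matches.txt, and the seen counter are placeholder names):

grep 'ERROR' app.log > matches.txt        # re-scans the entire (huge) file every hour
# ...then toss the first $seen lines of matches.txt and ingest the rest...
seen=$(wc -l < matches.txt)               # running total, carried over to the next loop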


2 Answers


Figured it out while writing the Q, posting for posterity:

tail -n +N file | grep ...

where N is the number of lines to skip plus 1, since tail -n +N starts output at line N of the file.
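In the hourly loop this can be used roughly as follows (a sketch; last_count, new_lines, app.log, and ERROR are placeholder names): store the number of lines already processed, then resume just past it on the next run.

last=$(cat last_count 2>/dev/null || echo 0)            # lines already processed (0 on the first run)
tail -n +"$((last + 1))" app.log > new_lines            # everything written since the last run
grep 'ERROR' new_lines                                  # parse only the new entries
echo "$((last + $(wc -l < new_lines)))" > last_count    # remember the new total for next time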


sed can be used to skip an initial number of lines. The command

sed '1,200d'

would delete the first 200 lines and pass all other lines on unchanged.
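The line count can likewise be supplied as a shell variable (a sketch; n, somefile, and pattern are placeholders), using double quotes so the variable expands inside the sed expression:

n=200
sed "1,${n}d" somefile | grep 'pattern'   # drop the first $n lines, then filter
# note: n should be at least 1 here; GNU sed rejects an address range ending in 0 (1,0d)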

awk could be used in a similar manner:

awk 'FNR > 200'

The above command would print line 201 and on but discard earlier lines. The FNR variable is the number of records (lines by default) read from the current file.

You could parametrize this easily to take a number from the command line:

awk -v n=200 'FNR > n'

You could also combine this with the filtering you currently do with grep, letting awk handle both jobs:

awk -v n=200 'FNR > n && /pattern/' somefile

... where pattern is some extended regular expression.

Or, to take the pattern from some value on the command line,

awk -v n=200 -v p='pattern' 'FNR > n && $0 ~ p'

or, more safely, using an environment variable (awk -v interprets backslash escape sequences in the value, while strings read from ENVIRON are used as-is),

pattern='pattern' awk -v n=200 'FNR > n && $0 ~ ENVIRON["pattern"]' somefile
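For the original hourly use case, awk can also write out the updated line count in the same pass, so the next run knows where to resume (a sketch; offset_file, app.log, and ERROR are placeholder names):

n=$(cat offset_file 2>/dev/null || echo 0)     # lines already seen (0 on the first run)
awk -v n="$n" 'FNR > n && /ERROR/ { print }
               END { print FNR > "offset_file" }' app.log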
Kusalananda
  • Any comment on the relative efficiency of tail -n+N file | grep … vs your sed & awk options? – Mike Lawrence Jun 14 '21 at 14:23
  • @MikeLawrence No. That depends on the implementation of the tools and the characteristics of the system in use. What is faster on one person's system may be slower on another's. Benchmarking is best done individually. – Kusalananda Jun 14 '21 at 14:29