3

I have a big file in which a pattern repeats periodically. I want to extract only one specific occurrence of that pattern (say, the 85th), together with the next N lines.
Here is an example. The numbers before "members of the group" do not actually exist in the file; they only mark which occurrence it is.

input:

1 members of the group
...
...
2 members of the group
...
...
...
n members of the group
...
...
...

output:

85 members of the group
...
...
...
...
...

(85th match and the next 5 lines)

don_crissti
  • 82,805

4 Answers

6

Here's one way with awk:

awk -v N=85 -v M=5 '
/PATTERN/ { c++; if (c == N) { l = NR; last = NR + M } }
NR >= l && NR <= last { print }' infile

Here N selects the Nth line matching PATTERN and M is the number of lines to print after it. The script counts matches, and when the Nth one is encountered it saves that line number in l and the end of the range, NR+M, in last. It then prints every line whose number falls between l and last.
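As a minimal sketch of the same counter logic, the variant below also stops reading once the last wanted line has been printed, which may help on a big file. The function name nth_match_awk and its argument order are assumptions made for illustration; the pattern is passed as a dynamic regex through -v, so backslashes in it go through awk's escape processing.

```shell
# nth_match_awk PATTERN N M FILE   (hypothetical helper, not from the answer)
# Print the Nth line matching PATTERN and the M lines that follow it.
nth_match_awk() {
    awk -v pat="$1" -v n="$2" -v m="$3" '
        $0 ~ pat && ++c == n { last = NR + m }   # Nth match: remember where to stop
        last {                                   # non-zero only from the Nth match on
            print
            if (NR == last) exit                 # do not scan the rest of the file
        }
    ' "$4"
}
```

For the question's case this would be called as: nth_match_awk 'members of the group' 85 5 infile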


For the record, here's how you do it with sed (GNU sed syntax):

sed -nE '/PATTERN/{x;/\n{84}/{x;$!N;$!N;$!N;$!N;$!N;p;q};s/.*/&\n/;x}' infile

This is using the hold space to count.
Each time it encounters a line matching PATTERN it exchanges buffers and checks whether the counter kept in the hold buffer contains N-1 newline characters (84 here). If the check succeeds it exchanges again, pulls in the next M lines with one $!N command per line, prints the pattern space and quits.
Otherwise it just appends another newline character to the counter and exchanges back.
This solution is less convenient as it quickly becomes cumbersome when M is a big number and requires some printf-fu to build up a sed script (not to mention the pattern and hold space limits with some seds).
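One possible way to generate the script for arbitrary N and M is sketched below. It uses a plain shell loop rather than printf to build the run of $!N commands, and it assumes GNU sed, N of at least 1, and a PATTERN that contains no "/" characters. The function name nth_match_sed is hypothetical.

```shell
# nth_match_sed PATTERN N M FILE   (hypothetical helper; assumes GNU sed)
# The body runs in a subshell so the helper variables do not leak out.
nth_match_sed() (
    pulls=''
    i=0
    while [ "$i" -lt "$3" ]; do        # one "$!N;" per trailing line wanted
        pulls="${pulls}\$!N;"
        i=$((i + 1))
    done
    # Same script as above, with N-1 and the $!N run filled in.
    sed -nE "/$1/{x;/\n{$(($2 - 1))}/{x;${pulls}p;q};s/.*/&\n/;x}" "$4"
)
```

With this, nth_match_sed PATTERN 85 5 infile should expand to the one-liner shown earlier.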

don_crissti
  • 82,805
2

Not knowing awk and using sed mostly for regex stuff, here is how I would do it:

  • use grep to find the pattern, including line numbers (-n)
  • use head and tail (or sed) to get the 85th match (see here)
  • isolate out the line number N using cut
  • again, use head and tail (or sed) to get the Nth line of the original file and subsequent five lines

All this can be combined into one line. Dirty, probably slow, but will work with a minimal toolset.

Example

The following searches the rkhunter.log file and shows the third match of "basename" and subsequent four lines:

 /var/log$ tail -n +"$(grep -n 'basename' rkhunter.log | cut -d: -f1 | tail -n +3 | head -n 1)" rkhunter.log | head -n 5

Edit

Just saw @Wildcard's answer, and the -m switch of grep is much easier to use than my original solution. So here's another answer using grep -m:

/var/log$ grep -m 3 -A 4 'basename'  rkhunter.log | tail -5

2

(exec <file.txt; grep -m 85 'PATTERN' | tail -n 1; head -n 5)

Obviously you can adjust the numbers as desired.

From man grep:

   -m NUM, --max-count=NUM
          Stop reading a file after NUM matching lines.  If the  input  is
          standard  input  from a regular file, and NUM matching lines are
          output, grep ensures that the standard input  is  positioned  to
          just  after the last matching line before exiting, regardless of
          the presence of trailing context lines.  This enables a  calling
          process  to resume a search.

The above command takes advantage of this feature by using a subshell and setting the STDIN to the file that you intend to grep, so that this feature can work correctly. Then you can simply catch the final (85th) instance with tail -n 1, and get the context lines you want with a separate call to head.

Use this command if you know that the file has at least 85 instances of PATTERN; in that case it will work perfectly.

If it may have fewer, the command will require some adjustment: in its current state, if there are fewer matches than you've requested, it will simply print the final match with no trailing context lines.
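One possible adjustment, sketched under the assumption of GNU grep (it relies on the -m repositioning behaviour quoted from the man page), is to count the matches before printing anything and to fail when there are too few. The function name nth_match_grep is hypothetical.

```shell
# nth_match_grep PATTERN N M FILE   (hypothetical helper; assumes GNU grep)
# Prints the Nth match and the next M lines, or prints nothing and
# returns non-zero when FILE has fewer than N matches.
nth_match_grep() (
    exec < "$4" || exit 1
    # -n prefixes "lineno:" so every captured line is non-empty and countable.
    hits=$(grep -n -m "$2" -e "$1")
    [ "$(printf '%s\n' "$hits" | grep -c .)" -eq "$2" ] || exit 1
    printf '%s\n' "$hits" | tail -n 1 | cut -d: -f2-   # Nth match, prefix removed
    head -n "$3"                                        # stdin now sits just after it
)
```

Called as nth_match_grep 'PATTERN' 85 5 file.txt, it should behave like the subshell command above when at least 85 matches exist.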

Wildcard
  • 36,499
0

This works in my bash:

{ T=85; N=5; c=0; while IFS= read -r line; do printf '%s\n' "$line" | grep -q "members of the group" && c=$((c+1)); [[ $c -eq $T ]] && { printf '%s\n' "$line"; break; }; done; head -n "$N"; } < input_file