How to perform a multiline grep across multiple files?

Question

I'm trying to grab this pattern wherever it occurs in multiple log files (Note: these patterns can vary greatly in size, i.e. in the number of Blah lines):

   Found an txt File
    Blah
    Blah
    10019874
    Blah
    Blah
    Processed File

Using this command line:

 pcregrep -M 'Found an.*(\n|.)*10019874.*(\n|.)*Processed' log_*.txt

My regex checks out REGEX HERE

I'm using pcregrep with the -M multiline flag. It should look in any log file whose name begins with 'log_' and ends with '.txt'. When I run this command, it returns 'Segmentation Fault'.

Is there a simpler/better way to do this?

The command you posted works as expected on my system. What pcregrep are you using? What OS? — terdon, Jun 19 '14 at 13:18

terdon · Accepted Answer · 2018-09-28T13:48:45.917


As I said in my comment, the command you posted works fine on my LMDE (pcregrep version 8.31 2012-07-06). However, since your regex only specifies part of the string you're looking for, you could also do this with normal grep:

grep -A 6 'Found an' log_*.txt | grep -C 3 10019874

The -A 6 prints each line matching the string plus the 6 lines that follow it, and the -C 3 prints the matching line plus 3 lines of context on either side. For a block of the size shown in your example, the end result is the same as the pcregrep approach you were using.
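To see what the two context options do, here is a small self-contained run. The sample file is made up for illustration (it only copies the block from the question), and the sketch assumes a grep that supports -A and -C, such as GNU grep:

```shell
# Hypothetical sample file; its contents just mirror the block in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Found an txt File
Blah
Blah
10019874
Blah
Blah
Processed File
EOF

# -A 6 keeps the "Found an" line plus the 6 lines after it;
# -C 3 then keeps the ID line plus 3 lines on either side of it.
result=$(grep -A 6 'Found an' "$sample" | grep -C 3 10019874)
printf '%s\n' "$result"
rm -f "$sample"
```

Because the line counts are hard-coded, this only captures the whole block when the ID sits exactly 3 lines from each end.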

If your pattern can have differing numbers of lines, that can explain the segfault. Presumably, in some of your files, the matched section is too long and causes an out of memory error. One way around it would be a little scripting:

perl -ne '$c=1 if /Found an/;   ## Set $c to 1 if this line matches "Found an"
          if($c){               ## If $c is defined and non-0
            push @F,$_;         ## Add the current line to the @F array
            $c++ if /10019874/; ## Increment $c if this line matches "10019874"
            if(/Processed/){    ## If this line matches "Processed"
                print @F if $c>1;   ## Print the contents of @F if $c is >1
                @F=(); $c=0;        ## Empty @F, set $c to 0.
            }
           }' log_*.txt

The same thing as a one-liner:

perl -ne '$c=1 if /Found an/; if($c){push @F,$_; $c++ if /10019874/; if(/Processed/){print @F if $c>1; @F=(); $c=0;}}' log_*.txt
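If Perl is not available, the same start/collect/print logic could probably be sketched in awk. This is only a rough equivalent, assuming a POSIX awk; the function name extract_blocks and its argument order are made up for illustration:

```shell
# Hypothetical helper: print every "Found an" ... "Processed" block that
# contains the ID given as the first argument. Remaining arguments are files.
extract_blocks() {
    id=$1
    shift
    awk -v id="$id" '
        # Start a new block, discarding any unfinished one
        /Found an/ { buf = ""; inblock = 1; hit = 0 }
        inblock {
            buf = buf $0 "\n"                  # collect the current line
            if (index($0, id)) hit = 1         # remember that the ID appeared
            if (/Processed/) {                 # end of the block
                if (hit) printf "%s", buf      # print only blocks with the ID
                inblock = 0
            }
        }
    ' "$@"
}
```

It would be called as `extract_blocks 10019874 log_*.txt`. Like the Perl version, it reads line by line, so the number of lines between the markers does not matter.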

edited Sep 28 '18 at 13:48

answered Jun 19 '14 at 13:22

terdon


Thanks, I should have mentioned these log files vary greatly in size. Hence why I'm looking for a more precise approach. I'll update my question to reflect this. – blarg Jun 19 '14 at 13:24
@blarg that shouldn't make any difference whatsoever. As long as the pattern is the same, this should work. – terdon Jun 19 '14 at 13:25
Ah yes, that's what I mean. The pattern can vary greatly in the amount of blahs between the target strings. – blarg Jun 19 '14 at 13:27
@blarg see updated answer. – terdon Jun 19 '14 at 13:41
The one liner is missing the closing semicolon after '@F=""'. This seems to return the whole file where the pattern occurs, rather than just the pattern. – blarg Jun 19 '14 at 13:52
@blarg the semicolon is not needed there since that's the last instruction before the closing } (but I added it for consistency). I had forgotten to set $c back to 0, it should work as expected now. – terdon Jun 19 '14 at 13:56
