
While investigating some logs, I am trying to find the very first time a vulnerability in a workflow was exploited.

The pattern is on multiple lines.

The pattern would be

AAAAAAAAA
BBBBBBBBB
CCCCCCCCC

The problem is that

AAAAAAAAA

or

BBBBBBBBB

or

CCCCCCCCC

can each be found individually anywhere in the log without indicating the vulnerability; only the exact pattern, in this exact order, will help me.

For example

grep -Ei "AAAAAAAAA|BBBBBBBBB|CCCCCCCCC" logfile does not help me, since every line with an individual occurrence of AAAAAAAAA, BBBBBBBBB, or CCCCCCCCC will be matched.

How can I solve this?

Jeff Schaller
Foopz

4 Answers

1

Here's a way to do it in Python. I added to your example a bit to show that you still get the matches you want even when single lines of AAAAAAAAA, BBBBBBBBB, or CCCCCCCCC are scattered throughout the logfile:

Below are the contents of find_log_vulns.py:

#! /usr/bin/python3

import re

test_string = """1234324
AAAAAAAAA
BBBBBBBBB
CCCCCCCCC
absdfjv4er4
AAAAAAAAA
BBBBBBBBB
CCCCCCCCC
123466666
AAAAAAAAA
ghrhvhhhfh
BBBBBBBBB
fjwjefjsjfjwjf
CCCCCCCCC
24wfsgggg
AAAAAAAAA
BBBBBBBBB
CCCCCCCCC
zzzz"""

matches = re.findall('AAAAAAAAA\nBBBBBBBBB\nCCCCCCCCC\n', test_string, re.MULTILINE)

print(matches)

The result I get from running the above:

$ ./find_log_vulns.py
['AAAAAAAAA\nBBBBBBBBB\nCCCCCCCCC\n', 'AAAAAAAAA\nBBBBBBBBB\nCCCCCCCCC\n', 'AAAAAAAAA\nBBBBBBBBB\nCCCCCCCCC\n']

As shown above, each match will be returned as an element in a list.
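
Since the question asks for the first occurrence, a minimal sketch along the same lines could use re.search and turn the match offset into a line number. The helper name first_block_line and the sample text are hypothetical, and anchoring the pattern with ^ and $ assumes each marker fills a whole log line:

```python
import re

# Hypothetical helper: returns the 1-based line number where the first
# complete AAAAAAAAA / BBBBBBBBB / CCCCCCCCC block starts, or None.
# Assumption: each marker occupies a whole line, hence the ^ and $ anchors.
BLOCK = re.compile(r"^AAAAAAAAA\nBBBBBBBBB\nCCCCCCCCC$", re.MULTILINE)

def first_block_line(text):
    match = BLOCK.search(text)
    if match is None:
        return None
    # Count the newlines before the match to get its line number.
    return text.count("\n", 0, match.start()) + 1

sample = "noise\nAAAAAAAAA\nnoise\nAAAAAAAAA\nBBBBBBBBB\nCCCCCCCCC\nend\n"
print(first_block_line(sample))
```

This reads the whole text into memory, so it is probably best suited to logs of moderate size.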

EWJ00

Using ripgrep:

rg -U 'A+\nB+\nC+' in
2:AAAAAAAAA
3:BBBBBBBBB
4:CCCCCCCCC
6:AAAAAAAAA
7:BBBBBBBBB
8:CCCCCCCCC
16:AAAAAAAAA
17:BBBBBBBBB
18:CCCCCCCCC

You can get rid of the line numbers (for example with -N), and so on. If you need separators between the matches, you can do this:

rg -U 'A+\nB+\nC+' in | rg --passthru -e '(^A)' -r $'\n'A

AAAAAAAAA
BBBBBBBBB
CCCCCCCCC

AAAAAAAAA
BBBBBBBBB
CCCCCCCCC

AAAAAAAAA
BBBBBBBBB
CCCCCCCCC


Using awk:

awk -v ptrn="AAAAAAAAA\0BBBBBBBBB\0CCCCCCCCC\0" '
BEGIN{ split(ptrn, tmp, "\0"); lngth=gsub("\0", "", ptrn ) }
$0 ~ tmp[++fieldNr]{ buf=(buf==""?"": buf OFS) NR":"$0 ;
                     if ( fieldNr == lngth ) { print buf; exit }
                     next
                   }
{ fieldNr=0; buf="" }' infile

This prints each matched line prefixed with its line number, and exits after the first complete match. Each line is tested against the corresponding pattern from "ptrn" as a partial regexp match; see How do I find the text that matches a pattern? for other matching options.

The NUL character \0 is used to separate the patterns.


Sample input:

AAAAAAAAA
BBBBBBBBB

CCCCCCCCC
AAAAAAAAA
BBBBBBBBB
ccccccccc
123AAAAAAAAA
BBBBBBBBB123
123CCCCCCCCC3

Output:

8:123AAAAAAAAA 9:BBBBBBBBB123 10:123CCCCCCCCC3
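
For comparison, the same idea can be sketched in Python as a sliding window over the lines. This is not a translation of the awk script: it swaps in a fixed-size window (collections.deque) instead of a pattern counter, and the function name find_sequences is hypothetical. Like awk's ~ operator, it does a partial regexp match on each line:

```python
import re
from collections import deque

def find_sequences(lines, patterns):
    """Yield (start_line, matched_lines) for each run of consecutive lines
    where the i-th line of the run contains a match for patterns[i]."""
    compiled = [re.compile(p) for p in patterns]
    window = deque(maxlen=len(compiled))  # holds the last N lines seen
    for number, line in enumerate(lines, start=1):
        window.append(line.rstrip("\n"))
        if len(window) == len(compiled) and all(
            rx.search(text) for rx, text in zip(compiled, window)
        ):
            yield number - len(compiled) + 1, list(window)

# The sample input from this answer, one list element per line.
sample = [
    "AAAAAAAAA", "BBBBBBBBB", "", "CCCCCCCCC", "AAAAAAAAA",
    "BBBBBBBBB", "ccccccccc", "123AAAAAAAAA", "BBBBBBBBB123", "123CCCCCCCCC3",
]
# next() stops at the first match, like the exit in the awk script.
print(next(find_sequences(sample, ["AAAAAAAAA", "BBBBBBBBB", "CCCCCCCCC"]), None))
```

On the sample input this reports a match starting at line 8, in agreement with the awk output. Because the function only iterates over lines, an open file object can be passed in place of the list, so the log is never loaded into memory all at once.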
αғsнιη

Just for fun with good old awk

cat file | wc -l
21287021

with > 3000,000 matches

time awk 'BEGIN{getline; a=$0; getline; b=$0}
       $0~/^C+$/ && a~/^A+$/ && b~/^B+$/{print "match starting on line "NR-2 }{a=b;b=$0}' file

real    0m12.644s
user    0m7.149s
sys     0m4.314s

Compared with rg on my machine:

time rg -U 'A+\nB+\nC+' file
real    0m40.322s
user    0m16.503s
sys     0m17.246s
bu5hman