Searching match of multi-line regex in files (without pcregrep)

Question

Question:

How could I find matches of a multi-line regular expression in files, without pcregrep?

I need to find/print the position of each occurrence.

Unfortunately, pcregrep is not installed and I have no rights to install it. The available alternatives are grep, perl, sed, python, etc.

An example of regular expression to search is:

Text\nLine

Context:

A script produces hundreds of MB of structured text spread over a few tens of files, but some lines are missing (for various reasons). I need to find where those lines are missing, so I am searching for the line before the gap immediately followed by the line after it.

Text
Missing //this line is sometimes missing.
Line

EDITED:

Possible input

example.txt

Text
Missing
Line

Text
Missing
Line

Text
Line

Text
Missing
Line

Possible output:

example.txt, line 10

Some of the tries with no success:

pcregrep 
    # command not found
apt-get install pcregrep 
    # no permission, no su credentials, distro doesn't provide pcregrep, outdated sources, customer does not want changes on the server, etc.
sed -r 's#(Text\nLine)#\1#' ./*
    # prints all lines, not only matches; no indication of file or line, etc.
grep 'Text\nLine' ./*
    # Does not work across multiple lines
sed -n '/Text/,/Line/{p}' ./* 
    # Not the same regex, does not indicate result lines, etc.

Are you sure your file is in Unix format and is not using \r\n as the line terminator? — Kiwy, Jun 11 '18 at 07:44
Yes, I am sure. But in any case, it's not difficult to add Text\r?\nLine — Adrian Maire, Jun 11 '18 at 07:49
Possible duplicate of Multiline pattern match using sed, awk or grep — Kiwy, Jun 11 '18 at 07:54
@Kiwy This is not a duplicate, other than that the solution Adrian envisages happens to correspond with the title of the proposed dupe. The actual solution does not require a multiline regular expression. Also, the answers in the proposed dupe do not solve the given issue. — Kusalananda, Jun 11 '18 at 09:23

score 2 · Accepted Answer · edited Jun 11 '18 at 15:42

Unix tools are most often line-oriented, and there is therefore no way to apply a regular expression over several lines of input using the standard toolbox.

sed can be made to process the file in such a way that it's able to detect the lines you are looking for, but we do this strictly using operations on individual lines:

$ sed -n '/^Text/{N;/^Text\nLine/=;D;}' file
10

This sed script looks for the string Text at the start of a line. When found, it appends the next line to its buffer with a \n in-between.

If the buffer now matches ^Text\nLine, the current line number is printed using sed's = command. The number printed is that of the Line line in the file.

Note that while the second regular expression appears to match across a newline in the file, it does not. It matches across a newline in its internal buffer, which we put there using the N command when we read the next line from the file.
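To make the point about the internal buffer concrete, here is a minimal sketch that uses sed's l command to print the buffer unambiguously right after N has run. The function name show_buffer is made up for illustration; it is not part of the answer above.

```shell
# show_buffer [FILE...]: hypothetical helper, for illustration only.
# For every line starting with "Text", append the next line with N and
# print the resulting buffer with "l", which shows the embedded
# newline as \n and marks the end of the buffer with $.
show_buffer() {
    sed -n '/^Text/{N;l;}' "$@"
}

printf 'Text\nLine\n' | show_buffer
```

The \n visible in that output exists only inside sed's buffer; the regular expression is still applied to one buffer at a time, never to the file as a whole.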

You would probably use this in a loop if you want to apply it to multiple files:

for name in pattern; do
    printf 'Processing %s...\n' "$name"
    sed -n '/^Text/{N;/^Text\nLine/=;D;}' "$name"
done

where pattern would be an ordinary filename globbing pattern that matches the files that you are interested in.
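If you want output in the "example.txt, line 10" form asked for in the question, the loop can prefix each number with the file name. This is only a sketch: the function name find_gaps is hypothetical, and it assumes the same sed expression as above.

```shell
# find_gaps FILE...: hypothetical wrapper around the sed command above.
# Prints "FILE, line N" for every "Line" line that directly follows a
# "Text" line, i.e. wherever the line in between is missing.
find_gaps() {
    for name in "$@"; do
        sed -n '/^Text/{N;/^Text\nLine/=;D;}' "$name" |
            while IFS= read -r lineno; do
                printf '%s, line %s\n' "$name" "$lineno"
            done
    done
}
```

It would be called as find_gaps ./*.txt or with any other list of files. Because sed is started once per file, the line numbers restart at 1 for each file.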

It seems that N has difficulties with multiple files, but I can iterate over the directory with a bash loop. — Adrian Maire, Jun 11 '18 at 08:57
@AdrianMaire What difficulties? The fact that the line numbers are not reset between files? If you're using GNU sed, try sed -s -n .... But using a loop would be better, as you would then be able to tell which lines refer to which files. — Kusalananda, Jun 11 '18 at 08:59
@Kiwy The answers to your proposed dupe are less than helpful in this instance. — Kusalananda, Jun 11 '18 at 09:19
@StéphaneChazelas Thanks! I forgot that the user supplied dummy text. — Kusalananda, Jun 11 '18 at 15:41

score 1 · Answer 2 · edited Jun 11 '18 at 15:56

If vim is installed, you could use it in ex mode as:

vim -e -s -c 'argdo g/^Text\nLine/#' -c q ./*.txt

See also the z command to give context.

vim -e -s -c 'argdo g/^Text\nLine/z#.5' -c q ./*.txt

That doesn't print the file names though. A not very efficient perl approach could be:

perl -l -0777 -ne 'while (/Text\nLine/g) {
   print "$ARGV, line " . ++(() = $` =~ /\n/g)}' ./*.txt

answered Jun 11 '18 at 09:49 · Stéphane Chazelas

score 0 · Answer 3 · answered Jun 12 '18 at 00:01

 perl -ne 'eof and $. = 0 or /^Text/ && ($_ .= <>) =~ /^Line/m && print "$ARGV: $.\n"' ./*

This will print the file name along with the line number where the match occurred.

Also, the line counter ($.) is reset upon reaching eof of each file.

answered Jun 12 '18 at 00:01 · Rakesh Sharma
