GREP / SED or AWK: Print entire paragraph in a file on pattern match

Question

I have a file with hundreds of paragraphs of around 15 lines each. I need to search for a pattern, say Occurrence: 1. If this pattern is found in the para, I need to print the entire paragraph. Note that the paragraphs are separated by 2 new line characters.

I have tried the below line of code and this obviously prints the first occurrence in the file. I am somehow unable to use a loop and print all such occurrences.

sed -n '1,/Occurrence: 1/p' ystdef.txt | tail -9 > ystalarm.txt

Can I use the g (global) flag with sed to make this work? If yes, how?

Note that I am aware of the grep -A/B/C options, but they won't work on my Cygwin terminal.

awk -vRS= -vORS='\n\n' '/pattern/' – Stéphane Chazelas Jun 09 '14 at 13:04
I get an Unknown option "-RS=" message – Irfan N Jun 09 '14 at 13:34
That's -vRS, or -v RS, not -RS – Stéphane Chazelas Jun 09 '14 at 13:39

score 12 · Answer 1 · edited Oct 10 '23 at 19:25


You can use awk's “paragraph mode”, where input records are delimited by sequences of at least two newlines. This is activated by setting RS to an empty string.

awk -v RS= '/Occurrence: 1/' ystdef.txt

Note that the paragraphs will be printed all collapsed together (with a single newline between their content). Awk doesn't let you match the output separator with the input separator (except with some GNU awk extensions), but you can easily standardize the paragraph separator to two newlines.

awk -v RS= -v ORS='\n\n' '/Occurrence: 1/' ystdef.txt

If you don't want an extra newline at the end:

awk -v RS= '/Occurrence: 1/ {if (not_first) print ""; print; not_first=1}' ystdef.txt
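To compare the last two variants, here is a minimal sketch on throwaway data. The file sample.txt and its contents are made up for illustration, and the pattern is spelled as in the question; only the awk invocations come from the answer.

```shell
# Hypothetical test data: three paragraphs separated by blank lines.
tmp=$(mktemp -d)
f="$tmp/sample.txt"
printf '%s\n' 'job: a' 'Occurrence: 1' '' \
              'job: b' 'Occurrence: 2' '' \
              'job: c' 'Occurrence: 1' > "$f"

# Variant with ORS='\n\n': every matching paragraph is followed by a blank
# line, including the last one.
with_ors=$(awk -v RS= -v ORS='\n\n' '/Occurrence: 1/' "$f" | wc -l)

# Variant that prints the blank line only between paragraphs.
matches=$(awk -v RS= '/Occurrence: 1/ {if (not_first) print ""; print; not_first=1}' "$f")
between_only=$(printf '%s\n' "$matches" | wc -l)

printf '%s\n' "$matches"
echo "lines with ORS: $with_ors, lines without trailing blank: $between_only"
rm -rf "$tmp"
```

The two line counts differ by one, which is the extra trailing newline mentioned above.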

edited Oct 10 '23 at 19:25 by user8545642

answered Jun 10 '14 at 00:37 by Gilles 'SO- stop being evil'

Nice! RS= is a magic value, and POSIX defined: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html "If RS is null, then records are separated by sequences consisting of a <newline> plus one or more blank lines, leading or trailing blank lines shall not result in empty records at the beginning or end of the input, and a <newline> shall always be a field separator, no matter what the value of FS is." – Ciro Santilli OurBigBook.com Jul 07 '15 at 08:10

score 8 · Answer 2 · edited May 10 '18 at 10:50

Here it is in GNU sed:

sed '/./{H;$!d};x;/SEARCH/!d'

Portable/POSIX syntax:

sed -e '/./{H;$!d;}' -e 'x;/SEARCH/!d'

If a line contains one or more characters, it is appended to the hold space (H) and, unless it is the last line ($!), deleted (d). This means every non-blank line is stored and removed from the output.

So if a line is not deleted (it is blank, or it is the last line), sed exchanges the contents of the hold and pattern spaces (x). This leaves only the blank line in the hold space and puts all lines since the last blank line in the pattern space.

sed then tests the pattern space against /SEARCH/. If the pattern is not found (!), it deletes the pattern space without printing; otherwise the paragraph is printed by default.
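The walkthrough can be checked on a small input. This is a sketch with made-up data, using the POSIX form of the command with SEARCH replaced by the sample word beta:

```shell
# Hypothetical input: three paragraphs separated by single blank lines.
input=$(printf '%s\n' 'alpha 1' 'alpha 2' '' 'beta 1' 'beta 2' '' 'gamma 1')

# Non-blank lines are collected in the hold space; at each blank line (or at
# the last line) the collected paragraph is swapped in and tested.
result=$(printf '%s\n' "$input" | sed -e '/./{H;$!d;}' -e 'x;/beta/!d')
printf '%s\n' "$result"
```

Each printed paragraph starts with an empty line, because H puts a newline in front of every line it appends to the hold space.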

Here it is in a shell function with your question as input:

Note - the processed data is commented below for readability's sake in the face of this site's code highlighting. It will work as is or without the hashes.

_pgraph() { 
    sed '/./{H;$!d};x;/'"$1"'/!d'
} <<\DATA
#    I have a file with hundreds of paragraphs of
#    around 15 lines each. I need to search for a
#    pattern, say Occurance: 1. If this pattern is
#    found in the para, I need to print the entire
#    paragraph. Note that the paragraps are seperared
#    by 2 new line characters.

#    I have tried the below line of code and this
#    obviously prints the first occurence in the
#    file. I am somehow unable to use a loop and
#    print all such occurances.

#    sed -n '1,/Occurance: 1/p' ystdef.txt | tail -9 >
#    ystalarm.txt Can I use the g (global) flag with
#    sed to make this work? If yes, how?

#    Note that I am aware of the grep -A/B/C commands
#    but they wont work on my cygwin terminal.
DATA

Now I can do:

_pgraph Note

###OUTPUT

#    I have a file with hundreds of paragraphs of
#    around 15 lines each. I need to search for a
#    pattern, say Occurance: 1. If this pattern is
#    found in the para, I need to print the entire
#    paragraph. Note that the paragraps are seperared
#    by 2 new line characters.

#    Note that I am aware of the grep -A/B/C commands
#    but they wont work on my cygwin terminal.

Or more specifically:

_pgraph 'Note that I'

#    Note that I am aware of the grep -A/B/C commands
#    but they wont work on my cygwin terminal.

You can do the same for any file without appending a literal input to the function itself by simply removing everything from <<\DATA to DATA in the function definition and running it like:

_pgraph 'PATTERN' </path/to/input.file
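As a sketch of that file-reading form: the function below is the same idea with the here-document removed and the POSIX sed syntax swapped in for portability. The input file and its contents are hypothetical. Because the argument is pasted into the sed script between slashes, a literal / in the pattern has to be written as \/.

```shell
# Reads paragraphs from stdin; $1 is inserted verbatim as a sed regex.
_pgraph() {
    sed -e '/./{H;$!d;}' -e 'x;/'"$1"'/!d'
}

# Hypothetical input file with two paragraphs.
tmp=$(mktemp -d)
printf '%s\n' 'first para' 'uses -A/B/C' '' 'second para' > "$tmp/input.file"

# The slash in the pattern is escaped so it does not end the sed address.
out=$(_pgraph 'A\/B' < "$tmp/input.file")
printf '%s\n' "$out"
rm -rf "$tmp"
```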

score 4 · Answer 3 · answered Jun 09 '14 at 13:17


You can use the "paragraph mode" in Perl:

perl -ne 'BEGIN{ $/ = "" } print if /pattern/' input

answered Jun 09 '14 at 13:17 by choroba

+1 Or, slightly easier on the eyes, perl -n00e 'print if /pattern/' input – Joseph R. Jun 09 '14 at 13:23
`F:\apps\autosys_tools\BarCapAutoSysCLI_v3.1\bin>perl -ne 'BEGIN{ $/ = "" } print if /alarm_if_fail/' ystdef.txt
Can't find string terminator "'" anywhere before EOF at -e line 1.`
– Irfan N Jun 09 '14 at 13:26
@IrfanN: On MSWin, you have to switch quotes perl -ne "BEGIN{ $/ = '' } print if /pattern/". – choroba Jun 09 '14 at 13:53
