Print a line in stdout that matches an expression if the output contains another expression

Question

This might be a common/easy task, but I couldn't figure it out from examples on the web or from the awk/sed/grep manuals.

So, here is the scenario:

There is an internal command line tool that prints out a multi-line result for each line in an input file.
I have an input file with 500K lines.
In the output of the tool, there is always a line which is like "src: /some/directory"
I want to extract this line if and only if there is the specific string "foo" in the same output.

The number of rows between these lines might differ, so this question is somewhat related but not exactly what I'm trying to do: "Match multiple regular expressions from a single file using awk".

How can I do this using awk, sed or grep? I can do it using Python but I don't want to because I want to learn awk/sed and this might be a good example.

Here is what I tried with grep:

tool -inputfile | if grep "foo"; then grep "src: " ; fi > result.txt

This doesn't produce the result I expected, probably because of something related to buffering.

Trying with awk:

tool -inputfile | awk '{for (i=1;i<NF;i++) {if(match($i, "foo")) print ??? }}' > result.txt

How do I print the line that contains "src: " in this script?

Example outputs of the tool:

Output 1:

src: /usr/bin 
param1: value1 value2 
param2: "foo" 
param3: "bar" "spam" 
param4: "eggs" "spam" "spam"

Output 2:

src: /dev/null
param1: value1 value2
param2: "ham" "spam" "eggs"

So for these 2 cases, I am trying to extract just the 1st one, ie: src: /usr/bin

How are the records separated in the output of the internal tool? Can you provide a small example, please? — jofel, Oct 20 '14 at 15:06
line by line, each line is like "parameter: value, value, value, ..." the values can be one or more — Gani Simsek, Oct 20 '14 at 15:15
but is there anything delimiting each multiple-line record (such as a blank line between them)? — steeldriver, Oct 20 '14 at 15:48

Uwe · Accepted Answer · 2014-10-20T16:11:14.767

If you know that src: occurs at the beginning of a line and that foo is enclosed in quotes and preceded by a space and that there must be a colon earlier in the line, use

awk 'BEGIN{a=0} /^$/{if(a==1) print b; a=0} /:.* "foo"/{a=1} /^src:/{b=$0} END{if(a==1) print b}'

We use the variable a to remember whether or not the pattern foo occurs in the current input block, and the variable b to store the src: line. At the beginning, a is set to 0. Whenever we find an empty line (i.e., ^$), we check the value of a, conditionally print b, and reset a. If we encounter "foo" preceded by a colon earlier in the line, we set a to 1. If we encounter src: at the beginning of a line (^), we store that line in b. At the end, we check once more whether a == 1, because the last block is not followed by an empty line; if so, we print b.
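The same logic can be spread over several lines to make each rule easier to read. This is only a sketch: the variable names found and src are my own replacements for a and b, the redundant BEGIN block is dropped, and the sample input is the two example outputs from the question, assumed to be separated by a blank line.

```shell
# Sample input: the two example outputs from the question,
# assumed to be separated by one blank line.
sample='src: /usr/bin
param1: value1 value2
param2: "foo"
param3: "bar" "spam"
param4: "eggs" "spam" "spam"

src: /dev/null
param1: value1 value2
param2: "ham" "spam" "eggs"'

result=$(printf '%s\n' "$sample" | awk '
    /^$/        { if (found) print src; found = 0 }  # blank line: block ended, report and reset
    /:.* "foo"/ { found = 1 }                        # "foo" appears after a colon in this block
    /^src:/     { src = $0 }                         # remember the src: line of this block
    END         { if (found) print src }             # the last block has no trailing blank line
')

printf '%s\n' "$result"
```

On this sample only the first block contains "foo", so only its src: line is printed.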

edited Oct 20 '14 at 16:11

answered Oct 20 '14 at 15:12

Thanks, tried them on a small sample (of which I know the results). Awk script prints only the last matching case. And sed doesn't seem to be filtering properly, it prints all the cases. – Gani Simsek Oct 20 '14 at 15:42
Do you have several lines that contain src:? From your question, I had the impression that there is only one. – Uwe Oct 20 '14 at 15:45
For each line in the input, the tool produces a new output. In each output there is only one line that contain "src: ". For the outputs of the sample file, I'm 100% sure that "src: " occurs only once. For the real file with 500K lines, it's highly unlikely but I will check nevertheless. – Gani Simsek Oct 20 '14 at 15:52
And the outputs are separated by what? Blank lines? Or does every src: start a new block? – Uwe Oct 20 '14 at 15:54
ok, answer updated. – Uwe Oct 20 '14 at 15:58
a blank line and yes every block (except the 1st) starts with src: – Gani Simsek Oct 20 '14 at 15:59
Thanks, it works now. Could you please explain the logic and the parameters because I want to learn awk. – Gani Simsek Oct 20 '14 at 16:04
@Uwe Half of this code is redundant. Why have the BEGIN block? All variables start at 0. Also, what is /^$/{if(a==1) print b; a=0} supposed to do? The /:.* "foo"/ pattern doesn't need the .* since it is checking that the line contains "foo" anyway. – Oct 21 '14 at 08:10
@Jidder /^$/{if(a==1) print b; a=0} starts a new block. The input file for the awk script consists of several blocks separated by blank lines, and the src: line should be printed whenever the corresponding block contains foo. The /:.* "foo"/ pattern checks for "foo" somewhere after the colon in the line. The begin block is just for clarity, but yes, it's redundant. – Uwe Oct 21 '14 at 10:03

score 2 · Answer 2 · 2014-10-21T10:08:22.810

Easy awk

awk '/src/{a=$0}/foo/{b=1}b&&a{print a;exit}'

If src or foo can also appear elsewhere in a different form, anchor src to the start of the line and match foo together with its quotes

awk '/^src/{a=$0}/"foo"/{b=1}b&&a{print a;exit}'

If foo always comes after src

awk '/^src/{a=$0}/"foo"/{print a;exit}'

If there are multiple src blocks in a file and you want to print each one that contains foo

awk '/^src/{a=$0;b=0}/"foo"/{b=1}b&&a{print a;a=0}'
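To see how the last script behaves with several blocks, here is a small sketch that runs it on made-up input. The three blocks are a shortened, hypothetical variant of the question's examples (the /tmp block is invented for illustration), and the script is only reformatted with comments, not changed.

```shell
# Hypothetical input: three blocks, the first and third contain "foo".
sample='src: /usr/bin
param2: "foo"

src: /dev/null
param2: "ham"

src: /tmp
param1: "foo" "bar"'

result=$(printf '%s\n' "$sample" | awk '
    /^src/  { a = $0; b = 0 }    # new block: store its src line and clear the foo flag
    /"foo"/ { b = 1 }            # this block mentions "foo"
    b && a  { print a; a = 0 }   # print once per block, then clear a to avoid repeats
')

printf '%s\n' "$result"
```

Setting a to 0 after printing makes b&&a false for the remaining lines of the block, so each matching src line appears once; the next src line refills a and resets b.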

edited Oct 21 '14 at 10:08

answered Oct 21 '14 at 08:02

Correct me if I'm wrong: If src is found, set a to that row. If foo is found, set b to 1. If both a and b are true, print a and then exit? – Gani Simsek Oct 21 '14 at 09:43
@GaniSimsek yep, well it sets a to the line that src is on, is that what you wanted ? – Oct 21 '14 at 09:51
Yes, but for multiple blocks, so your last script accomplishes the task the way I wanted. The reason I asked is to learn more about awk. Your logic and your code are concise and coherent. Thank you. – Gani Simsek Oct 21 '14 at 15:34
