
This might be a common/easy task, but I couldn't figure it out from examples on the web or from the awk/sed/grep manuals.

So, here is the scenario:

  • There is an internal command line tool that prints out a multi-line result for each line in an input file.
  • I have an input file with 500K lines.
  • In the output of the tool, there is always a line which is like "src: /some/directory"
  • I want to extract this line if and only if there is the specific string "foo" in the same output.

The number of lines between these two lines can vary, so the question "Match multiple regular expressions from a single file using awk" is somewhat related but not exactly what I'm trying to do.

How can I do this using awk, sed or grep? I can do it using Python but I don't want to because I want to learn awk/sed and this might be a good example.

Here is what I tried with grep:

tool -inputfile | if grep "foo"; then grep "src: " ; fi > result.txt

This doesn't produce the result I expected, probably because of something related to buffering.

Trying with awk:

tool -inputfile | awk '{for (i=1;i<NF;i++) {if(match($i, "foo")) print ??? }}' > result.txt

How do I print the line that contains "src: " in this script?

Example outputs of the tool:

Output 1:

src: /usr/bin 
param1: value1 value2 
param2: "foo" 
param3: "bar" "spam" 
param4: "eggs" "spam" "spam"

Output 2:

src: /dev/null
param1: value1 value2
param2: "ham" "spam" "eggs"

So for these two cases, I am trying to extract just the first one, i.e. src: /usr/bin

2 Answers


If you know that src: occurs at the beginning of a line and that foo is enclosed in quotes and preceded by a space and that there must be a colon earlier in the line, use

awk 'BEGIN{a=0} /^$/{if(a==1) print b; a=0} /:.* "foo"/{a=1} /^src:/{b=$0} END{if(a==1) print b}'

We use the variable a to remember whether or not the pattern foo occurs in the current block, and the variable b to store the src: line. At the beginning, a is set to 0. Whenever we find an empty line (i.e., ^$), we check the value of a, conditionally print b, and reset a. If we encounter "foo" preceded by a colon earlier in the line, we set a to 1. If we encounter src: at the beginning of a line (^), we store it in b. At the end, we check once more whether a == 1 and, if so, print b; this is needed because the last block is not followed by a blank line.
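To make the logic easier to follow, here is a sketch of the same script spread over several lines with comments (the redundant BEGIN block is left out, since awk variables start out empty). The function name `sample` and the variable `result` are made up for illustration; `sample` simply replays the two example outputs from the question in place of the real tool.

```shell
# Hypothetical stand-in for the tool: prints the two example blocks
# from the question, separated by a blank line.
sample() {
cat <<'EOF'
src: /usr/bin
param1: value1 value2
param2: "foo"
param3: "bar" "spam"
param4: "eggs" "spam" "spam"

src: /dev/null
param1: value1 value2
param2: "ham" "spam" "eggs"
EOF
}

result=$(sample | awk '
  /^$/        { if (a == 1) print b; a = 0 }  # blank line: block is over, print if foo was seen, then reset
  /:.* "foo"/ { a = 1 }                       # "foo" somewhere after a colon: remember it
  /^src:/     { b = $0 }                      # remember the src: line of the current block
  END         { if (a == 1) print b }         # the last block has no blank line after it
')
echo "$result"
```

With this sample only the first block contains "foo", so only its src: line is printed.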

Uwe
  • Thanks, tried them on a small sample (of which I know the results). Awk script prints only the last matching case. And sed doesn't seem to be filtering properly, it prints all the cases. – Gani Simsek Oct 20 '14 at 15:42
  • Do you have several lines that contain src:? From your question, I had the impression that there is only one. – Uwe Oct 20 '14 at 15:45
  • For each line in the input, the tool produces a new output. In each output there is only one line that contain "src: ". For the outputs of the sample file, I'm 100% sure that "src: " occurs only once. For the real file with 500K lines, it's highly unlikely but I will check nevertheless. – Gani Simsek Oct 20 '14 at 15:52
  • And the outputs are separated by what? Blank lines? Or does every src: start a new block? – Uwe Oct 20 '14 at 15:54
  • ok, answer updated. – Uwe Oct 20 '14 at 15:58
  • a blank line and yes every block (except the 1st) starts with src: – Gani Simsek Oct 20 '14 at 15:59
  • Thanks, it works now. Could you please explain the logic and the parameters because I want to learn awk. – Gani Simsek Oct 20 '14 at 16:04
  • @Uwe Half of this code is redundant. Why have the BEGIN block? All variables start at 0. Also, what is this /^$/{if(a==1) print b; a=0} supposed to do? The /:.* "foo"/ doesn't need .* as it is checking it contains "foo" anyway. –  Oct 21 '14 at 08:10
  • @Jidder /^$/{if(a==1) print b; a=0} starts a new block. The input file for the awk script consists of several blocks separated by blank lines, and the src: line should be printed whenever the corresponding block contains foo. The /:.* "foo"/ pattern checks for "foo" somewhere after the colon in the line. The begin block is just for clarity, but yes, it's redundant. – Uwe Oct 21 '14 at 10:03

Easy awk

awk '/src/{a=$0}/foo/{b=1}b&&a{print a;exit}'

If src or foo can also appear elsewhere in the output in a different form, anchor src to the start of the line and require the quotes around foo

awk '/^src/{a=$0}/"foo"/{b=1}b&&a{print a;exit}'

If foo always comes after src

awk '/^src/{a=$0}/"foo"/{print a;exit}'

If there are multiple src blocks in a file and you want to print each one that contains foo

awk '/^src/{a=$0;b=0}/"foo"/{b=1}b&&a{print a;a=0}'
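Since the comments on the other answer confirm that the blocks are separated by blank lines, a different technique is also possible: awk's paragraph mode. Setting RS to the empty string makes every blank-line-separated block a single record, and FS="\n" makes every line of that block a field. The following is only a sketch under the assumption that a block never contains a blank line of its own; `sample` and `result` are made-up names, and the third block of the sample data is invented here to show more than one match.

```shell
# Hypothetical stand-in for the tool output. The first two blocks are the
# examples from the question; the third one is invented for illustration.
sample() {
cat <<'EOF'
src: /usr/bin
param1: value1 value2
param2: "foo"

src: /dev/null
param1: value1 value2
param2: "ham" "spam" "eggs"

src: /tmp
param1: "foo" "spam"
EOF
}

result=$(sample | awk '
  BEGIN { RS = ""; FS = "\n" }     # paragraph mode: one record per block, one field per line
  /"foo"/ {                        # $0 is the whole block, so this tests the entire block
    for (i = 1; i <= NF; i++)      # walk over the lines of the block
      if ($i ~ /^src: /) print $i  # print the line that starts with src:
  }
')
echo "$result"
```

Because each block is tested as a whole, this version does not depend on whether "foo" comes before or after the src: line.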
  • Correct me if I'm wrong: If src is found, set a to that line. If foo is found, set b to 1. If both a and b are true, print a and then exit? – Gani Simsek Oct 21 '14 at 09:43
  • @GaniSimsek yep, well it sets a to the line that src is on, is that what you wanted ? –  Oct 21 '14 at 09:51
  • Yes but for multiple blocks, so your last script accomplishes the task the way I wanted. The reason I asked is to learn more about awk. Your logic and your code is concise and coherent. Thank you. – Gani Simsek Oct 21 '14 at 15:34