Retrieving lines from a file depending on other lines

Question

Imagine the following file structure:

foo.bar.1
blabla
moreblabla
relevant=yes
foo.bar.2
relevant=no
foo.bar.3
blablabla
foo.bar.4
relevant=yes

I want to retrieve every foo.bar line whose block (the lines after it, up to the next foo.bar line) contains a line stating relevant=yes.

So the output should be:

foo.bar.1
foo.bar.4

I could of course write a program or script that iterates through the lines, remembers each foo.bar line, and prints it when a relevant=yes line follows it before the next foo.bar. But I thought there might be an out-of-the-box way using standard Unix utilities (grep/sed/awk)?

Thanks for any hints!

Chose awk solution as "answer" for being the clearest one in syntax. — hal9000, Feb 03 '16 at 20:07

score 5 · Accepted Answer · answered Jan 31 '16 at 23:41

If the input is processed line by line, then processing needs to go like this:

if the current line is foo.bar, store it, forgetting any previous foo.bar line that wasn't enabled for output;
if the current line is relevant=yes, this enables the latest foo.bar for output.

This kind of reasoning is a job for awk. (It can also be done in sed if you like pain.)

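awk '
    # Remember the most recent foo.bar line.
    /^foo\.bar/ { foobar = $0 }
    # On relevant=yes, print the remembered line once, then forget it.
    /^relevant=yes$/ { if (foobar != "") { print foobar; foobar = "" } }
'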

Thanks for giving this awk solution. Pretty straight-forward. — hal9000, Feb 02 '16 at 19:55

score 3 · Answer 2 · answered Jan 31 '16 at 23:48

Here's one way with sed:

sed '/foo\.bar/h;/relevant=yes/!d;x;/foo\.bar/!d' infile

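Lines matching foo.bar are copied to the hold space. Every line that does not match relevant=yes is then deleted. For the remaining lines (those matching relevant=yes), the hold space is exchanged with the pattern space, and the result is deleted unless it matches foo.bar; otherwise it is printed. After the exchange the hold space contains relevant=yes, so each foo.bar line is printed at most once.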
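
To make the hold-space bookkeeping concrete, here is a rough Python model of the one-liner above. It only mimics the four commands on a list of lines and is not a reimplementation of sed; the function name sed_model and the sample list are made up for illustration.

```python
import re

def sed_model(lines):
    """Mimic: sed '/foo\\.bar/h;/relevant=yes/!d;x;/foo\\.bar/!d'"""
    hold = ""      # sed starts with an empty hold space
    printed = []   # lines that would be auto-printed at the end of a cycle
    for pattern in lines:
        if re.search(r"foo\.bar", pattern):           # /foo\.bar/h
            hold = pattern
        if not re.search(r"relevant=yes", pattern):   # /relevant=yes/!d
            continue
        pattern, hold = hold, pattern                 # x
        if not re.search(r"foo\.bar", pattern):       # /foo\.bar/!d
            continue
        printed.append(pattern)
    return printed

sample = [
    "foo.bar.1", "blabla", "moreblabla", "relevant=yes",
    "foo.bar.2", "relevant=no",
    "foo.bar.3", "blablabla",
    "foo.bar.4", "relevant=yes",
]
print("\n".join(sed_model(sample)))
```

Because the exchange leaves relevant=yes in the hold space, a second relevant=yes line in the same block swaps in something that no longer matches foo.bar and is deleted.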

don_crissti

I get "bash: !d': event not found". What am I doing wrong? – user unknown Feb 01 '16 at 00:24

@userunknown - you're using bash, and the ! is triggering history expansion. Try ! d instead, that is, insert a space after each !, or run set +H before running the sed command. – don_crissti Feb 01 '16 at 09:46
Thanks for giving this sed solution as well. Looks a little more painful than the awk-one imho as Gilles pointed out above. ;-) But I like it just as well. – hal9000 Feb 03 '16 at 18:54
@hal9000 - You're welcome ! I don't quite understand what's so painful about it but let's leave it at that. – don_crissti Feb 03 '16 at 19:00
@don_crissti - Sorry, that sounded a little unfair. I admit it is not painful. It is just a tiny bit less straightforward than awk imho. I was mostly picking up the term Gilles used. Explanation: for someone who does not know sed perfectly well (count me in), the concept of the two spaces and exchanging them isn't clear just from looking at the syntax. The awk solution might be easier to grasp if you know neither sed nor awk. But again, I loved seeing a sed solution to this. This is why I added sed as a tag. – hal9000 Feb 03 '16 at 20:17

score 1 · Answer 3 · answered Feb 01 '16 at 08:25

Pythonic way:

>>> with open("/home/xieerqi/textfile.txt") as file:
...   for line in file:
...       if line.startswith("foo.bar"):
...          VAR = line
...       if "relevant=yes" in line:
...          print(VAR)
...
foo.bar.1

foo.bar.4

Put together in a script:

DIR:/xieerqi
skolodya@ubuntu:$ chmod +x  relevance.py                                       

DIR:/xieerqi
skolodya@ubuntu:$ ./relevance.py textfile.txt                                  
foo.bar.1
foo.bar.4

DIR:/xieerqi
skolodya@ubuntu:$ cat relevance.py
#!/usr/bin/env python
import sys

with open(sys.argv[1]) as file:
    VAR = None  # most recent foo.bar line, if any
    for line in file:
        if line.startswith("foo.bar"):
            VAR = line.strip("\n")
        if "relevant=yes" in line and VAR is not None:
            print(VAR)

Sergiy Kolodyazhnyy

Thank you for the Python solution, though I prefer the other two solutions in this case. They are shorter and are what I meant when writing "out-of-the-box" in my initial question. But in the end your solution is just as fine. One thing, though: when the pattern is a regular expression, awk and sed can handle it immediately, whereas the Python script needs to be modified a little (import re). – hal9000 Feb 03 '16 at 20:04
:) Fair comment - awk and sed are always better for text processing, and I do prefer them myself (in fact awk is a big part of my reputation on the other site, askubuntu.com). However, both sed and awk solutions had already been posted, so I thought I'd bring something else to the table. Hope in the end my answer was beneficial in some way to you and other users! – Sergiy Kolodyazhnyy Feb 03 '16 at 20:20
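
As a sketch of the regular-expression point raised in the comments above: the same bookkeeping as the accepted awk answer can be written in Python with the re module. The function name relevant_headers and the two pattern constants are made up for illustration and do not come from any of the answers.

```python
import re

# Hypothetical patterns for illustration; adjust them to the real file format.
HEADER_RE = re.compile(r"^foo\.bar")
RELEVANT_RE = re.compile(r"^relevant=yes$")

def relevant_headers(lines):
    """Return each header line whose block contains a relevant=yes line."""
    result = []
    header = None  # latest header that has not been reported yet
    for raw in lines:
        line = raw.rstrip("\n")
        if HEADER_RE.search(line):
            header = line
        elif RELEVANT_RE.search(line) and header is not None:
            result.append(header)
            header = None  # forget it so it is reported only once
    return result

sample = """foo.bar.1
blabla
moreblabla
relevant=yes
foo.bar.2
relevant=no
foo.bar.3
blablabla
foo.bar.4
relevant=yes
""".splitlines()

print("\n".join(relevant_headers(sample)))
```

Passing an open file object instead of the sample list should work the same way, since the function only iterates over lines and strips the trailing newline itself.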
