Filter a file with output of another command

Question

I have an external command, say check_this, which spits out YES or NO for each line of a file piped to it:

cat myfile | check_this
YES
NO
YES
YES
...

Now I want to get all the lines in myfile with YES results. Is there a way to do this? Currently I use a tempfile, save it to another file, then use paste + grep, which is cumbersome and not robust.

Does it output YES or NO for every line? So in your output we can assume line 1 is YES, line 2 is NO, line 3 is YES, and line 4 is YES? – jesse_b, Aug 06 '20 at 14:12

Stéphane Chazelas · Answer 1 · 2020-08-06T14:36:35.313

2

I'd use awk:

<myfile check_this | awk '
  !check_processed {if ($1 == "YES") yes[FNR]; next}
  FNR in yes' - check_processed=1 myfile

awk records in the yes hash table the numbers of the lines of check_this's output that start with the word YES, and then prints the lines of myfile whose numbers are in that table.
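
To try the command without the real check_this, here is a minimal sketch. The check_this function below is only a hypothetical stand-in (it answers YES for lines containing the letter "a"), and filter_yes is just an illustrative wrapper name, neither comes from the question.

```shell
# Hypothetical stand-in for check_this: YES for lines containing "a", NO otherwise.
check_this() {
  awk '{ if (/a/) print "YES"; else print "NO" }'
}

# Print the lines of file $1 for which check_this answered YES.
filter_yes() {
  check_this < "$1" | awk '
    # First input (stdin, given as "-"): remember the numbers of the YES lines.
    !check_processed { if ($1 == "YES") yes[FNR]; next }
    # Second input (the file): print the lines whose number was remembered.
    FNR in yes
  ' - check_processed=1 "$1"
}

tmp=$(mktemp)
printf '%s\n' apple kiwi banana grape > "$tmp"
filter_yes "$tmp"
rm -f "$tmp"
```

The check_processed=1 argument is processed by awk only after stdin has been read in full, which is what separates the two passes.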

edited Aug 06 '20 at 14:36

answered Aug 06 '20 at 14:30

Stéphane Chazelas

544,893

Cbhihe · Answer 2 · 2020-08-08T15:51:57.290

0

A variant of @StéphaneChazelas' perfectly good awk-based solution, that is less compact but perhaps easier to read because it does not resort to an external variable (check_processed in his notation), would be:

$ awk 'FNR == NR {if ($1 == "YES") yes[FNR];next} 
       FNR != NR && FNR in yes'   <(check_this <myfile) myfile

Note: @RakeshSharma remarks that the simultaneous use of next (1st line) and of the test FNR != NR (2nd line) is a redundancy. Users of that pattern can remove one or the other with no change in output, as in:

$ awk 'FNR == NR {if ($1 == "YES") yes[FNR];next} 
       FNR in yes'   <(check_this <myfile) myfile

edited Aug 08 '20 at 15:51

answered Aug 07 '20 at 15:21

Cbhihe

2,701

The FNR != NR is redundant and can be removed. Or, remove the next from the previous line. – Rakesh Sharma Aug 08 '20 at 15:20
@RakeshSharma: You are 100% right. You could even see the simultaneous use of next and the test FNR != NR as an anti-pattern here. It was just meant as a quick illustration of awk's versatility, geared toward people not fully conversant with the idiom, or not comfortable with using external variables declared on the same command line (see StéphaneChazelas' answer)... I will nevertheless edit the answer with a small comment mentioning you for good measure. Good catch. – Cbhihe Aug 08 '20 at 15:46
Note that the FNR == NR approaches in general don't work properly when the first file is empty, which is why I prefer the !flag/flag=1 approach. In this case though, it wouldn't be a problem: if myfile is empty, the output of check_this would also be empty. See "Bypass a nawk snippet if the input file is empty". – Stéphane Chazelas Aug 09 '20 at 08:55
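
The empty-first-file caveat can be seen with a small sketch. The file contents and the variable names (seen, other, flag) are made up for illustration only.

```shell
empty=$(mktemp)
data=$(mktemp)
: > "$empty"                      # first file: empty
printf '%s\n' YES NO > "$data"    # second file: two lines

# FNR == NR idiom: with an empty first file, NR never gets ahead of FNR,
# so the lines of the second file are handled by the "first file" block.
nr_idiom=$(awk 'FNR == NR { seen++; next } { other++ }
                END { print seen+0, other+0 }' "$empty" "$data")

# Flag idiom: flag=1 is only assigned once awk reaches that argument,
# so the second file is recognised as such even if the first one is empty.
flag_idiom=$(awk '!flag { seen++; next } { other++ }
                  END { print seen+0, other+0 }' "$empty" flag=1 "$data")

echo "FNR == NR idiom: $nr_idiom"
echo "flag idiom:      $flag_idiom"
rm -f "$empty" "$data"
```

The first count is what the "first file" block saw, the second is what the "second file" block saw; only the flag variant attributes both lines to the second file.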

score 0 · Answer 3 · answered Aug 08 '20 at 08:16

We can make use of the GNU version of the dc utility to basically implement a grep -f functionality.

dc -e "
$(< myfile check_this | sed -e 's/NO/0/;s/YES/1/' | tac)
[q]sq [p]sp [?z0=qr1=psxz0<?]s?
l?x
" < <(< myfile sed -e 's/.*/[&]/')

As a first step, we load the check_this utility's output, booleanized (YES => 1, NO => 0) and reversed with tac, onto the stack, so that the result for the first line ends up on top. Then the next line from the input file is read and pushed onto the stack, and it is printed if the 2nd stack element is a 1.
Then we clear out the top 2 stack elements. Repeat until EOF.
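
For readers who do not speak dc, the same read-a-pair, test, print, repeat loop can be sketched in plain shell. This is not the dc program itself, only an illustration of the loop it implements; check_this is again a hypothetical stand-in and filter_lockstep is an invented name.

```shell
# Hypothetical stand-in for check_this: YES for lines containing "a", NO otherwise.
check_this() {
  awk '{ if (/a/) print "YES"; else print "NO" }'
}

# Read one verdict (stdin) and one line of the file (fd 3) per iteration,
# and print the line only when its verdict is YES.
filter_lockstep() {
  check_this < "$1" |
    while IFS= read -r verdict && IFS= read -r line <&3; do
      if [ "$verdict" = YES ]; then
        printf '%s\n' "$line"
      fi
    done 3< "$1"
}

tmp=$(mktemp)
printf '%s\n' apple kiwi banana > "$tmp"
filter_lockstep "$tmp"
rm -f "$tmp"
```

A shell read loop is much slower than awk on large files, so this is only meant to make the logic visible.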

Rakesh Sharma · Answer 4 · 2020-08-08T15:47:19.527

GNU awk aka gawk+paste:

$ < myfile check_this \
   | paste myfile -      \
   | gawk '/YES$/ && NF--';
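
One thing to keep in mind with NF--: decrementing NF makes gawk rebuild the record, so runs of blanks inside the kept lines are squeezed to single spaces. If that matters, a possible variant (a sketch only, with a hypothetical check_this stand-in and an invented filter_paste name) strips the tab and the YES that paste appended instead:

```shell
# Hypothetical stand-in for check_this: YES for lines containing "a", NO otherwise.
check_this() {
  awk '{ if (/a/) print "YES"; else print "NO" }'
}

# paste appends "<TAB>verdict" to each line of the file; sub() removes a
# trailing "<TAB>YES" and returns 1 when it did, which makes awk print the
# line with its original spacing intact.
filter_paste() {
  check_this < "$1" | paste "$1" - | awk 'sub(/\tYES$/, "")'
}

tmp=$(mktemp)
printf '%s\n' 'a   b' xyz banana > "$tmp"
filter_paste "$tmp"
rm -f "$tmp"
```

This relies on paste's default tab delimiter and should work in any POSIX awk, not just gawk.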

Perl:

$ < myfile check_this \
    |  perl -lne '
      @ARGV && do{
        /YES/ && $h{$.}++;
        eof && close(ARGV);
        next;
       };
        print if $h{$.};
  ' - myfile

GNU sed with extended regex mode ON:

$ < myfile check_this |
    sed -nE '
        1{:a;H;n;/^(YES|NO)$/ba;}
        G;/\n\nYES/P
        s/.*\n\n(YES|NO)/\n/;h
    ' - myfile

Store the check_this output in the hold space and, for every line of myfile, determine whether the leading value in the hold space is a YES; if so, print the myfile line. Then clip the leading two elements from the pattern space and re-store (NOT "restore", mind you) the pattern space into the hold space.
