Setup: Linux GNU bash, version 4.3

if grep -c PATTERN "$sourcefile"
then
    grep PATTERN "$sourcefile" | gzip > compressedfile.gz
fi

I want to avoid reading the source file twice.

How can I achieve this?

  • Try to assign the grep result to a variable and check the exit code. – mja Apr 26 '18 at 10:01
  • Use grep -q instead of grep -c, it will exit with 0 on 1st match so you won't process the file twice (unless you only have one match which is on the last line). Related: Check if pipe is empty and run a command on the data if it isn't – don_crissti Apr 26 '18 at 10:18
  • @don_crissti thank you for the link, very informative. – Dennis Nolte Apr 26 '18 at 11:17
  • Whatever... the point was that unless you have a good reason to avoid accessing the input file twice, your optimized (per my advice) code is fine and depending on the input and hardware it may run faster than the code in the accepted answer. – don_crissti Apr 26 '18 at 12:31
  • @don_crissti you are right in the case where the match is early on, and I agree I was too vague to get the best-optimized (performance) answer. However, I was under the impression that a too-specific question is not that good either for the re-usability of the question and its answers, so I tried for a compromise that would solve my personal issue but still be useful for others who might not have the performance limitations I had. – Dennis Nolte Apr 26 '18 at 15:26

2 Answers

grep 'PATTERN' "$sourcefile" >compressedfile
if [ -s compressedfile ]; then
    gzip -f compressedfile
else
    rm -f compressedfile
fi

The -s test will be true if the given filename exists and if it refers to a file whose size is greater than zero. The file will exist (a redirection always creates the file if it doesn't already exist) and the size will be greater than zero if there was any result from the grep.
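
As a rough illustration of that behaviour (the temporary directory and file names below are made up for the demo and are not part of the answer):

```shell
#!/usr/bin/env bash
# Sketch: how [ -s FILE ] behaves after a redirection.
# All file names here are illustrative assumptions.
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\n' >"$tmpdir/input.txt"

# No match: the redirection still creates the file, but it stays empty.
grep 'gamma' "$tmpdir/input.txt" >"$tmpdir/nomatch.out"
if [ -e "$tmpdir/nomatch.out" ]; then nomatch_exists=yes; else nomatch_exists=no; fi
if [ -s "$tmpdir/nomatch.out" ]; then nomatch=nonempty; else nomatch=empty; fi

# Match: the file has a size greater than zero, so -s is true.
grep 'beta' "$tmpdir/input.txt" >"$tmpdir/match.out"
if [ -s "$tmpdir/match.out" ]; then match=nonempty; else match=empty; fi

echo "no match: exists=$nomatch_exists, $nomatch; match: $match"
rm -rf "$tmpdir"
```

With no match the file exists but is empty, which is why the else branch in the answer has to remove it.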

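The -f flag makes gzip overwrite an existing compressedfile.gz without prompting. Note that gzip compresses the file even when the result is slightly larger than the input, which is what happens if the file is tiny to start with.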

Almost the same thing, but using the exit status of grep. The difference is that this variant won't compress the grep output if some sort of read/write error occurs for grep, because grep then exits with a non-zero status:

if grep 'PATTERN' "$sourcefile" >compressedfile; then
    gzip -f compressedfile
else
    rm -f compressedfile
fi

or just

grep 'PATTERN' "$sourcefile" >compressedfile && gzip -f compressedfile
rm -f compressedfile

Here, rm will try to remove the uncompressed file regardless, but since we're using rm -f, no error will be reported if the file does not exist (it won't exist if gzip has compressed it).
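
The exit-status variants depend on what grep returns in each case. A small sketch, assuming GNU grep (which documents 0 for a match, 1 for no match and 2 for an error; POSIX only requires a value greater than 1 for errors), with made-up temporary file names:

```shell
#!/usr/bin/env bash
# Sketch: the three grep exit statuses the if/&& forms react to.
# File names are illustrative assumptions.
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\n' >"$tmpdir/input.txt"

# A line matches: exit status 0.
grep 'beta' "$tmpdir/input.txt" >/dev/null
status_match=$?

# Nothing matches: exit status 1.
grep 'gamma' "$tmpdir/input.txt" >/dev/null
status_nomatch=$?

# The input file cannot be read: exit status greater than 1 (2 with GNU grep).
grep 'beta' "$tmpdir/does-not-exist" >/dev/null 2>&1
status_error=$?

echo "match=$status_match nomatch=$status_nomatch error=$status_error"
rm -rf "$tmpdir"
```

Both "no match" and "error" are non-zero, so in either case gzip is skipped and the leftover file is removed.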


In the most general case, I'd advise against storing the output of grep in a variable: we don't know how much data will match, and it could be gigabytes.
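
If it helps, the on-disk approach can be wrapped in a function. This is only a sketch: grep_to_gz is a hypothetical helper name, and using mktemp for the intermediate file and gzip -c to write the destination are my own choices for the demo, not something the commands above require.

```shell
#!/usr/bin/env bash
# grep_to_gz PATTERN SRC DEST  (hypothetical helper, for illustration only)
# Reads SRC once, keeps matches in a temporary file instead of a variable,
# and writes DEST only if grep found something.
grep_to_gz() {
    local pattern=$1 src=$2 dest=$3 tmp status
    tmp=$(mktemp) || return 2
    grep -e "$pattern" -- "$src" >"$tmp"
    status=$?
    if [ "$status" -eq 0 ]; then
        # gzip -c writes the compressed data to standard output.
        gzip -c "$tmp" >"$dest" || status=2
    fi
    rm -f "$tmp"
    return "$status"
}

# Demo with made-up file names in a scratch directory.
workdir=$(mktemp -d)
printf 'alpha\nbeta\n' >"$workdir/input.txt"

grep_to_gz 'beta' "$workdir/input.txt" "$workdir/hit.gz"
hit_status=$?
grep_to_gz 'gamma' "$workdir/input.txt" "$workdir/miss.gz"
miss_status=$?

hit_content=$(gzip -dc "$workdir/hit.gz")
if [ -e "$workdir/miss.gz" ]; then miss_file=present; else miss_file=absent; fi

echo "hit: status=$hit_status content=$hit_content"
echo "miss: status=$miss_status file=$miss_file"
rm -rf "$workdir"
```

The function passes grep's exit status back to the caller, so "no match" can still be told apart from an error.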

Kusalananda

You could first assign the result of grep to a variable. Then you can check the exit code, as suggested by @mja in the comments, or check whether the result is the empty string, like this:

foo=$(grep "$PATTERN" "$sourcefile")
if [ -n "$foo" ]
then
    echo "$foo" | gzip > compressedfile.gz
fi

or, as a one-liner:

foo=$(grep "$PATTERN" "$sourcefile"); [ -z "$foo" ] || echo "$foo" | gzip > compressedfile.gz
dr_
  • Thank you for your input. This was part of an idea I was trying to use too, and it helped me to see that it actually is that simple to echo the output to gzip. It would work fine for our case; however, as Kusalananda wrote, this might not be the "best" general solution if the grep output gets very large. I will therefore accept his answer. – Dennis Nolte Apr 26 '18 at 11:20
  • Sure, I actually upvoted his solution as I think it is better than mine :) – dr_ Apr 26 '18 at 12:13
  • Heh, you could say that yours is the same as mine (where I use the -s test), but we store the result of grep differently. Yours is perfectly fine if one knows there is not going to be much output from grep and if one is able to juggle $foo without letting the shell poke around in it. – Kusalananda Apr 26 '18 at 12:16