Using grep in a variable in a loop

Question

I'm having trouble storing a grep result as a variable in a loop.

while read file;do
  Server=$(echo $file | awk '{ print $1 }')
  FDate=$(echo $file | awk '{ print $2 }')
  ST=$(cat foobar | grep $Server | awk '{ print $3 }')
  #ST=$(grep $Server foobar | awk '{ print $3 }')
  echo "Server = $Server"
  echo "FDate = $FDate"
  echo "ST = $ST"
done < inputfile

The first ST var gives the output "Usage: grep [Option]... Pattern [File]" for each iteration, which means it's not reading the command correctly.

The second ST var, which is commented out, actually breaks the entire script, causing all the other variables to be empty when it tries to echo them.

Now when I try doing the same thing on the command line it works:

$ testme=$(cat foobar | grep Big | awk '{ print $3}')
$ echo "$testme"
tada

So my question is how do I store that grep command in the variable? The pattern match has only one possible result so I don't have to worry about multiple matches. But each server in the loop might have a different string in column 3 (tada,tada1,tada2)

EDIT:

The inputfile has a list of servers with multiple columns. I'm taking the server listed in column 1 of that current line and searching the foobar file for a match and getting the string from column 3.

I've found that the script actually does work even though it's giving the 'Usage' message. Probably because some of the server entries in the inputfile aren't yet in the foobar file so grep doesn't have a match but still tried to pipe it to awk. I don't know that for certain.

I'd still like to eliminate the 'Usage' messages though. I think maybe a 'set -o pipefail' might work but I'd rather not do that.

Admittedly, I don't understand what your remark about the second ST assignment means. In my opinion, the second assignment does the same as the first, just in a more elegant way. Most probably, the Server variable is not set, or is set to a value that confuses grep (e.g., a value that contains white space). Therefore, it would be interesting to know the output of echo "Server = $Server". — berndbausch, Sep 01 '21 at 03:21
Oddly enough, when I used 'set -x' it ran without showing any errors. I thought that was odd, so I ran it without and let the script complete. I had been pressing 'ctrl c' when I got the 'Usage' message because I was sure it would fail (the actual script takes a while), but the resulting output in the end was actually correct regardless of the error message. Thank you! I'll just send stderr to /dev/null — Jeight, Sep 01 '21 at 03:24
@berndbausch - It has me stumped as well but I can assure you it breaks using the second ST assignment. — Jeight, Sep 01 '21 at 03:26
It does not seem safe to rely on the problem "magically going away" by ignoring the error message. One thing you should always do is quote your shell variables - that will make quite the difference when your $Server variable is empty, for example. Also, you don't need to cat a file to grep and pipe the result to awk - awk can do all that on its own. If you would provide example input with desired output, someone might come up with a more efficient solution ... — AdminBee, Sep 01 '21 at 07:22
I suspect that inputfile contains some empty lines, so the unquoted $server expands to nothing — steeldriver, Sep 01 '21 at 11:18
@steeldriver - There are no empty lines, but there are lines that contain servers that won't match the foobar file grep (later that won't be the case). — Jeight, Sep 01 '21 at 15:53
@Jeight, for cases like this, a complete example really needs to also have an input file that exhibits the issue. Even mocked-up data will do, as long as the issue can be tested. You'll get better answers too, since people can then point you to the actual place where the problem happens. — ilkkachu, Sep 01 '21 at 17:53

ilkkachu · Answer 1 · 2021-09-01T17:43:22.120

while read file;do                                       # 1
  Server=$(echo $file | awk '{ print $1 }')              # 2
  FDate=$(echo $file | awk '{ print $2 }')               # 3
  ST=$(cat foobar | grep $Server | awk '{ print $3 }')   # 4
  #ST=$(grep $Server foobar | awk '{ print $3 }')        # 5

grep needs at least the pattern to search for (or an -e or -f option providing the equivalent), so if $Server ends up being empty, then the unquoted $Server on lines 4 and 5 disappears during word splitting (see also When is double-quoting necessary?), and

the grep on line 4 gets no arguments. Without the mandatory argument, it prints the usage description.
the grep on line 5 gets the single argument foobar, which it takes as a pattern. With no file name given, it reads from standard input, and inside the loop it has the same stdin as the loop, so it eats everything from there.
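
Both effects can be reproduced in isolation. This is only a minimal sketch: count_args and eat_stdin_demo are hypothetical helpers made up for illustration, and cat stands in for a grep that was left without a file name.

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
    echo "$#"
}

# eat_stdin_demo is also hypothetical: "cat" stands in for a grep that was
# given no file name and therefore reads the loop's standard input.
eat_stdin_demo() {
    printf '%s\n' one two three | while read -r line; do
        echo "got $line"
        cat > /dev/null
    done
}

empty=""
count_args $empty foobar      # unquoted: the empty value vanishes, prints 1
count_args "$empty" foobar    # quoted: the empty string is kept, prints 2
eat_stdin_demo                # prints only "got one"; cat swallowed the rest
```

With "$Server" quoted, grep would at least receive an empty pattern rather than no pattern at all, which is why quoting changes the behaviour when the variable is empty.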

Now, the whole loop reminds me of this question: Why is using a shell loop to process text considered bad practice? and it could be simplified at least somewhat. read can split the input on fields itself, so we can remove the command substitutions.

Then, we should probably deal with the case where one or both of the values happen to be empty. And, since awk can do the job of grep, too, let's do that:

while read server fdate; do
    # skip lines where either field is missing
    if [ -z "$server" ] || [ -z "$fdate" ]; then
        continue
    fi
    ST=$(awk < foobar -v server="$server" '$0 ~ server { print $3 }')
    echo "server $server fdate $fdate ST $ST"
done < inputfile

(or, depending on what you're intending to do in the end, replace the whole thing with an awk program.)

cas · Answer 2 · 2021-09-01T18:26:33.147

Don't do this with grep and/or awk in a shell while loop (or any commands like that in a shell loop, for that matter). See Why is using a shell loop to process text considered bad practice? for reasons why. In short: shell is great at getting other programs to do work, setting up redirections and pipelines and feeding filenames and data into other programs that actually do the work, but it is terrible at doing that work itself. Shell is slow and prone to user errors like not double-quoting your variables (e.g. your failure to quote $Server as "$Server" is the direct cause of your problem with grep...one of the causes, anyway. The other cause is your failure to check whether $Server actually contained any value after the awk)

Anyway, everything you need can be done with one short awk script. For example:

awk 'NR==FNR { fdates[$1] = $2 ; next}; # read first file into fdates array
 $1 in fdates {  # process second file
   printf "Server = %s\nFDate = %s\nST = %s\n", $1, fdates[$1], $3;
 }' inputfile foobar

In English:

Read the first file, store it in an associative array called fdates, with the key being field 1 (Server) and the value being field 2 (FDate)
When we're reading the second file (and any subsequent files), if the server name is a key in fdates, then print out the details you wanted.

NOTE: You haven't specified which field you expect to find the server name in in file foobar. The script above assumes that it is in field 1 of foobar. If it's in a different field, change the $1 in two lines (the $1 in fdates line and the printf line) to suit.
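
As a hedged illustration of that change, here is the same script adapted for a foobar where the server name sits in field 2. The file contents are mock data invented for this example (the real files were not shown), and field2_demo is a hypothetical wrapper name.

```shell
# field2_demo is a hypothetical wrapper; the data below is invented for illustration.
field2_demo() {
    dir=$(mktemp -d) || return 1
    printf '%s\n' 'Big 2021-09-01' 'Small 2021-09-02' > "$dir/inputfile"
    # In this mock foobar the server name is in field 2 and the wanted string in field 3.
    printf '%s\n' 'x Big tada' 'y Other tada9' > "$dir/foobar"

    awk 'NR==FNR { fdates[$1] = $2; next }   # first file: map server to FDate
         $2 in fdates {                      # second file: match on field 2
           printf "Server = %s\nFDate = %s\nST = %s\n", $2, fdates[$2], $3
         }' "$dir/inputfile" "$dir/foobar"

    rm -r "$dir"
}

field2_demo
```

Only Big appears in both mock files, so only its three lines are printed; Small has no entry in foobar and is silently skipped.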

If the server names could be anywhere on a line in foobar (i.e. there is no fixed field number to match on) then you could write the script like this:

awk 'NR==FNR { fdates[$1] = $2 ; next};
     { for (server in fdates) {
         if ($0 ~ server) {
           printf "Server = %s\nFDate = %s\nST = %s\n", server, fdates[server], $3;
         }
       }
     }' inputfile foobar

In English:

Read the first file, store it in an associative array called fdates, with the key being field 1 (Server) and the value being field 2 (FDate) - i.e. same as the first version.
Then, for each and every line of subsequent file(s), iterate over every element of fdates array. If there's a regex match with the key (server name) anywhere on the line, then print your required details.

The second version will be a bit slower than the first because it has to do a regex match for every server name contained in the fdates array for each line of foobar.

Both versions will be orders of magnitude faster than your while read loop around 3 calls to awk, and 1 call each to cat and grep - reading the entirety of file foobar on each pass through the loop. Either of the awk scripts above is only called once, and they only have to read inputfile and foobar once.

Mmh, that "don't run any commands in a shell loop" seems a bit too harsh. Running commands is what the shell is for, so if the end goal is to run some program based on the input, the shell seems just the right tool. Of course, that doesn't appear to be the case here. — ilkkachu, Sep 01 '21 at 17:51
Running shell built-ins in a while or for loop is OK, i.e. when the shell can process whatever data it has without repeatedly calling external programs on the same data or data files. But a loop that repeatedly runs grep, awk, sed, cut, or whatever else over the same data file(s) is about as anti-optimal as you can get - it's like putting on a pair of lead-weighted shoes for a "quick" run to the local shop for milk and bread, it won't be quick (and you'll probably stumble and break your neck). — cas, Sep 01 '21 at 17:59
I mean that something like while read foo bar; do somecmd "$foo" -x "$bar"; done < somefile or the relatively common for f in *.foo; do mv -- "$f" "${f%.foo}.bar"; done seem sensible enough. Both could be done in awk but it'd amount to the same thing, just wrapped in system(). (The latter could of course be done better in Perl.) — ilkkachu, Sep 01 '21 at 18:08
Yes, obviously, some things are OK to do in a shell loop, or make sense to do so. Processing text is not one of them. The somecmd loop in your first example is an example of "process[ing] whatever data it has without repeatedly calling external programs on the same data or files" - i.e. you have a list of filenames or a set of data elements and you want to feed them to another command one at a time..no problem, that's one of the things shell is for. Your second example (with mv) is better done with rename (the perl version of rename, specifically - rename 's/\.foo$/.bar/' *.foo). — cas, Sep 01 '21 at 18:16
