
I am a bit new to unix and trying to figure out what the following command is doing:

$(grep -w "xyz" prog.R | wc -l) -ge 3

I assume that the outcome would be boolean. The first part (i.e., grep -w "xyz" prog.R) is looking for the text xyz in the prog.R file. But I cannot make out what this part as a whole is doing: $(grep -w "xyz" prog.R | wc -l)

Andy Dalton
  • 13,993
Bishal
  • 11

2 Answers


The grep command extracts all lines in the file called prog.R that contain the word xyz. The -w option makes it match only complete words, so it won't match e.g. xyzz or vxyz.

The wc -l command reads the output from grep via the | pipe, and counts the number of lines that the grep command produces. This will be the output of the command substitution as a whole, i.e. the $(...) bit.
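To see the two stages of the pipeline separately, here is a minimal sketch. The temporary file and its contents are made up for illustration and merely stand in for prog.R.

```shell
# Hypothetical sample file standing in for prog.R.
sample=$(mktemp)
cat > "$sample" <<'EOF'
xyz <- 1
xyzz <- 2
y <- xyz + xyz
vxyz <- 3
print(xyz)
EOF

# Stage 1: print the lines that contain xyz as a whole word.
# The xyzz and vxyz lines are skipped because of -w.
grep -w "xyz" "$sample"

# Stage 2: count those lines. The third line contains xyz twice,
# but wc -l counts lines, not occurrences.
count=$(grep -w "xyz" "$sample" | wc -l)
echo "matching lines: $count"

rm -f "$sample"
```

The value stored in count is what the $(...) in the question expands to.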

Assuming that the command substitution ($(...)) sits inside [ ... ] or is given as an argument to the test utility, the -ge 3 is then a test to see whether the number of lines counted by wc -l is greater than or equal to three.

test "$(grep -w "xyz" prog.R | wc -l)" -ge 3

or

[ "$(grep -w "xyz" prog.R | wc -l)" -ge 3 ]

This returns an exit status that could be used by an if statement (as a "boolean", like you say), for example, but it's hard to say much more without seeing more of the code.
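As a rough illustration of how that exit status might be consumed, the sketch below wraps the test in a function. The name has_enough_matches and the sample data are hypothetical and not taken from the question.

```shell
# has_enough_matches is a hypothetical helper.
# $1: word to look for, $2: file, $3: minimum number of matching lines.
# The exit status of the function is the exit status of the [ ... ] test.
has_enough_matches() {
    [ "$(grep -w -- "$1" "$2" | wc -l)" -ge "$3" ]
}

# Made-up sample file: three lines contain xyz as a whole word.
sample=$(mktemp)
printf '%s\n' 'xyz' 'a xyz b' 'xyz()' 'xyzz' > "$sample"

# if runs the command and branches on its exit status (0 means "true").
if has_enough_matches xyz "$sample" 3; then
    result="at least 3"
else
    result="fewer than 3"
fi
echo "$result"

rm -f "$sample"
```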

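Without [ ... ] or test, the command is nonsense: the shell would substitute the count and then try to run that number as a command name, with -ge and 3 as its arguments.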

You may want to check the manuals for the grep, wc and test utilities, and possibly to read up on pipes and command substitutions.


Note my quoting of "$( grep ... )" above. Without the double quotes, the shell would split the result of the command substitution on the characters in $IFS (space, tab, newline, by default), and would then apply filename globbing to the resulting strings. This is something that we don't really want here, especially since we don't know the value of $IFS (if it contains digits, for whatever reason, this may affect the result of the test).
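A contrived sketch of that failure mode follows. The function count_lines is a hypothetical stand-in for the real pipeline and simply prints 13, and IFS is deliberately set to the digit 3.

```shell
# Stand-in for the real pipeline; pretend wc -l reported 13 lines.
count_lines() { echo 13; }

old_ifs=$IFS
IFS=3   # contrived value: a digit in IFS

# Unquoted: the result 13 is split on the character 3, leaving just 1,
# so the shell effectively runs [ 1 -ge 3 ].
if [ $(count_lines) -ge 3 ]; then unquoted=true; else unquoted=false; fi

# Quoted: no splitting happens, so the shell runs [ 13 -ge 3 ].
if [ "$(count_lines)" -ge 3 ]; then quoted=true; else quoted=false; fi

IFS=$old_ifs
echo "unquoted: $unquoted, quoted: $quoted"
```

The same count gives two different answers, depending only on the quoting.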

The grep and wc test could be shortened into just

[ "$(grep -c -w 'xyz' prog.R)" -ge 3 ]

The -c option to grep makes it report the number of lines matched, making the use of wc -l unnecessary.
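A quick way to convince yourself that the two forms agree is to run both on the same input. The sample file below is made up for illustration.

```shell
# Made-up sample file: three lines contain xyz as a whole word.
sample=$(mktemp)
printf '%s\n' 'xyz' 'xyz xyz' 'xyzz' 'f(xyz)' > "$sample"

with_wc=$(grep -w 'xyz' "$sample" | wc -l)   # pipeline form
with_c=$(grep -c -w 'xyz' "$sample")         # grep counts the lines itself

# $(( ... )) strips any blanks that some wc implementations print.
echo "wc -l: $((with_wc)), grep -c: $with_c"

rm -f "$sample"
```

Like wc -l, grep -c counts matching lines, so the second line of the sample (which contains xyz twice) is counted once.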

With GNU grep (and some others), the whole test could be made more efficient (faster):

[ "$(grep -c -m 3 -w 'xyz' prog.R)" -eq 3 ]

where -m 3 makes grep stop after three matching lines. With -c, grep then outputs 3 if at least three lines match the expression, so we test with -eq 3 to see whether this was the case.

The -m and -w options to grep are not standard, but often (-w) or sometimes (-m) implemented.
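The effect of -m can be seen by comparing the capped and uncapped counts on a file with more than three matching lines. This sketch assumes a grep that supports -m (GNU grep does), and the sample file is made up.

```shell
# Made-up sample file with five matching lines: "xyz 1" ... "xyz 5".
sample=$(mktemp)
printf 'xyz %s\n' 1 2 3 4 5 > "$sample"

capped=$(grep -c -m 3 -w 'xyz' "$sample")   # stops reading after the third match
full=$(grep -c -w 'xyz' "$sample")          # reads the whole file

echo "with -m 3: $capped, without: $full"

rm -f "$sample"
```

Because the capped count can never exceed 3, -eq 3 is the right comparison in that variant, and on a large file grep can stop reading early.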

Kusalananda
  • 333,661
  • Having said that, some wc implementations include blanks on either side of the number, and some test/[ implementations choke on it with arithmetic comparison operators. With the default value of $IFS, split+glob would remove those blanks. Other alternative is to use [ "$(( $(grep | wc -l) ))" -ge 3 ] – Stéphane Chazelas Feb 13 '21 at 17:28
  • @StéphaneChazelas Yeah, I'm going to take an executive decision based on the [tag:linux] tag and say that this will not be an issue (zsh and bash handles flanking whitespace). – Kusalananda Feb 13 '21 at 17:30
  • I went to check Dash and Busybox sh (they ignore whitespace around the number), and noticed that zsh appears to ignore leading whitespace (which just so happens to match what wc -l gives on my Mac). But it croaks on trailing whitespace; zsh -c 'a="3 "; [ "$a" -gt 2 ] && echo hi' gives "zsh:[:1: integer expression expected: 3". Still not a problem on Linux, though since both wc from GNU coreutils and Busybox don't print any extra whitespace. – ilkkachu Feb 13 '21 at 17:57
  • @ilkkachu Thanks. I was not aware. I'll leave the answer as is for the moment. Fiddling with things like these tends to make for overly complex code to handle edge cases, and in this instance the user would not, I feel, be able to quite digest everything. Leaving your comment here though. Thanks again. – Kusalananda Feb 13 '21 at 18:07
  • @Kusalananda, I know, was just a bit surprised at the result. – ilkkachu Feb 13 '21 at 18:09

Your command:

$(grep -w "xyz" prog.R | wc -l) -ge 3

when in the context of a shell test command:

if [ $(grep -w "xyz" prog.R | wc -l) -ge 3 ]; then
    do_something
fi

would be testing whether or not xyz exists as a complete "word" on at least 3 lines in prog.R and setting a zero exit status if so, non-zero otherwise. See @Kusalananda's answer for a breakdown of that command line.

Alternatively you could just use a single awk command, e.g. this in GNU awk:

if awk '/\<xyz\>/ && (++c == 3){ f=1; exit } END{exit !f}' prog.R; then
    do_something
fi

or this in any awk:

if awk '/(^|[^[:alnum:]_])xyz([^[:alnum:]_]|$)/ && (++c == 3){ f=1; exit } END{exit !f}' prog.R; then
    do_something
fi
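
To try the portable variant without a real prog.R, something like the following sketch could be used. The wrapper name at_least_three and the sample file are hypothetical, and the search word is assumed to be xyz as in the question.

```shell
# at_least_three is a hypothetical wrapper around the portable awk test.
# It exits 0 if xyz appears as a whole word on at least three lines of $1.
at_least_three() {
    awk '/(^|[^[:alnum:]_])xyz([^[:alnum:]_]|$)/ && (++c == 3){ f=1; exit } END{exit !f}' "$1"
}

# Made-up sample file: three lines contain xyz as a whole word.
sample=$(mktemp)
printf '%s\n' 'xyz' 'a xyz b' 'xyz()' 'xyzz' > "$sample"

if at_least_three "$sample"; then
    awk_result=yes
else
    awk_result=no
fi
echo "at least three matching lines: $awk_result"

rm -f "$sample"
```

Note that awk is run directly as the condition of the if statement, without [ ... ] around it, because awk sets the exit status itself.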
Ed Morton
  • 31,617
  • Without the [ .. ] or test, it doesn't test anything, probably just gives an error about that (all-numeric) command being not found. – ilkkachu Feb 13 '21 at 21:28
  • If someone asked what the code x == 2 is doing I think it'd be reasonable to say it's testing whether or not the variable x has the value 2 without adding the caveat that it'd produce a syntax error if just typed as-is on the command-line and IMHO that same logic applies in this case too. But it's good you commented in case anyone was confused by that, thanks. – Ed Morton Feb 13 '21 at 22:12
  • eh? Sure, x == 2 compares x to 2, in programming languages where == is a comparison operator, and x a valid way to refer to a variable. But here, you're presenting two awk commands as alternatives to the $(grep...) -gt 3, even though they don't work the same way. You can't write if [ awk '...' prog.R ]; then... the same way they've probably seen if [ $(grep...) -gt 3 ]; then... being used. And no, it's not immediately obvious to every user posting a question here that [ is a command just like awk is. – ilkkachu Feb 14 '21 at 00:19
  • Fine, I updated it to dot the i's. – Ed Morton Feb 14 '21 at 02:09