14

Here is my script (to find the files that contain a specified pattern):

find . -type f \
    -exec awk -v vawk="$1" '/'"$vawk"'/ {c++} c>0 { print ARGV[1]; exit 0 } END { if (! c) {exit 1}}' \{\} \;

I would like to call my script with an argument:

MyScript.sh pattern

My problem is that I can't get the value of $1 into the awk program.

When I try to debug my script

bash -x MyScript.sh pattern

Here is the output:

+ find . -type f -exec awk -v vawk=pattern '// {c++} c>0 {print ARGV[1] ; exit 0 } END { if (! c) {exit 1}}' '{}' ';'

The $vawk variable seems to be empty.

Any idea?

Kusalananda
Nicolas

2 Answers

13

Reproduced from a question that has since been closed as a duplicate, because it includes warnings about the limitations of awk variable passing that one might find useful.

A shell variable is just that: a shell variable. If you want to turn it into an awk variable, you need a syntax such as:

awk -v x="$x" '$2 == x {print $1}' infile

or

awk '$2 == x {print $1}' x="$x" infile

However, those suffer from a problem: escape sequences are expanded in them.

Also, with GNU awk 4.2 or above, if $x starts with @/ and ends in /, it's treated as a regexp type of variable.

So, for instance if the shell variable contains the two characters backslash and n, the awk variable will end up containing the newline character and with gawk 4.2+, if it contains @/foo/, the awk variable will contain foo and be of type regexp. Worse, if it's @/(xxxxx){1,20000}/ for instance, gawk will hog one CPU for hours or until memory exhaustion trying to compile that regexp, making it some form of DoS vulnerability.
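The escape-sequence expansion described above can be seen with a small experiment. This is a minimal sketch, assuming a POSIX awk; the variable names len_v and len_env are only for illustration. The shell variable holds four characters (a, backslash, n, b), and the awk program reports how many characters it received through each route:

```shell
#!/bin/sh
# x holds 4 characters: a, backslash, n, b (single quotes keep the backslash literal)
x='a\nb'

# Passed with -v: awk expands \n into a single newline character.
len_v=$(awk -v x="$x" 'BEGIN { print length(x) }')

# Passed through the environment: the value arrives unchanged.
len_env=$(x="$x" awk 'BEGIN { print length(ENVIRON["x"]) }')

printf '%s\n' "via -v: $len_v" "via ENVIRON: $len_env"
```

With -v the length is 3, because the backslash and the n were merged into one newline; with ENVIRON it is 4.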

Another approach is to use environment variables. Like -v, this requires a POSIX awk or nawk (as opposed to the 1970's awk still found as /bin/awk in Solaris):

x="$x" awk '$2 == ENVIRON["x"] {print $1}' infile

Another approach (still with newer awks) is to use the ARGV array in awk:

awk -- 'BEGIN {x = ARGV[1]; delete ARGV[1]}
  $2 == x {print $1}' "$x" infile

Also beware that whether you use ARGV/ENVIRON/-v or var=value arguments, the corresponding string will be considered as a numeric string if it's shaped like a number (with the range of recognised number formats varying with the implementation).

It's important, because in that $2 == ENVIRON["VAR"] above for instance, it will be a string comparison¹ if $VAR is for instance foo or 1f2, but a numeric comparison if it's 1e2 or 1.1 (or possibly inf, 0xff depending on the awk implementation and version), assuming $2 also looks numeric. So 10.0e1, 100 and 1e2 would all be considered equal.

Doing:

awk 'BEGIN {var = "" ENVIRON["VAR"]}'

Would make sure the var awk variable is always considered as a string, even if the $VAR shell variable looks like a number.

awk 'BEGIN {var = 0 + ENVIRON["VAR"]}'

Would convert it to a number (at least the leading part of it that can be interpreted as a number).
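The two conversions can be compared side by side. This is a minimal sketch, assuming a POSIX awk; the input line 'a 100' and the names as_string and as_number are made up for the illustration. The same value, 1e2, is compared with the field 100, once forced to a string and once forced to a number:

```shell
#!/bin/sh
VAR=1e2
export VAR

# Forced to a string: "100" and "1e2" are compared as text, so they differ.
as_string=$(printf 'a 100\n' | awk '{
    s = "" ENVIRON["VAR"]
    r = ($2 == s) ? "equal" : "different"
    print r
}')

# Forced to a number: 100 and 1e2 are compared numerically, so they match.
as_number=$(printf 'a 100\n' | awk '{
    n = 0 + ENVIRON["VAR"]
    r = ($2 == n) ? "equal" : "different"
    print r
}')

printf '%s\n' "as string: $as_string" "as number: $as_number"
```

Forcing the type makes the result independent of how a given awk implementation classifies the unconverted value.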


¹ or strcoll() comparison with some implementations (as used to be required by POSIX), that is, a == b where either a or b or both are a string would return true if a and b have the same sorting order.

12

You seem to be confusing awk variables and shell variables. awk -v vawk="$1" creates an awk variable called vawk, yet you are trying to use shell syntax ($vawk). This doesn't work because the shell doesn't have a variable called vawk. I think what you want is

awk -v vawk="$1" '$0 ~ vawk { c++ } # ...'
#                      ^ awk variable syntax
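Applied to the script from the question, the fix might look like the sketch below. The function name files_matching and the demo files are hypothetical, and mktemp -d is assumed to be available (it is not specified by POSIX). The awk program prints FILENAME and exits at the first matching line; the exit-status logic of the original is left out because find does not use it here.

```shell
#!/bin/sh
# files_matching REGEXP: list files under the current directory
# that contain at least one line matching REGEXP.
files_matching() {
    find . -type f -exec awk -v vawk="$1" '
        $0 ~ vawk { print FILENAME; exit }   # vawk is an awk variable: no $ prefix
    ' {} \;
}

# Small demonstration in a scratch directory.
demo_dir=$(mktemp -d)
printf 'hello world\n' > "$demo_dir/a.txt"
printf 'nothing here\n' > "$demo_dir/b.txt"

matches=$(cd "$demo_dir" && files_matching 'wor')
printf '%s\n' "$matches"

rm -rf "$demo_dir"
```

In MyScript.sh itself, the body would reduce to a call such as files_matching "$1". Because the pattern is passed with -v, backslashes in it are still subject to escape-sequence expansion.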
jw013
  • Note that awk expands the C-like escape sequences in $1, so that approach doesn't work if $1 may contain backslash characters (common for regexps). You may use the ENVIRON awk special array instead. – Stéphane Chazelas Apr 18 '17 at 12:27
  • In GNU awk 4.2+, that's also some form of DoS vulnerability (try for instance with $1 being @/x*{1,20000}/) – Stéphane Chazelas Jun 24 '23 at 22:31