(OK, sorry, I read your question too quickly, so some of my answer is a bit beside the point; still leaving it as it is, as it may be useful to you or to others)
There are several things to consider here.
quoting of shell variables
Leaving a variable unquoted in POSIX shells (in list contexts, like in arguments to a command), not `awk`, is the split+glob operator.

If you do:

```
cmd foo=$var
```

where `$var` contains `* *`, that's asking the shell to split the content of `$var` based on the value of the `$IFS` special shell variable (by default, on blanks). So above, that gives us `foo=*` and `*`, and to perform globbing on each of those, that is, to expand `foo=*` to all the filenames in the current directory that start with `foo=`, and `*` to all the non-hidden filenames.
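To make that concrete, here is a minimal sketch (the `count_args` helper and the scratch directory are my own illustration, not part of the question) showing how many arguments actually reach a command:

```shell
#!/bin/sh
# Hypothetical helper: just reports how many arguments it received.
count_args() { echo "$#"; }

# Work in a scratch directory containing three known files.
dir=$(mktemp -d) && cd "$dir" || exit 1
touch 'foo=bar' aaa bbb

var='* *'
count_args foo=$var    # unquoted: split into foo=* and *, each then globbed
count_args "foo=$var"  # quoted: one literal argument: foo=* *
```

In the unquoted call, `foo=*` globs to `foo=bar` and `*` globs to the three files, so the command sees four arguments; the quoted call sees exactly one.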
So, really, you should almost always quote your shell variables, whether they are arguments to `awk` or not. That also applies to shell command substitution (`` `...` `` and `$(...)`) and shell arithmetic expansion (`$((...))`).
passing data as-is to awk
The other problem is that `awk` (not the shell) expands backslash escape sequences in the assignments of variables like `-v var=value` (and, with GNU `awk` 4.2 or above, if the value starts with `@/` and ends in `/`, it's treated as a regexp type of variable).

For instance, `-v var='\n/\n/'` sets the content of the `awk` variable `var` to `<newline>/<newline>/`, not `\n/\n/`. That also applies to `awk` variables defined as:

```
awk '...' var=value
```
To pass data to `awk` without it undergoing that expansion, you can use the `ENVIRON` or `ARGV` `awk` arrays:

```
var=$value awk 'BEGIN {var = ENVIRON["var"]} ...'
```

(above, it's a shell variable assignment (to a non-array variable), so there can't be split+glob, which is one of the rare cases where you can omit the quotes around variables)

or:

```
awk 'BEGIN {var = ARGV[1]; delete ARGV[1]} ...' "$value"
```
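As a quick check of the difference (the string lengths are my own illustration): `-v` turns the two characters `\n` into one newline character, while `ENVIRON` delivers the bytes untouched:

```shell
#!/bin/sh
value='a\nb'   # four characters: a, backslash, n, b

# -v: awk expands \n to a newline, so the variable holds 3 characters
awk -v var="$value" 'BEGIN {print length(var)}'          # prints 3

# ENVIRON: no escape processing, all 4 characters survive
var=$value awk 'BEGIN {print length(ENVIRON["var"])}'    # prints 4
```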
quoting and awk variables
That split+glob is only a shell (mis-)feature. The `awk` language is a completely different language. In `awk`, variables are referred to as `varname`, not `$varname`, and quotes are used to introduce strings. So `"varname"` is the `varname` string, while `varname` refers to the variable.
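A one-liner to illustrate (the variable name is arbitrary):

```shell
# varname (bare) is a variable; "varname" (quoted) is a string literal
awk 'BEGIN {varname = "some value"; print varname; print "varname"}'
# prints: some value
# then:   varname
```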
sanitizing variables to avoid code injection
Strictly speaking, quoting shell variables is not sanitizing; rather, it's *not* quoting them that invokes the split+glob operator. While in most languages you put quotes around fixed strings, in shells it's the other way round: everything is a string, and quotes are used to prevent some special behaviour. In particular, variables should almost always be quoted (a poor design decision that kind of made sense in the Bourne shell in the 70s, but is a hindrance in modern shells, `zsh` being the only shell that has partly fixed that).
The shell or `awk` will not evaluate/interpret code stored in their own variables unless you tell them to.

```
var='foo; rm -f var'
echo $var
# or
echo "$var"
```

will not cause the content of the variable to be evaluated as shell code (though the first one will undergo splitting and globbing, which can have dire consequences, for instance with `var='/*/*/*/*/../../../../*/*/*/*/../../../../*/*/*/*'`). You'd need:

```
eval "echo $var"
# or
sh -c "echo $var"
```

for it to be evaluated/interpreted as shell code.
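A harmless way to see the difference (using `echo INJECTED` as a stand-in for something destructive):

```shell
#!/bin/sh
var='foo; echo INJECTED'

echo "$var"        # prints the string as data: foo; echo INJECTED
eval "echo $var"   # re-parses it as code: runs echo foo, then echo INJECTED
```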
`awk` doesn't have such an `eval` feature; `perl`/`python` do.
But beware of cross-contamination. You can have the shell pass variable data (in shell variables) as code to execute by `awk`:

```
awk '{print "'"$var"': " $0}'
```

would be dangerous in case the `$var` shell variable contains, for instance:

```
var='test"; print "foo" > /etc/passwd; print "blah'
```

because the shell would then execute:

```
["awk", "{print \"test\"; print \"foo\" > /etc/passwd; print \"blah: \" $0}"]
```
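You can verify that injection harmlessly by substituting a benign `print "INJECTED"` for the destructive commands:

```shell
#!/bin/sh
var='test"; print "INJECTED'
printf 'hello\n' | awk '{print "'"$var"': " $0}'
# the injected print statement runs; output is:
#   test
#   INJECTED: hello
```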
Or the other way round:

```
awk '{system("echo foo: " $0)}' < file
```

where `awk` would run a shell as:

```
["sh", "-c", "echo foo: content-of-the-line"]
```

for each line of `file` (and think of what a line like `; rm -rf /` would do).
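Again, replacing `rm -rf /` with a harmless `echo INJECTED` shows the line's content being run as shell code:

```shell
#!/bin/sh
printf '%s\n' '; echo INJECTED' |
awk '{system("echo foo: " $0)}'
# sh runs: echo foo: ; echo INJECTED
# output:  foo:
#          INJECTED
```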
It's not only between `awk` and `sh`. You've got to be careful whenever variable/uncontrolled data may be evaluated as code by another interpreter. Examples:

```
sed "s/$regexp/blah/g"
```

`sed`'s language is limited, but it can still do harm, like with `regexp='//;w /etc/passwd; s/'`.
Or:

```
find . -exec sh -c "echo {}" \;
```
Now, to avoid those problems, there are two general approaches:

1. Convert the variable from one interpreter to the other. That works for the shell -> awk or find -> sh cases above. For instance, change:

   ```
   awk '{print "'"$var"': " $0}'
   ```

   to:

   ```
   awk -v awk_var="$var" '{print awk_var ": " $0}'
   ```

   And:

   ```
   find . -exec sh -c "echo {}" \;
   ```

   to:

   ```
   find . -exec sh -c 'echo "$1"' sh {} \;
   ```

   But that won't work for the shell -> sed, or awk -> shell cases.
2. When 1 is not possible, you need to sanitize the variables to either remove or escape the characters that may be a problem. In:

   ```
   awk '{system("echo foo: " $0)}'
   ```

   you need to convert `$0` to something that is a clean string as far as the shell is concerned. One option is to prefix each character with a backslash, but that won't work for newline (not a problem here). Another is to enclose the string in single quotes and escape each embedded single quote:
   ```
   awk 'function escape(s) {
          gsub(/'\''/, "&\\\\&&", s)
          return "'\''" s "'\''"
        }
        {system("echo foo: " escape($0))}'
   ```
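To check that this escaping holds up, here is the same `escape()` function fed a hostile line (again with `echo INJECTED` standing in for something nastier):

```shell
#!/bin/sh
printf '%s\n' "it's; echo INJECTED" |
awk 'function escape(s) {
       gsub(/'\''/, "&\\\\&&", s)
       return "'\''" s "'\''"
     }
     {system("echo foo: " escape($0))}'
# the line survives as data, not code:
#   foo: it's; echo INJECTED
```

Each embedded `'` becomes `'\''` and the whole line is wrapped in single quotes, so the `;` is never seen by the shell as a command separator.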