
According to an answer on Stack Overflow, enclosing bash variables in double quotes is a fairly safe way of sanitizing user input.

What about awk variables? For example, if I have something like:

awk -v SOURCEIP="$SOURCEIP" -v REVERSEDNS="$REVERSEDNS" '{
   gsub(/^_TMPSOURCEIP_/, SOURCEIP);
   gsub(/^_TMPREVERSEDNS_/, REVERSEDNS);
   print
}' /home/foo/footemplate

Should I put quotes around the variable in the gsub lines? So it would then look like:

awk -v SOURCEIP="$SOURCEIP" -v REVERSEDNS="$REVERSEDNS" '{
   gsub(/^_TMPSOURCEIP_/, "SOURCEIP");
   gsub(/^_TMPREVERSEDNS_/, "REVERSEDNS");
   print
}' /home/foo/footemplate

Or does this not make a difference?

Mike B
    Nope, if you put SOURCEIP in quotes inside awk, it won't be treated as a variable. – rush Feb 06 '14 at 06:57

3 Answers

5

(OK, sorry, I read your question too quickly, so some of my answer is a bit beside the point. I'm leaving it as it is since it may be useful to you or others.)

There are several things to consider here.

quoting of shell variables

In POSIX shells (not awk), leaving a variable unquoted in list contexts, such as in the arguments to a command, invokes the split+glob operator.

If you do:

cmd foo=$var

where $var is * *, that's asking the shell to split the content of $var based on the value of the $IFS special shell variable (by default on blanks). Above, that gives us foo=* and *. The shell then performs globbing on each of those: foo=* expands to all the filenames in the current directory that start with foo=, and * expands to all the non-hidden filenames.
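A small sketch of that behaviour, assuming a POSIX sh or bash (zsh does not split unquoted variables by default). The scratch directory and the file names a.txt, b.txt and foo=bar are made up for the illustration:

```shell
# Work in a scratch directory so the glob results are predictable.
dir=$(mktemp -d) && cd "$dir" || exit 1
touch a.txt b.txt 'foo=bar'

var='* *'

# Unquoted: the word is split on blanks into "foo=*" and "*",
# then each piece is expanded against the files in the directory.
set -- foo=$var
unquoted_count=$#

# Quoted: exactly one argument, passed through unchanged.
set -- foo="$var"
quoted_count=$#
quoted_value=$1

printf 'unquoted: %s arguments\n' "$unquoted_count"
printf 'quoted:   %s argument (%s)\n' "$quoted_count" "$quoted_value"

# Clean up the scratch directory.
cd - >/dev/null && rm -rf -- "$dir"
```

With those three files, the unquoted form should yield four arguments (foo=bar from the first pattern, then a.txt, b.txt and foo=bar from the second), while the quoted form yields the single argument foo=* *.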

So, really, you should almost always quote your shell variables, whether they are arguments to awk or not. That also applies to shell command substitution (`...` and $(...)) and shell arithmetic expansion ($((...))).

passing data as-is to awk

The other problem is that awk (not the shell) expands backslash escape sequences in the assignments of variables like -v var=value (and with GNU awk 4.2 or above, if the value starts with @/ and ends in /, it's treated as a regexp type of variable).

For instance, -v var='\n/\n/' sets the content of the awk var variable to <newline>/<newline>/, not \n/\n/. That also applies to awk variables defined as:

awk '...' var=value

To pass data to awk without it undergoing that expansion, you can use the ENVIRON or ARGV awk arrays:

var=$value awk 'BEGIN {var=ENVIRON["var"]} ...'

(above, it's a shell variable assignment (to a non-array variable), so there can't be split+glob, which is one of the rare cases where you can omit the quotes around variables)

or:

awk 'BEGIN {var=ARGV[1]; delete ARGV[1]} ...' "$value"
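To make the difference concrete, here is a minimal comparison of the three ways of passing a value. The sample value a\tb and the names via_v, via_env and via_argv are only there for the demonstration:

```shell
value='a\tb'   # a literal backslash followed by "t", not a TAB

# -v: awk processes escape sequences, so \t becomes a TAB character.
via_v=$(awk -v var="$value" 'BEGIN { print var }')

# ENVIRON: the value reaches awk untouched.
via_env=$(var=$value awk 'BEGIN { print ENVIRON["var"] }')

# ARGV: also untouched; the entry is deleted so that awk
# does not later try to open it as an input file.
via_argv=$(awk 'BEGIN { var = ARGV[1]; delete ARGV[1]; print var }' "$value")

printf '%s\n' "$via_v" "$via_env" "$via_argv"
```

Only the first line of output should contain a real TAB; the other two keep the backslash and the t as they were.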

quoting and awk variables

That split+glob is only a shell (mis-)feature. The awk language is a completely different language.

In awk, variables are referred to as varname, not $varname, and quotes are used to introduce strings. So "varname" is the string varname, while varname refers to the variable.

sanitizing variables to avoid code injection

Strictly speaking, quoting shell variables is not sanitizing; rather, it's not quoting them that invokes the split+glob operator. While in most languages you put quotes around fixed strings, in shells it's the other way round: everything is a string and quotes are used to prevent some special behaviour. Variables in particular should almost always be quoted (a poor design decision that kind of made sense in the Bourne shell in the 70s, but is a hindrance in modern shells, zsh being the only shell that partly fixed that).

Neither the shell nor awk will evaluate/interpret code stored in their own variables unless you tell them to.

var='foo; rm -f var'
echo $var
# or
echo "$var"

Neither will cause the content of the variable to be evaluated as shell code, though the first one will undergo splitting and globbing, which can have dire consequences (for instance with var='/*/*/*/*/../../../../*/*/*/*/../../../../*/*/*/*'). You'd need:

eval "echo $var"
# or
sh -c "echo $var"

for it to be evaluated/interpreted as shell code.

awk doesn't have such an eval feature. perl/python do.

But beware of cross-contamination. You can have the shell pass variable data (in shell variables) as code to execute by awk:

awk '{print "'"$var"': " $0}'

would be dangerous in case the $var shell variable contains for instance:

var='test"; print "foo" > /etc/passwd; print "blah'

because the shell would then execute:

["awk", "{print \"test\"; print \"foo\" > /etc/passwd; print \"blah: \" $0}"]

Or the other way round:

awk '{system("echo foo: " $0)}' < file

where awk would run a shell as:

["sh", "-c", "echo foo: content-of-the-line"]

for each line of file (and think of what a line like ; rm -rf / would do).

It's not only between awk and sh. You've got to be careful whenever variable/uncontrolled data may be evaluated as code by another interpreter. Examples are:

sed "s/$regexp/blah/g"

sed's language is limited but it can still do harm, like with regexp='//;w /etc/passwd; s/'.

Or:

find . -exec sh -c "echo {}" \;

Now, to avoid those problems, there are two general approaches:

  1. Convert the variable from one interpreter to the other. That works for the shell -> awk or find -> sh cases above. For example, change:

    awk '{print "'"$var"': " $0}'
    

    to:

    awk -v awk_var="$var" '{print awk_var ": " $0}'
    

    And:

    find . -exec sh -c "echo {}" \;
    

    to:

    find . -exec sh -c 'echo "$1"' sh {} \;
    

    but that won't work for the shell -> sed, or awk -> shell cases.

  2. When 1 is not possible, you need to sanitize the variables to either remove or escape the characters that may be a problem. In:

    awk '{system("echo foo: " $0)}'
    

    you need to convert $0 to something that is a clean string as far as the shell is concerned. One option is to prefix each character with a backslash but that won't work for newline (not a problem here). Another one is to enclose the string in single quotes and escape each single quote.

    awk 'function escape(s) {
           gsub(/'\''/,"&\\\\&&",s)
           return "'\''" s "'\''"
         }
         {system("echo foo: " escape($0))}'
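As a quick check of that escape function, the sketch below feeds it a hostile line and verifies that the shell started by system() only echoes it. The scratch directory, the pwned file name and the sample line are assumptions made for the test:

```shell
dir=$(mktemp -d) || exit 1

# Without escaping, the shell run by system() would execute "touch".
line="x; touch $dir/pwned #'quoted'"

out=$(printf '%s\n' "$line" | awk 'function escape(s) {
       gsub(/'\''/, "&\\\\&&", s)      # each quote becomes: quote, backslash, quote, quote
       return "'\''" s "'\''"           # wrap the whole string in single quotes
     }
     { system("echo foo: " escape($0)) }')

# Record whether the injected command ran, then clean up.
if [ -e "$dir/pwned" ]; then injected=yes; else injected=no; fi
rm -rf -- "$dir"

printf '%s\n' "$out"
printf 'injected: %s\n' "$injected"
```

The output should be the original line prefixed with foo:, and the pwned file should not have been created.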
    
  • Thanks, that's great information. I'm still a little confused on the "safety" of passing stuff around like that. In my case, I DO want it to expand but I don't want it to wreak havoc. For the purpose of discussion, let's say that the value of the shell variable $SOURCEIP is rm -fr /. If I pass that to awk via awk -v AWKVAREXAMPLE="$SOURCEIP" and then later have awk do a gsub like gsub(/^_TARGETSTRING_/, AWKVAREXAMPLE); would that eventually "leak" out into the shell and destroy everything? – Mike B Feb 06 '14 at 08:16
    @MikeB, no. It would leak out to the shell if awk invoked a shell and passed that as code for it to interpret, like in: awk '{system("echo " var)}' (where var is ;rm -rf /), where awk calls ["sh", "-c", "echo; rm -rf /"] or awk '{print | "tr " v1 " " v2}' where awk pipes the output to ["sh", "-c", "tr content-of-v1 content-of-v2"]. – Stéphane Chazelas Feb 06 '14 at 08:41
    Things that you want to avoid are like: awk "{print \"$shell_variables\"}" as there, the content of the shell variable is interpreted as awk code. – Stéphane Chazelas Feb 06 '14 at 08:43
5

These two examples demonstrate the difference:

$ echo _TMP_ | awk -v VAR='some "text"' '{ gsub(/_TMP_/, VAR) ; print }'
some "text"
$ echo _TMP_ | awk -v VAR='some "text"' '{ gsub(/_TMP_/, "VAR") ; print }'
VAR

When VAR is unquoted, awk treats it as a variable with the value some "text". When VAR is inside quotes, awk treats it as a three-character string.

MORE: bash has sanitizing issues. Consider:

$ VAR="rm important_file" ; $VAR

The above will erase important_file. In this way, bash is like a macro language: it will substitute for a variable and then try to execute the result. awk is different. Consider:

$ echo _TMP_ | awk -v VAR='var); print $1' '{ gsub(/_TMP_/, VAR) ; print }'
var); print $1

awk treats VAR like mere text, not like potential commands to execute.

Problems can arise, however, if one lets bash modify the awk script. In my examples above, the awk scripts were all in single-quotes. That prevents bash from messing with them.
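A harmless illustration of that last point, assuming a hypothetical variable var whose content closes the awk string early. The double-quoted script lets the shell paste the value into the awk code, while -v passes it as data:

```shell
var='x"; print "injected'

# Unsafe: the shell expands $var inside the script, so awk sees
#   BEGIN { print "x"; print "injected" }
unsafe=$(awk "BEGIN { print \"$var\" }")

# Safe: the script is in single quotes and the value arrives via -v.
safe=$(awk -v v="$var" 'BEGIN { print v }')

printf 'unsafe:\n%s\n' "$unsafe"
printf 'safe:\n%s\n' "$safe"
```

In the unsafe case awk runs two print statements; in the safe case it prints the value as-is, quotes and semicolon included.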

John1024
    VAR='blah; echo $1' is not a problem to the shell either (unless you use eval). It's not a macro language (except to some extent wrt alias expansion) – Stéphane Chazelas Feb 06 '14 at 09:28
0

If you are passing an Awk variable to system, you need to shell quote it:

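function quote(str,   d, m, x, y, z) {
  # d is a single quote; split str on single quotes into x[1..m]
  d = "\47"; m = split(str, x, d)
  if (m == 0) return d d   # empty string: emit '' so the argument is kept
  # iterate by index: for (y in x) has no guaranteed order
  for (y = 1; y <= m; y++) z = z d x[y] d (y < m ? "\\" d : "")
  return z
}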

Example:

system(sprintf("ffmpeg -i %s outfile.m4a", quote(ARGV[1])))
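For a self-contained run that does not depend on ffmpeg, here is a sketch that uses printf as a stand-in command. The quote function is a variant that walks the pieces by index, and the sample argument is invented for the demonstration:

```shell
out=$(awk '
function quote(str,   d, m, x, y, z) {
  d = "\47"; m = split(str, x, d)      # d is a single quote
  if (m == 0) return d d                # empty input becomes an empty quoted string
  for (y = 1; y <= m; y++) z = z d x[y] d (y < m ? "\\" d : "")
  return z
}
BEGIN {
  arg = ARGV[1]; delete ARGV[1]
  # The shell started by system() receives the argument as one quoted word.
  system("printf \"%s\\n\" " quote(arg))
}' "it's; echo injected")

printf '%s\n' "$out"
```

The argument should come back unchanged on a single line, with no separate "injected" line, which suggests the embedded quote and semicolon were treated as data.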

Source

Zombo