3

I am trying to pass a variable number of arguments from a shell script to awk, to select the rows of a table whose group matches one of the arguments. Here is my attempt so far:

The file 'infile':

    ID,GROUP
    1,GROUP2    
    2,GROUP2    
    3,GROUP4    
    4,GROUP4    
    5,GROUP5    
    6,GROUP5    
    7,GROUP23   
    8,GROUP23   
    9,GROUP23   

The file subset.sh:

    #!/bin/sh
    rm -f outfile_$week

    week = $1
    shift

    for TOKEN in "$@"
    do

    echo "adding records for" $TOKEN

    awk -F "," -v group = $TOKEN '{ if(FNR > 2 && $2 ~/group/){print $0} }' infile >> outfile_$week
    done

I have also tried group = "$TOKEN", "group = $TOKEN" and then both with single quotes. I am submitting like this:

    sh subset.sh 061314 GROUP2 GROUP23

The error I get is an astoundingly uninformative

    Usage: awk [-F fs][-v Assignment][-f Progfile|Program][Assignment|File] ...

Any help is much appreciated, thanks!

EDIT: I tried running

    awk -F "," -v group ="GROUP1" '{ if(FNR > 2 && $2 ~/group/){print $0} }' infile

to no avail (same error as above). Does anyone know why this might happen?

mlegge
  • 283
  • 1
    what's the rm -f infile doing inside subset.sh? – iruvar Jun 13 '14 at 18:24
  • 1
    Not related to your actual question, but afaik $2 ~/group/ will do a regex match against literal string group. To match against the variable you need $2 ~ group. In fact the whole expression could be much simpler 'FNR > 2 && $2 ~ group' since print $0 is the default action if the test evaluates true. – steeldriver Jun 13 '14 at 18:38
  • @1_CR that's a typo, I was changing the names for simplicity. – mlegge Jun 13 '14 at 19:37
  • You need to not put any spaces around the =. That's what is wrong with everything you have tried. Gnouc's answer is the correct way. – Graeme Jun 13 '14 at 19:37

3 Answers

4

You should write:

    -v group="$TOKEN"

instead of -v group = $TOKEN. With spaces around the =, the -v option receives only the word group, which is not an assignment, so awk prints its usage message.
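As a minimal sketch of the corrected call: the sample rows below are made up for illustration and fed through a pipe rather than read from infile, and the test $2 ~ group is the one suggested in the comments, not something this answer prescribes.

```shell
TOKEN=GROUP4

# Hypothetical sample rows (a header plus four records), piped in for illustration.
# No spaces around "=", and the shell variable is double-quoted.
result=$(printf '%s\n' 'ID,GROUP' '1,GROUP2' '2,GROUP2' '3,GROUP4' '7,GROUP23' |
  awk -F, -v group="$TOKEN" 'FNR > 1 && $2 ~ group')

printf '%s\n' "$result"
```

Here FNR > 1 skips only the header line of this sample; adjust it to the layout of the real file.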

cuonglm
  • 153,898
2

Sounds like you want:

    awk -F, '
      BEGIN {
        for (i = 1; i < ARGC; i++) group[ARGV[i]]
        ARGC=0
      }
      NR >= 2 && $2 in group' "$@" < infile

Or if you really want to consider the arguments as regexps to match against the second column:

    awk -F, '
      BEGIN {
        for (i = 1; i < ARGC; i++) group[ARGV[i]]
        ARGC=0
      }
      NR >= 2 {
        for (i in group) if ($2 ~ i) {print; next}
      }' "$@" < infile
2

Your immediate problem is the spaces around the equal sign. The argument to the -v option should be an assignment. Awk sees an argument to -v, followed by a script (=), followed by file names (the value of TOKEN, your script, and your file names).

You made a similar error in the shell script further up: week = $1 should be week="$1".

By the way, always put double quotes around variable and command substitutions. For example, if TOKEN is *, an unquoted $TOKEN would be replaced by the list of files in the current directory.

    awk -v "group=$TOKEN"

This doesn't set group to the exact value of TOKEN, though, because awk treats the right-hand side of the assignment as a string literal in awk syntax and interprets backslash escapes in it. For example, if the value of TOKEN is the 7-character string foo\bar, then the awk variable group is set to the 6-character string foo␈ar, where ␈ is a backspace character (byte value 8).

The straightforward way to pass a variable to an awk script is to export it to the environment, and use it via the ENVIRON array.
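A quick way to see the difference, as a sketch: it passes the same value both ways and compares the lengths awk reports. The variable names len_v and len_env are only for illustration.

```shell
TOKEN='foo\bar'   # 7 characters: f o o \ b a r

# Passed with -v, awk interprets \b as a backspace escape, so the value shrinks by one character.
len_v=$(awk -v "group=$TOKEN" 'BEGIN { print length(group) }')

# Passed through the environment, the value reaches awk unchanged.
len_env=$(TOKEN="$TOKEN" awk 'BEGIN { print length(ENVIRON["TOKEN"]) }')

echo "with -v: $len_v, with ENVIRON: $len_env"
```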

In addition, you aren't using the variable group anywhere in the awk script. The regexp /group/ matches any string containing the 5-character string group. If you want to check whether the field is exactly the value of group (so that e.g. if the value of TOKEN is GROUP2 then a field containing GROUP24 won't be matched), use the equality operator ==.

    export TOKEN
    awk -F "," '{ if (FNR > 2 && $2 == ENVIRON["TOKEN"]){print $0} }' infile >> "outfile_$week"
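To illustrate why the choice between ~ and == matters, here is a small sketch on made-up rows (GROUP24 is invented for the example and does not appear in the question's file). It counts how many rows each test selects for GROUP2.

```shell
# Hypothetical sample rows for illustration.
rows=$(printf '%s\n' '1,GROUP2' '7,GROUP23' '8,GROUP24')

TOKEN=GROUP2
export TOKEN

# Regexp match: GROUP2 also matches inside GROUP23 and GROUP24.
regex_hits=$(printf '%s\n' "$rows" |
  awk -F, '$2 ~ ENVIRON["TOKEN"] { n++ } END { print n + 0 }')

# String equality: only the exact group is selected.
exact_hits=$(printf '%s\n' "$rows" |
  awk -F, '$2 == ENVIRON["TOKEN"] { n++ } END { print n + 0 }')

echo "regexp: $regex_hits rows, equality: $exact_hits row(s)"
```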

Here's the whole script, simplified a little further to use awk's condition-action syntax (where the action is omitted here since print $0 is the default) and to avoid opening the output file every time:

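    #!/bin/sh
    week="$1"
    shift
    for TOKEN in "$@"
    do
      # Export TOKEN so that awk can read it through ENVIRON.
      export TOKEN
      # Progress goes to stderr so that it does not end up in the output file.
      echo "adding records for $TOKEN" >&2
      awk -F "," 'FNR > 2 && $2 == ENVIRON["TOKEN"]' infile
    done >"outfile_$week"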

See Stéphane Chazelas's answer for a more advanced way to use awk that doesn't require processing the input file multiple times.