Passing a variable from a FOR loop into awk to grab particular word

Question

I am trying to print in a CSV certain words from a table within a TXT file.

{...some code...}
number_lines=$(awk 'END { print NR }' Table1.txt)
if [ "$number_lines" -gt 5 ]
then
    for ((i=5; i<$number_lines; i++))
    do
       word=$(awk 'FNR==$i {print $2}' Table1.txt)
       echo $word
       printf "$variable1\t$variable2\t$variable3\t$word\n" >> Table2.csv
    done
fi

I thought this would give me the second field ($2) of line i. If I use FNR==5 {print $2} I get what I want, but because I don't know how many words there will be in Table1.txt, I need something that goes from line 5 (the earlier lines are not required) to the second-to-last line of Table1.txt. I hope my poor code won't upset anyone; I had to do this in a rush and have never done anything in bash before, so apologies.

Please [edit] your question to include about 8-10 lines of sample input and the expected output given that input so we can best help you. Make sure to include some truly representative text for those $variables (especially if they can contain escape sequences like \t) as well as the contents of Table1.txt and Table2.csv — Ed Morton, Jul 18 '21 at 12:31

score 2 · Accepted Answer · answered Jul 18 '21 at 08:52

You can sneak shell variables into awk variables using the -v option.

Your awk command would look like:

awk -v Seq="$i" 'FNR==Seq {print $2}' Table1.txt
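
A minimal sketch of why the original comparison never matched, assuming a throwaway three-line input (the temp file and the names `unquoted` and `passed` are made up for illustration). Inside single quotes the shell leaves `$i` alone, so awk sees a field reference through its own unset variable `i`, which means `$0`, the whole line:

```shell
# Throwaway sample input (hypothetical file: three lines, two columns).
tmp=$(mktemp)
printf '%s\n' 'a one' 'b two' 'c three' > "$tmp"

i=2

# Single quotes stop the shell from expanding $i. awk treats i as an
# unset awk variable (0), so $i is $0 and FNR is compared to the whole line.
unquoted=$(awk 'FNR==$i {print $2}' "$tmp")

# -v copies the shell value into the awk variable Seq before the program runs.
passed=$(awk -v Seq="$i" 'FNR==Seq {print $2}' "$tmp")

printf 'without -v: [%s]\nwith -v:    [%s]\n' "$unquoted" "$passed"
rm -f "$tmp"
```

With this input the first command prints nothing and the second prints the second field of line 2.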

Having proposed that fix, it would be faster and perhaps clearer to replace all 10 lines by a single awk program, which would avoid reading Table1 for every line it contains. awk is rather good at counting lines and reading data.

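Not tested, but replacing everything after "some code" with something like the following. Note that it also writes the echoed word to Table2.csv, and it includes the last line of Table1.txt, which the original loop skipped: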

awk -v Vars="${variable1}\t${variable2}\t${variable3}\t" \
    'FNR >= 5 { printf ("%s\n%s%s\n", $2, Vars, $2); }' \
    Table1.txt > Table2.csv

Thanks for the quick reply. I have tried the option one, and that gives the result I was looking for. I am going to try the second one which indeed looks better, and will post if that also gives the same result. — Tony, Jul 18 '21 at 09:57

score 0 · Answer 2 · answered Jul 18 '21 at 09:55

You don't want to run awk repeatedly in a loop like that: it will read and process the entire file once per iteration (line count - 5 times).

Ideally, it would be best to do the entire thing in awk (or perl or any language that isn't shell), but I don't know what's in your $variable[123] vars or how they're defined (btw, you should probably use an array for that if you're going to do it in bash), so I'll just show how to replace the for loop with a while read loop.

while read -r word ; do
  echo "$word"
  printf "$variable1\t$variable2\t$variable3\t$word\n" >> Table2.csv
done < <(awk 'NR > 4 {print $2}' Table1.txt)

This still isn't great (it's never a good idea to use shell itself for text processing), but at least it only runs awk once and only reads the input file once.
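
A related hedge on the printf line above: with the variables inside the format string, any `%` or backslash in their values is interpreted by printf. A sketch of a safer form, assuming made-up values chosen to show the difference (the name `line` is illustrative only):

```shell
# Hypothetical values; the % and the backslash are there on purpose.
variable1='100%'
variable2='a\tb'
variable3='last stuff'
word='jumped'

# The format string is fixed and the data are passed as arguments, so a
# % or a backslash inside the values is printed literally.
line=$(printf '%s\t%s\t%s\t%s\n' "$variable1" "$variable2" "$variable3" "$word")
printf '%s\n' "$line"
```

In the loop, the same `printf '%s\t%s\t%s\t%s\n' ...` call would take the place of the existing printf and append to Table2.csv as before.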

Answer 3 · Ed Morton · answered Jul 18 '21 at 12:48

You should be doing that in a single call to awk, not calling awk repeatedly in a shell loop, as that will be extremely slow and is hard to write robustly. If you post some concise, testable sample input and expected output then we can help you more, but it sounds like this might be what you're trying to do:

awk -v vars="$variable1\t$variable2\t$variable3" '
    BEGIN { OFS="\t" }
    NR>5 { print vars, prev }
    { prev = $2 }
' Table1.txt > Table2.csv

For example:

$ variable1='this stuff'
$ variable2='other stuff'
$ variable3='last stuff'
$ cat Table1.txt
01      the     foo
02      quick   bar
03      brown   foo
04      fox     bar
05      jumped  foo
06      over    bar
07      the     foo
08      lazy    bar
09      dogs    foo
10      back    bar

$ awk -v vars="$variable1\t$variable2\t$variable3" '
    BEGIN { OFS="\t" }
    NR>5 { print vars, prev }
    { prev = $2 }
' Table1.txt > Table2.csv

$ cat Table2.csv
this stuff      other stuff     last stuff      jumped
this stuff      other stuff     last stuff      over
this stuff      other stuff     last stuff      the
this stuff      other stuff     last stuff      lazy
this stuff      other stuff     last stuff      dogs

If any of those $variables can contain escape sequences that you don't want expanded (e.g. \t to a literal tab char), then do this instead:

vars="$variable1"$'\t'"$variable2"$'\t'"$variable3" awk '
    BEGIN { vars=ENVIRON["vars"]; OFS="\t" }
    NR>5 { print vars, prev }
    { prev = $2 }
' Table1.txt > Table2.csv
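
A small sketch of the difference between the two approaches, assuming a made-up value that contains a backslash followed by t (the names `value`, `via_v`, and `via_env` are illustrative only):

```shell
# Hypothetical value containing a literal backslash followed by t.
value='a\tb'

# -v processes escape sequences in the assigned text, so \t becomes a tab.
via_v=$(awk -v v="$value" 'BEGIN { print v }')

# ENVIRON hands the string to awk exactly as it sits in the environment.
via_env=$(v="$value" awk 'BEGIN { print ENVIRON["v"] }')

printf 'via -v:      %s\nvia ENVIRON: %s\n' "$via_v" "$via_env"
```

The first result contains a real tab character; the second still contains the backslash and the t.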

See how-do-i-use-shell-variables-in-an-awk-script for more information on how to pass the value of shell variables to an awk script.

To address that echo $word in your shell script: if that's a debugging print then it should really go to stderr instead of stdout (i.e. it should have been written as echo "$word" >&2), and then your awk script would be:

$ awk -v vars="$variable1\t$variable2\t$variable3" '
    BEGIN { OFS="\t" }
    NR>5 {
        print prev | "cat>&2"   # or print prev > "/dev/stderr" if your awk supports that
        print vars, prev
    }
    { prev = $2 }
' Table1.txt > Table2.csv

but if you REALLY want it to go to stdout then you could do this:

$ awk -v vars="$variable1\t$variable2\t$variable3" '
    BEGIN { OFS="\t" }
    NR>5 {
        print prev
        print vars, prev > "Table2.csv"
    }
    { prev = $2 }
' Table1.txt

or:

$ awk -v vars="$variable1\t$variable2\t$variable3" '
    BEGIN { OFS="\t" }
    NR>5 {
        print prev
        print vars, prev | "cat>&3"
    }
    { prev = $2 }
' Table1.txt 3> "Table2.csv"
