I am trying to print in a CSV certain words from a table within a TXT file.

{...some code...}
number_lines=$(awk 'END { print NR }' Table1.txt)
if [ "$number_lines" -gt 5 ]
then
    for ((i=5; i<$number_lines; i++))
    do
       word=$(awk 'FNR==$i {print $2}' Table1.txt)
       echo $word
       printf "$variable1\t$variable2\t$variable3\t$word\n" >> Table2.csv
    done
fi

I thought awk 'FNR==$i {print $2}' would give me the second field of line i. If I hard-code FNR==5 {print $2} I get what I want, but I don't know how many words Table1.txt will contain, so I need something that goes from line 5 (the earlier lines are not required) up to the second-to-last line of Table1.txt. I hope my poor code won't upset anyone; I had to do this in a rush and have never written anything in bash before, so apologies.

Tony
    Please [edit] your question to include about 8-10 lines of sample input and the expected output given that input so we can best help you. Make sure to include some truly representative text for those $variables (especially if they can contain escape sequences like \t) as well as the contents of Table1.txt and Table2.csv – Ed Morton Jul 18 '21 at 12:31

3 Answers

You can sneak shell variables into awk variables using the -v option.

Your awk command would look like:

awk -v Seq="$i" 'FNR==Seq {print $2}' Table1.txt
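To see why the original FNR==$i never matched, here is a minimal sketch. The sample file and its contents are made up for illustration, and the variable names quoted and passed are hypothetical. Inside single quotes the shell leaves $i alone, so awk evaluates its own expression $i; the awk variable i is unset, which makes that $0, the whole line.

```shell
# Hypothetical sample data, only for this demonstration.
demo=$(mktemp)
printf '%s\n' 'a alpha' 'b beta' 'c gamma' > "$demo"

i=2

# Single quotes stop the shell from expanding $i, so awk sees its own $i.
# The awk variable i is unset, so $i is $0 (the whole line), and FNR never
# equals it for this data: nothing is printed.
quoted=$(awk 'FNR==$i {print $2}' "$demo")

# -v copies the shell value into the awk variable Seq before the program runs.
passed=$(awk -v Seq="$i" 'FNR==Seq {print $2}' "$demo")

printf 'quoted: [%s]\npassed: [%s]\n' "$quoted" "$passed"
rm -f "$demo"
```

With this data the first command prints nothing and the second prints beta, the second field of line 2.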

Having proposed that fix, it would be faster and perhaps clearer to replace all 10 lines with a single awk program, which avoids rereading Table1.txt once for every line it contains. awk is rather good at counting lines and reading data.

Not tested, but replacing everything after "some code" with something like:

awk -v Vars="${variable1}\t${variable2}\t${variable3}\t" \
    'FNR >= 5 { printf ("%s\n%s%s\n", $2, Vars, $2); }' \
    Table1.txt > Table2.csv
Paul_Pedant
  • Thanks for the quick reply. I have tried the option one, and that gives the result I was looking for. I am going to try the second one which indeed looks better, and will post if that also gives the same result. – Tony Jul 18 '21 at 09:57

You don't want to be running awk repeatedly in a loop like that: it's going to be reading and processing the entire file multiple times (line count - 4 times).

Ideally, it would be best to do the entire thing in awk (or perl or any language that isn't shell), but I don't know what's in your $variable[123] vars or how they're defined (btw, you should probably use an array for that if you're going to do it in bash), so I'll just show how to replace the for loop with a while read loop.

while read -r word ; do
  echo "$word"
  printf "$variable1\t$variable2\t$variable3\t$word\n" >> Table2.csv
done < <(awk 'NR > 4 {print $2}' Table1.txt)

This still isn't great (it's never a good idea to use shell itself for text processing), but at least it only runs awk once and only reads the input file once.
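A self-contained sketch of the same single-pass idea, under a few assumptions: the sample data and variable values are invented, awk is piped into the loop instead of using process substitution so that it also runs in a plain POSIX shell, and printf gets a fixed format string so that any % or backslash inside the variables is written literally. Like the NR > 4 condition in the loop this answer describes, it does not skip the last line.

```shell
# Hypothetical sample data and variable values, only for this demonstration.
dir=$(mktemp -d)
printf '%s\n' '01 the foo' '02 quick bar' '03 brown foo' '04 fox bar' \
              '05 jumped foo' '06 over bar' > "$dir/Table1.txt"
variable1='100%'
variable2='a\tb'
variable3='last'

# awk runs once and emits field 2 of every line after the fourth;
# the loop then reads one word per iteration.
awk 'NR > 4 {print $2}' "$dir/Table1.txt" |
while IFS= read -r word; do
  # Fixed format string: the variables are data, so % and \t stay literal.
  printf '%s\t%s\t%s\t%s\n' "$variable1" "$variable2" "$variable3" "$word"
done > "$dir/Table2.csv"

cat "$dir/Table2.csv"
```

Redirecting once after done, rather than appending inside the loop, opens the output file a single time; use >> there instead if the rows should be added to an existing Table2.csv.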

cas

You should be doing that in a single call to awk, not calling awk repeatedly in a shell loop, as that'll be extremely slow and makes it hard to write robust code. If you post some concise, testable sample input and expected output then we can help you more, but it sounds like this might be what you're trying to do:

awk -v vars="$variable1\t$variable2\t$variable3" '
    BEGIN { OFS="\t" }
    NR>5 { print vars, prev }
    { prev = $2 }
' Table1.txt > Table2.csv

For example:

$ variable1='this stuff'
$ variable2='other stuff'
$ variable3='last stuff'

$ cat Table1.txt
01 the foo
02 quick bar
03 brown foo
04 fox bar
05 jumped foo
06 over bar
07 the foo
08 lazy bar
09 dogs foo
10 back bar

$ awk -v vars="$variable1\t$variable2\t$variable3" '
    BEGIN { OFS="\t" }
    NR>5 { print vars, prev }
    { prev = $2 }
' Table1.txt > Table2.csv

$ cat Table2.csv
this stuff      other stuff     last stuff      jumped
this stuff      other stuff     last stuff      over
this stuff      other stuff     last stuff      the
this stuff      other stuff     last stuff      lazy
this stuff      other stuff     last stuff      dogs

If any of those $variables can contain escape sequences that you don't want expanded (e.g. \t to a literal tab char), then do this instead:

vars="$variable1"$'\t'"$variable2"$'\t'"$variable3" awk '
    BEGIN { vars=ENVIRON["vars"]; OFS="\t" }
    NR>5 { print vars, prev }
    { prev = $2 }
' Table1.txt > Table2.csv

See "How do I use shell variables in an awk script?" for more information on how to pass the value of shell variables to an awk script.
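The difference between the two ways of passing a value can be seen in isolation with a small sketch. The names value, s, via_v and via_env are made up for illustration.

```shell
# Hypothetical value containing a backslash sequence that should stay literal.
value='C:\temp'

# -v processes escape sequences in the assigned text, so \t becomes a tab.
via_v=$(awk -v s="$value" 'BEGIN { print s }')

# ENVIRON hands awk the environment string unchanged.
via_env=$(s="$value" awk 'BEGIN { print ENVIRON["s"] }')

printf 'via -v:      [%s]\nvia ENVIRON: [%s]\n' "$via_v" "$via_env"
```

The -v result contains a real tab character between "C:" and "emp", while the ENVIRON result is still the original C:\temp.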

To address that echo $word in your shell script: if that's a debugging print then it should really go to stderr instead of stdout (i.e. it should have been written as echo "$word" >&2), and then your awk script would be:

$ awk -v vars="$variable1\t$variable2\t$variable3" '
    BEGIN { OFS="\t" }
    NR>5 {
        print prev | "cat>&2"   # or print prev > "/dev/stderr" if your awk supports that
        print vars, prev
    }
    { prev = $2 }
' Table1.txt > Table2.csv

but if you REALLY want it to go to stdout then you could do this:

$ awk -v vars="$variable1\t$variable2\t$variable3" '
    BEGIN { OFS="\t" }
    NR>5 {
        print prev
        print vars, prev > "Table2.csv"
    }
    { prev = $2 }
' Table1.txt

or:

$ awk -v vars="$variable1\t$variable2\t$variable3" '
    BEGIN { OFS="\t" }
    NR>5 {
        print prev
        print vars, prev | "cat>&3"
    }
    { prev = $2 }
' Table1.txt 3> "Table2.csv"
Ed Morton