7

I have the following awk script, which takes the input file input.txt and produces the output below. Can someone please take the time to break down how this script works? I've spent a bit of time on it and it still doesn't make much sense to me.


Input:

$ cat input.txt

FINISHED
RSYNCJOBNA
20140502 0021 2182096 2082096 6 5
2014820905820902 10:02:15
2014820905820902 10:56:42
0:54:27

INITIATED
RSYNCJOBNA
20140502 0022 3282096 3182096 6 5
2014820905820902 15:31:06
0:06:04 ce eque**

Output:

RSYNCJOBNA|0021|20140502|10:02:15|10:56:42|0:54:27|FINISHED
RSYNCJOBNA|0022|20140502|15:31:06|        |0:06:04|INITIATED

Command to get the above output:

awk -v OFS='|' '/FINISHED|INITIATED/ {
        status = $1; getline;
        jobname = $1; getline;
        sequence = $2; date = $1; getline;
        start = $2; getline;
        if (status == "FINISHED") { end = $2; getline } else { end = "        " }
        runtime = $1;
        print jobname, sequence, date, start, end, runtime, status;
    }' input.txt

My understanding is that /FINISHED|INITIATED/ {} means that the commands inside the curly braces will only be run on lines matching either FINISHED or INITIATED. But as far as I can tell from the output, the script seems to be parsing all of the lines. What's going on?

terdon
Avinash Raj
  • 3
    What exactly don't you understand? Nothing? In that case: Shall we read the awk manual to you? Otherwise: Be precise. Data is read into variables and output in different order. – Hauke Laging May 03 '14 at 07:45
  • 2
    I don't know what the getline function in the above command does. Also, if we give a pattern like /FINISHED|INITIATED/, awk searches for the matching line and should perform the operation only on that particular line. But the operation was performed on all the lines. How? – Avinash Raj May 03 '14 at 08:25
  • 2
    @HaukeLaging What does the line "Data is read into variables and output in different order" mean? – Avinash Raj May 03 '14 at 08:26

2 Answers

18

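The getline function reads the next input line into $0 (re-splitting it into $1, $2 and so on), and the script then continues from the statement after the getline call, now working on that new line. So each consecutive getline call advances one more line. This is perhaps easier to understand with an example: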

$ cat input.txt
foo
1
2
$ awk '/foo/{print; getline; print; getline; print}' input.txt
foo
1
2

As you can see above, the rule fires only on the first line, because that is the only line matching foo. Each call to getline then reads the line after the current one, so the subsequent print calls print the following lines, even though those lines do not match the pattern themselves.
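To see that a line consumed by getline is never tested against the pattern again, here is a minimal sketch. The input with two foo lines and the `hits` counter are made up for illustration; they are not part of the original script.

```shell
# Hypothetical three-line input; "hits" counts how often the rule body runs.
out=$(printf 'foo\nfoo\nbar\n' | awk '
/foo/ {
    hits++
    print "rule entered at NR=" NR
    getline                      # read the next record into $0
    print "after getline NR=" NR ", $0=" $0
}
END { print "hits=" hits }')
printf '%s\n' "$out"
```

The last line printed should be hits=1: the second foo is read by getline inside the block, so awk's main loop never sees it and the rule is not triggered a second time.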

terdon
  • But $1, $2, etc. still reflect the current line, so the script hasn't really moved to the next line entirely, right? – Gregg Leventhal Oct 24 '22 at 17:09
  • @GreggLeventhal no, $1, $2 etc have also changed. Try with $ seq 10 | awk '/5/{getline; print $1}' and you will see that $1 is 6 not 5. – terdon Oct 24 '22 at 17:17
  • Ah I see, but if you do getline var, the behavior is different it seems. – Gregg Leventhal Oct 24 '22 at 17:29
  • @GreggLeventhal I don't really know what you mean by that. You might want to ask a new question. I don't know much about the internals of awk. – terdon Oct 24 '22 at 17:30
  • I mean that if you run awk '{ getline var }', then instead of setting $0 and everything else from the input, it just sets the variable var to the line, and $1, $2, etc. remain unchanged. You would have to use split(var, arr, " ") and then arr[1], arr[2], etc. to re-tokenize var on spaces – Gregg Leventhal Oct 24 '22 at 18:02
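A quick way to check the difference discussed in these comments, as a sketch on a made-up two-line input (`var`, `arr` and `n` are just illustrative names):

```shell
# With "getline var" the next line goes into var; $0 and the fields are left alone.
out=$(printf 'a b\nc d\n' | awk '
NR == 1 {
    getline var                  # var = "c d"; $1 is still "a"
    n = split(var, arr, " ")     # re-tokenize var on whitespace
    print $1 "|" var "|" arr[1] "|" n
}')
printf '%s\n' "$out"
```

This should print a|c d|c|2: the first field still belongs to the first line, while var holds the second line and has to be split by hand.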
4

If you don't know what an awk function does then the usual strategy is to have a look at the man page:

getline

Set $0 from next input record; set NF, NR, FNR, RT

The command block is indeed executed only twice, once for each line matching /FINISHED|INITIATED/. The other lines are read via getline from within the block.

This could be rewritten to:

/FINISHED|INITIATED/ { status = $1; line_number=0; next; }
{ line_number++; }
line_number==1 { jobname = $1; }
line_number==2 { sequence = $2; date = $1; }
...
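One way the elided rules might continue is sketched below. This is a guess at a complete version, not necessarily what was intended above: the `emit` helper, the `status == ""` guard and the temporary file are assumptions added for illustration, and the sample data is the input from the question with the trailing text on the last line left out.

```shell
# Sample data from the question, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
FINISHED
RSYNCJOBNA
20140502 0021 2182096 2082096 6 5
2014820905820902 10:02:15
2014820905820902 10:56:42
0:54:27

INITIATED
RSYNCJOBNA
20140502 0022 3282096 3182096 6 5
2014820905820902 15:31:06
0:06:04
EOF

out=$(awk -v OFS='|' '
# emit is a hypothetical helper that prints one finished record.
function emit() { print jobname, sequence, date, start, end, runtime, status }

/FINISHED|INITIATED/ { status = $1; line_number = 0; next }
status == ""         { next }        # ignore anything before the first record
                     { line_number++ }
line_number == 1     { jobname = $1 }
line_number == 2     { date = $1; sequence = $2 }
line_number == 3     { start = $2 }
# FINISHED records have an end timestamp before the runtime; INITIATED ones do not.
line_number == 4 && status == "FINISHED"  { end = $2 }
line_number == 4 && status == "INITIATED" { end = "        "; runtime = $1; emit() }
line_number == 5 && status == "FINISHED"  { runtime = $1; emit() }
' "$input")
rm -f "$input"
printf '%s\n' "$out"
```

On the sample data this should produce the same two lines as the getline version. The trade-off is that every line now passes through awk's normal rule matching, so the record layout is spelled out by the line_number conditions rather than by the order of the getline calls.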
Hauke Laging
  • What happens if the next record is an empty one? – Avinash Raj May 03 '14 at 08:52
  • @AvinashRaj Neither your code nor my alternative looks at the content of the lines (with the exception of /FINISHED|INITIATED/). The lines are simply counted. The data must be arranged exactly (from an awk parsing perspective) in the way you have shown, otherwise the code will break. – Hauke Laging May 03 '14 at 09:00
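If the input may be incomplete, one possible safeguard is to check what getline returns: 1 when a record was read, 0 at end of input, and -1 on error. The sketch below is not part of either script above; the truncated two-line input and the messages are made up for illustration.

```shell
# Hypothetical truncated record: the status and job name are present, the rest is missing.
out=$(printf 'INITIATED\nRSYNCJOBNA\n' | awk '
/FINISHED|INITIATED/ {
    status = $1
    if ((getline) <= 0) { print "truncated after status"; exit }
    jobname = $1
    if ((getline) <= 0) { print "truncated after jobname: " jobname; exit }
    print jobname, status
}')
printf '%s\n' "$out"
```

Here the second getline hits the end of the input and returns 0, so the script should report the truncation instead of silently reusing the previous line's fields. The parentheses around getline keep the comparison from being parsed as an input redirection.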