4

I read another answer that describes how to use AWK to view the last line of output:

$ seq 42 | awk 'END { print }'
42

So it seems like when the END block is run the last line is loaded in $0.

This surprised me because the first line isn't loaded into the BEGIN block:

$ seq 42 | awk 'BEGIN { print }'
#=> blank
  • Is this behavior documented anywhere? (I searched through the man page but didn't find anything.)
mbigras

2 Answers

9

The BEGIN block is run before any input is processed, so $0 hasn’t been initialised yet.
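As a quick check of that statement, the following sketch prints the record counter and the length of $0 from inside BEGIN. The variable name begin_state is only there for illustration; the script assumes nothing beyond seq and a POSIX-style awk.

```shell
# In BEGIN no record has been read yet: NR is 0 and $0 is the empty string.
begin_state=$(seq 42 | awk 'BEGIN { print NR, length($0) }')
echo "$begin_state"   # prints "0 0"
```

Because the script consists of a BEGIN rule only, awk exits without reading any of the 42 lines.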

The END block doesn’t do anything to $0, which keeps its last value. In your AWK script, that’s just the last line read: AWK reads all its input line by line and does its usual field-splitting processing (assigning $0 and so on), but never finds a matching block to run, so $0 is left untouched. However, for example,

seq 42 | awk '{ $0 = "21" } END { print }'

outputs 21, not 42, so it’s not the case that “when the END block is run the last line is loaded in $0”.

This isn’t documented in the gawk(1) manpage, but it is documented in mawk(1) (for that implementation of AWK obviously):

Similarly, on entry to the END actions, $0, the fields and NF have their value unaltered from the last record.

The GNU AWK manual does mention this behaviour:

In fact, all of BWK awk, mawk, and gawk preserve the value of $0 for use in END rules.

“BWK awk” is Brian Kernighan’s awk, the “one true awk”; it implemented this behaviour in 2005, as documented in its FIXES file:

Apr 24, 2005: modified lib.c so that values of $0 et al are preserved in the END block, apparently as required by posix. thanks to havard eidnes for the report and code.

That change is visible in the “one true awk” history. The latest release of BWK awk behaves in the same way as GNU AWK:

$ echo three fields here | ./awk '{ $0 = "one" } END { print $0 " " NF }'
one 1
$ echo three fields here | ./awk 'END { $0 = "one"; print $0 " " NF }'
one 1
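If a script needs the last input line as it was read, whatever the main rules later do to $0 and whichever awk is in use, one possible approach is to copy each record into an ordinary variable first. This is a minimal sketch, not something taken from any of the manuals quoted above; the names last and last_line are made up for the example.

```shell
# Copy each record into a variable before anything can modify $0,
# then print the saved copy in END instead of relying on $0.
last_line=$(seq 42 | awk '{ last = $0; $0 = "21" } END { print last }')
echo "$last_line"   # prints "42"
```

The saved copy is unaffected by the later assignment to $0, so END prints the record as it was read.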
Stephen Kitt
4

According to the GNU awk manual, it's slightly unclear what $0 should contain in an END rule. POSIX demands that NF "shall retain [its] value"(*), but doesn't mention $0.

Most probably due to an oversight, the standard does not say that $0 is also preserved, although logically one would think that it should be. In fact, all of BWK awk, mawk, and gawk preserve the value of $0 for use in END rules. Be aware, however, that some other implementations and many older versions of Unix awk do not.

In a sense, I find this behaviour logical. Leaving $0 for the END block allows for easy access to the last record, if necessary. The first record is easy to access with NR == 1 {...} so doesn't need a special keyword. On the other hand, executing BEGIN blocks before loading the first record allows setting FS or RS in time for them to be active for the first record.

(* Whatever that means, see comments.)
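The two points about the first record can be sketched together: FS is set in BEGIN so that it already applies when the first record is split, and NR == 1 selects that record. The colon-separated sample input and the variable name first_field are assumptions made for this example.

```shell
# BEGIN runs before any input is read, so FS is in effect for record 1;
# NR == 1 then picks out the first record without any special keyword.
first_field=$(printf 'a:b:c\nd:e:f\n' | awk 'BEGIN { FS = ":" } NR == 1 { print $2 }')
echo "$first_field"   # prints "b"
```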

ilkkachu
  • POSIX says “Inside an END action, NF shall retain the value it had for the last record read”, which might be subject to interpretation. GNU AWK at least just keeps the value from the end of the last block that was processed; seq 42 | awk '{ $0 = "2 1" } END { print NF }' outputs “2”. – Stephen Kitt Apr 06 '17 at 21:30
  • @StephenKitt, well, yes... assigning to $0 updates NF, so I'm not sure what the difference here is. – ilkkachu Apr 06 '17 at 21:37
  • When you say “POSIX demands that NF contain the number of fields in the last record”, I understand that as meaning that NF must contain the number of fields in the last record as it was read, not the number of fields in the last value assigned to $0. The same applies for POSIX, even more so since “read” is explicit: does it mean the value assigned when the last record was read, or the value NF had when the last record was finished processing? – Stephen Kitt Apr 06 '17 at 21:41
  • echo three fields here | awk '{ $0 = "one" } END { print $0 " " NF }' #=> one 3 seems like NF doesn't get recalculated – mbigras Apr 06 '17 at 22:30
  • @mbigras What version of AWK are you using? gawk prints “one 1” on my system. – Stephen Kitt Apr 07 '17 at 04:31
  • You could just about argue that in the case of $0 = "1 2 3" the line was read from a string. Maybe. Possibly. At a pinch. Weasel words perhaps? – Chris Davies Apr 07 '17 at 07:54
  • The awk that came with my Mac (awk version 20070501 whatever the origin) has curious behaviour: it seems to recalculate NF only if NF is referenced while processing the record in which $0 was changed. echo three fields here | awk '{$0 = "one"} END {print $0 " " NF}' gives one 3, but | awk '{$0 = "one"; NF} END {print $0 " " NF}' gives one 1... gawk and mawk on Debian give a 1 in all situations. – ilkkachu Apr 07 '17 at 07:55
  • @roaima, well, the description of NF says that it "shall retain the value it had", which would to me imply 'remaining the same' or 'not changing'. One hopes that if the intended meaning was 'shall be reset to the value it had at point X' they would have chosen another word to use. – ilkkachu Apr 07 '17 at 08:03
  • @ikkachu, yes I'm with you on this one. It feels to me that POSIX may have unintentionally misdefined the behaviour of $0 in an END block. (But who am I to say.) – Chris Davies Apr 07 '17 at 10:27
  • @StephenKitt awk version 20070501 on macOS 10.12.3 – mbigras Apr 09 '17 at 21:20