The POSIX wc command counts the number of POSIX lines in a file. POSIX defines a line as a sequence of zero or more non-newline characters followed by a terminating newline (\n). Text after the last \n, with no newline of its own, therefore does not count as a line.

To me, though, it is more natural to count the lines of text in a file, including a final line that lacks a trailing \n. Is there an easy way to do that?

root:[~]# printf "aa\nbb" | wc -l
1
root:[~]# printf "aa\nbb\n" | wc -l
2
root:[~]#

With GNU sed, you can use:

sed '$=;d'

This works because GNU sed does treat any extra characters after the last newline as an extra line. Like most GNU utilities, GNU sed also supports NUL characters in its input and has no limit on line length (the two other criteria that make an input non-text as per POSIX).
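
A small sketch of the sed approach on the two inputs from the question. The function name count_lines_gnu_sed is hypothetical, just a wrapper for readability; it assumes GNU sed, as the answer says. '$=' prints the number of the last line and 'd' deletes every line, so nothing else is printed.

```shell
#!/bin/sh
# Hypothetical wrapper: '$=' prints the last line's number,
# 'd' deletes every line so no text is echoed back.
# Assumes GNU sed, which counts an unterminated last line.
count_lines_gnu_sed() {
  sed '$=;d'
}

printf 'aa\nbb' | count_lines_gnu_sed     # unterminated last line: prints 2
printf 'aa\nbb\n' | count_lines_gnu_sed   # terminated last line: prints 2
```

Note that with empty input there is no last line, so '$=' never runs and this prints nothing rather than 0.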

POSIXly, building on @Inian's answer to also support overlong lines and NUL bytes:

LC_ALL=C tr -cs '\n' '[x*]' | awk 'END {print NR}'

That tr command translates every sequence of one or more characters other than newline (each byte being interpreted as a character in the C locale, to avoid decoding issues) into a single x character. As a result, awk's input records are either 0 or 1 byte long and its input contains only x and newline characters, so neither long lines nor NUL bytes can trip it up.
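
A sketch that makes the intermediate stream visible. count_text_lines is a hypothetical name for the answer's pipeline; the second tr, which shows newlines as '|', is only there to print what awk would receive.

```shell
#!/bin/sh
# Hypothetical wrapper around the POSIX pipeline from this answer.
# tr turns every run of non-newline bytes into one 'x' (C locale,
# so each byte is one character); awk then only counts records.
count_text_lines() {
  LC_ALL=C tr -cs '\n' '[x*]' | awk 'END { print NR }'
}

# Peek at what awk receives, with newlines shown as '|': prints x||x
printf 'hello\n\nworld' | LC_ALL=C tr -cs '\n' '[x*]' | tr '\n' '|'; echo

printf 'hello\n\nworld' | count_text_lines   # prints 3 (empty line and unterminated last line count)
printf 'aa\nbb\n' | count_text_lines         # prints 2
printf 'a\0b\nc' | count_text_lines          # prints 2: the NUL becomes part of an x run
```

Because awk's END block still runs on empty input, this prints 0 for an empty file.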

$ printf '%10000s\na\0b\nc\nd' | wc -l
3

$ printf '%10000s\na\0b\nc\nd' | mawk 'END{print NR}'
2
$ printf '%10000s\na\0b\nc\nd' | busybox awk 'END{print NR}'
5
$ printf '%10000s\na\0b\nc\nd' | gawk 'END{print NR}'
4

$ printf '%10000s\na\0b\nc\nd' | LC_ALL=C tr -cs '\n' '[x*]' | mawk 'END{print NR}'
4
  • Under which conditions will tr -cs '\n' 'x' fail? – Dec 06 '19 at 23:44
  • @Isaac, while tr -cs '\n' 'x' would also work with the tr of GNU or some BSDs, it is not POSIX as POSIX leaves the behaviour unspecified when the second set (here x) is shorter than the first (here the complement of \n). It won't work in SysV-derived tr implementations for instance. [x*] means as many x as necessary to fill up the set. – Stéphane Chazelas Dec 07 '19 at 08:08
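
A minimal illustration of what '[x*]' and -s each contribute, using a made-up two-line input. The trailing tr '\n' '|' only exists to display the newlines on one line.

```shell
#!/bin/sh
# '[x*]' repeats x as often as needed to match the length of the
# first set (the complement of \n), so every other byte maps to x.
printf 'ab\ncd' | LC_ALL=C tr -c '\n' '[x*]' | tr '\n' '|'; echo    # prints xx|xx

# Adding -s squeezes each run of x in the output into a single x.
printf 'ab\ncd' | LC_ALL=C tr -cs '\n' '[x*]' | tr '\n' '|'; echo   # prints x|x
```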

You can use awk for this. Its special variable NR holds the number of the current record, counted from the start of the input, and is incremented each time a record is read, including a final record that lacks a trailing newline. Printed in the END block, i.e. after all input lines have been processed, it gives the number of the last record read.
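
A small sketch that prints NR alongside each record, then the value left over for the END block, on the question's unterminated input:

```shell
#!/bin/sh
# Show NR while each record is processed, then its final value in END.
printf 'aa\nbb' | awk '{ print NR ": " $0 } END { print "total: " NR }'
# prints:
# 1: aa
# 2: bb
# total: 2
```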

printf "aa\nbb" | awk 'END { print NR }'
2

printf "aa\nbb\n" | awk 'END { print NR }'
2
Inian
  • Note that with some awk implementations, that still implies the input doesn't contain NUL characters (which would also make that input non-text as per POSIX). – Stéphane Chazelas Aug 13 '19 at 08:15