To clarify, the LF (aka \n or newline) character is the line delimiter; it's not the line separator. A line is not finished unless it's terminated by a newline character. A file that only contains a\nb is not a valid text file because it contains characters after the last line. Same for a file that contains only a. A file that contains a\n contains one non-empty line.
So a file that ends with at least one empty line ends with two newline characters or contains a single newline character.
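The delimiter-versus-separator distinction can be seen with wc -l, which counts newline characters rather than the chunks between them. This is only a small sketch for a POSIX shell; count_lines is a hypothetical helper name introduced here to strip the padding that some wc implementations add.

```shell
# count_lines: hypothetical helper, not part of the original answer.
# wc -l counts newline characters, i.e. properly delimited lines;
# the arithmetic expansion removes any padding around the number.
count_lines() {
  echo "$(( $(wc -l) ))"
}

printf 'a\nb'  | count_lines   # one delimited line; the trailing "b" is not a line
printf 'a\n'   | count_lines   # one non-empty line
printf 'a\n\n' | count_lines   # two lines, the second one empty
printf '\n'    | count_lines   # one line, and it is empty
```

The last two inputs are the two cases named above: ending in two newline characters, or consisting of a single newline character.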
If:
tail -c 2 file | od -An -vtc
outputs \n or \n \n, then the file contains at least one trailing empty line. If it outputs nothing, the file is empty. If it outputs <anything-but-\0> \n, then it ends in a non-empty line. Anything else means it's not a text file.
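The four outcomes can be turned into a small classifier. The sketch below is an illustration rather than the answer's own code: classify_tail and its output labels are hypothetical, and it swaps od's -tc format for -tx1 (hex bytes) on the assumption that hex output is easier to compare as a string.

```shell
# classify_tail FILE: hypothetical helper, not part of the original answer.
# Looks only at the last two bytes of FILE, dumped as hex digits.
classify_tail() {
  # e.g. "0a0a" for two newlines, "610a" for "a" followed by a newline
  hex=$(tail -c 2 < "$1" | od -An -tx1 | tr -d '[:space:]')
  case $hex in
    '')      echo empty ;;                 # no bytes at all
    0a|0a0a) echo trailing-empty-line ;;   # "\n" alone, or "\n\n" at the end
    000a)    echo not-text ;;              # NUL byte before the final newline
    ??0a)    echo nonempty-last-line ;;    # some other byte, then a newline
    *)       echo not-text ;;              # last byte is not a newline
  esac
}
```

For example, classify_tail on a file created with printf 'a\n\n' prints trailing-empty-line, while a file created with printf 'a\nb' gives not-text.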
Using that to find files that end in an empty line is efficient (especially for large files) in that it only reads the last two bytes of each file. However, the output is not easily parsable programmatically, especially considering that it's not consistent from one implementation of od to the next, and we'd need to run one tail and one od per file.
find . -type f -size +0 -exec gawk '
ENDFILE{if ($0 == "") print FILENAME}' {} +
(to find files ending in an empty line) would run as few commands as possible but would mean reading the full content of all files.
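For comparison, the opposite trade-off (two short-lived processes per file, but only two bytes read from each) could be sketched as follows. list_trailing_empty is a hypothetical name, and the use of od -tx1 instead of -tc is an assumption made here to get output that is simple to match.

```shell
# list_trailing_empty DIR: hypothetical helper, not part of the original answer.
# Prints the non-empty regular files under DIR that end in an empty line.
list_trailing_empty() {
  find "$1" -type f -size +0 -exec sh -c '
    for f do
      # Hex dump of the last two bytes, whitespace removed.
      hex=$(tail -c 2 < "$f" | od -An -tx1 | tr -d "[:space:]")
      case $hex in
        0a|0a0a) printf "%s\n" "$f" ;;   # lone newline, or two trailing newlines
      esac
    done' sh {} +
}
```

Calling list_trailing_empty . covers the same files as the find command above. One sh is started per batch of files, but a tail and an od still run for every file.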
Ideally, you'd need a shell that can read the end of a file by itself.
With zsh:
zmodload zsh/system
for f (**/*(D.L+0)) {
{
sysseek -w end -2
sysread
[[ $REPLY = $'\n' || $REPLY = $'\n\n' ]] && print -r -- $f
} < $f
}
awk 'END{print}' $file: this totally ignores the content of $file, and after finishing parsing all the files contained in "$file" it adds a newline. As it is the only thing that awk command prints, it could be replaced with: printf '\n' (without any mention of $file at all) and do the same thing. I think this is NOT what you were aiming at (i.e. print the last line of the file?) – Olivier Dulac Oct 12 '16 at 15:43
awk 'END{print}' reads each line of the named file(s) or stdin and puts it in $0 and then does nothing else with it, and at the end it prints the value of $0, which is the last line read. But awk discards the input newline (or other RS value/match) and print always adds a newline (or ORS), so this doesn't help you determine if the last line was correctly terminated or not. – dave_thompson_085 Oct 14 '16 at 10:06
printf 'a\nb\nc\n' | awk 'END {print}': it just outputs a newline. – Olivier Dulac Oct 14 '16 at 12:35
c and so does FreeBSD, but I had not noticed it is documented as implementation-dependent: https://www.gnu.org/software/gawk/manual/gawk.html#I_002fO-And-BEGIN_002fEND . So it does happen but not always. – dave_thompson_085 Oct 15 '16 at 17:36