
I have files in subdirectories of the current directory that may or may not have new lines at the end; how can I find files that don't have a newline at the end?

I've tried this:

find . -name '*.styl' | while read file; do
    awk 'END{print}' $file | grep -E '^$' > /dev/null || echo $file;
done

but it doesn't work. awk 'END{print}' $file prints the line before an empty new line, the same as tail -n 1 $file.

Jeff Schaller
jcubic
  • @don_crissti I need files that don't have trailing empty line. – jcubic Oct 12 '16 at 13:48
  • May I ask why you need to find those files? I guess it has to do with the fact that text files in Unix are supposed to be terminated with a newline (vi will "almost silently" add one when you save, for example), and several text-oriented commands will ignore the last line if it is not terminated by a newline (wc, iirc, but there are others). And this may help – Olivier Dulac Oct 12 '16 at 15:24
  • awk 'END{print}' $file : this totally ignores the content of $file, and after it finishes parsing all the files contained in "$file" it adds a newline. As that is the only thing the awk command prints, it could be replaced with printf '\n' (without any mention of $file at all) and do the same thing. I think this is NOT what you were aiming at (i.e. printing the last line of the file?) – Olivier Dulac Oct 12 '16 at 15:43
  • @don_crissti: if the last character of a file is not a newline, then that file is not strictly (per POSIX) a Unix TEXT file. See: http://unix.stackexchange.com/a/263919/27616 . Note that many text commands (wc, for example) simply ignore that last "line" if it is not terminated by a newline – Olivier Dulac Oct 12 '16 at 15:44
  • @OlivierDulac: awk 'END{print}' reads each line of the named file(s) or stdin and puts it in $0 and then does nothing else with it, and at the end it prints the value of $0 which is the last line read. But awk discards the input newline (or other RS value/match) and print always adds a newline (or ORS) so this doesn't help you determine if the last line was correctly terminated or not. – dave_thompson_085 Oct 14 '16 at 10:06
  • @dave_thompson_085: sorry, but you are mistaken: the BEGIN block runs before the file(s) (or stdin) are read, and END runs after the last file (or when stdin sends EOF). These blocks do not parse the lines or save them, and END{print} does not print the last line: END{print} means "after all files are read, print", so after all lines are read it executes a simple "print" with no $0 to display, as there is no $0 after the last file is read. Try: printf 'a\nb\nc\n' | awk 'END {print}' : it just outputs a newline. – Olivier Dulac Oct 14 '16 at 12:35
  • @OlivierDulac: gawk prints c and so does FreeBSD, but I had not noticed it is documented as implementation-dependent: https://www.gnu.org/software/gawk/manual/gawk.html#I_002fO-And-BEGIN_002fEND . So it does happen but not always. – dave_thompson_085 Oct 15 '16 at 17:36

3 Answers


To clarify, the LF (aka \n or newline) character is the line delimiter (terminator), not the line separator. A line is not finished unless it's terminated by a newline character. A file that contains only a\nb is not a valid text file because it contains characters after the last line. The same goes for a file that contains only a. A file that contains a\n contains one non-empty line.

So a file that ends with at least one empty line ends with two newline characters or contains a single newline character.
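The terminator-versus-separator distinction can be checked with wc -l, which counts newline characters rather than "lines" in the everyday sense. The following is only a minimal sketch in POSIX sh; count_newlines is a hypothetical helper name, not something from the answer.

```shell
# count_newlines STRING: hypothetical helper. Expands backslash escapes in
# STRING with printf %b, then counts the newline characters with wc -l.
count_newlines() {
  printf '%b' "$1" | wc -l | tr -d '[:space:]'   # tr strips wc's padding
}

count_newlines 'a\nb';  echo    # 1: "b" is not a finished line
count_newlines 'a\n';   echo    # 1: one non-empty line
count_newlines 'a\n\n'; echo    # 2: one non-empty line plus one empty line
count_newlines 'a';     echo    # 0: no complete line at all
```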

If:

 tail -c 2 file | od -An -vtc

Outputs \n or \n \n, then the file contains at least one trailing empty line. If it outputs nothing, the file is empty; if it outputs <anything-but-\0> \n, the file ends in a non-empty line. Anything else means it's not a text file.
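The case analysis above could be scripted roughly as follows. This is a sketch under assumptions, not the answer's own method: classify_tail and its output labels are hypothetical, and it dumps the bytes as hex with od -An -tx1 instead of -tc, then strips all whitespace, so that differences in od's spacing should not matter.

```shell
# classify_tail FILE: hypothetical helper that labels FILE by its last two bytes.
classify_tail() {
  # Hex-dump the last two bytes and remove od's padding, e.g. "0a0a" or "610a".
  hex=$(tail -c 2 -- "$1" | od -An -tx1 | tr -d '[:space:]')
  case $hex in
    '')       echo empty ;;                  # zero-length file
    0a|0a0a)  echo trailing-empty-line ;;    # "\n" alone, or "...\n\n"
    000a)     echo not-text ;;               # NUL byte before the final newline
    ??0a)     echo ends-in-nonempty-line ;;  # any other byte, then "\n"
    *)        echo not-text ;;               # last byte is not a newline
  esac
}
```

Note that a file with no final newline lands in the not-text bucket here, which matches the strict definition used in this answer.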

Using that to find files that end in an empty line is efficient (especially for large files) because it reads only the last two bytes of each file. However, the output is not easy to parse programmatically, especially since it is not consistent from one implementation of od to the next, and we'd need to run one tail and one od per file.

find . -type f -size +0 -exec gawk '
  ENDFILE{if ($0 == "") print FILENAME}' {} +

(to find files ending in an empty line) would run as few commands as possible but would mean reading the full content of all files.
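For the case the question actually asks about (files with no final newline at all), a possible middle ground is one sh per batch of files and one tail per file, reading only the last byte. This is a sketch assuming a POSIX find, sh and tail; missing_final_newline is a hypothetical name. It relies on command substitution stripping a trailing newline, and it will misjudge a file whose last byte is NUL in shells that drop NUL bytes from command substitutions.

```shell
# missing_final_newline [DIR]: hypothetical helper. Lists non-empty regular
# files under DIR (default: .) whose last byte is not a newline.
missing_final_newline() {
  find "${1:-.}" -type f -size +0 -exec sh -c '
    for f do
      # $(...) strips a trailing newline, so the result is empty
      # when the last byte is "\n" and non-empty otherwise.
      if [ -n "$(tail -c 1 -- "$f")" ]; then
        printf "%s\n" "$f"
      fi
    done' sh {} +
}
```

To restrict it to the files from the question, -name '*.styl' could be added before -exec.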

Ideally, you'd need a shell that can read the end of a file by itself.

With zsh:

zmodload zsh/system
for f (**/*(D.L+0)) {
  {
    sysseek -w end -2
    sysread
    [[ $REPLY = $'\n' || $REPLY = $'\n\n' ]] && print -r -- $f
  } < $f
}
  • a way to use this answer's method to know if some file(s) are text files: are_textfiles () { nontext=0; rem="return 0 if all args are files with terminating newline, or n [=number of non-textfiles]" ; for f in "$@" ; do [ -f "$f" ] && { tail -c 1 "$f" | od -An -vtc | grep -F '\n' ;} >/dev/null 2>&1 || ((nontext++)) ; done ; return $nontext ; }. Use as: if ( are_textfiles this that otherthing ) ; then echo all are text files ; else echo "are_textfiles returned : $?" ; fi – Olivier Dulac Oct 14 '16 at 16:19

With GNU sed and a shell like zsh (or bash with shopt -s globstar):

sed -ns '${/./F}' ./**/*.styl

This checks whether the last line of each file is non-empty and, if so, prints the filename.
If you want the opposite (print file names if the last line is empty), just replace /./ with /^$/.

don_crissti

A correctly terminated text file with an empty last line ends in two \n.

So we expect the output of tail -c2 to equal $'\n\n'.

Sadly, command substitutions remove trailing newlines, so we need a bit of tweaking.

f=filename
nl='
'
t=$(tail -c2 "$f"; printf x)  # capture the last two bytes; the x protects trailing newlines.
r="${nl}${nl}$"                 # regex for: "ends in two newlines".
[[ ${t%x} =~ $r ]] &&  echo "file $f ends in an empty line"

We could even expand a bit to check which files fail to have a trailing new line:

nl=$'\n'
# IFS= and -r keep leading blanks and backslashes in file names intact.
find . -type f -name '*.styl' | while IFS= read -r f; do
    t=$(tail -c2 "$f"; printf x); r1="${nl}$"; r2="${nl}${r1}"
    [[ ${t%x} =~ $r1 ]] || echo "file $f is missing a trailing newline"
    [[ ${t%x} =~ $r2 ]] && echo "$f"
done

Note that the newline could be changed to something like $'\r\n' if needed.
In that case, also change tail -c2 to tail -c4.
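A possible CRLF variant is sketched below. It swaps the regex match for a hex comparison of the last four bytes, so it is not the same technique as the code above; ends_in_empty_crlf_line is a hypothetical name. It tests for two consecutive \r\n pairs at the end of the file, so a file consisting of a single \r\n is not reported.

```shell
# ends_in_empty_crlf_line FILE: hypothetical helper. Succeeds if FILE ends
# with "\r\n\r\n", i.e. a CRLF-terminated line followed by an empty CRLF line.
ends_in_empty_crlf_line() {
  # Hex-dump the last four bytes and remove od's padding.
  hex=$(tail -c 4 -- "$1" | od -An -tx1 | tr -d '[:space:]')
  [ "$hex" = 0d0a0d0a ]
}
```

It could be used as: ends_in_empty_crlf_line "$f" && echo "file $f ends in an empty line"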