By definition, a text file consists of a sequence of lines. A line ends with a newline character. Thus a text file ends with a newline character, unless it's empty.
The `read` builtin is only meant to read text files. You aren't passing a text file, so you can't expect it to work seamlessly. The shell does read all the lines; what it's skipping are the extra characters after the last line.
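As a quick illustration (my own sketch, not part of the original file): `printf` below deliberately omits the final newline, so the last word is not a complete line.

    printf 'first\nsecond' | while read url; do
        printf 'got: %s\n' "$url"
    done
    # prints only "got: first"; "second" is not terminated by a newline,
    # so read returns a nonzero status and the loop body never runs for it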
If you have a potentially malformed input file that may be missing its final newline, you can add a newline to it, just to be sure.
    { cat "/tmp/urlFile"; echo; } | …
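To see the fix in action (again a sketch of my own, with `printf` standing in for the truncated file):

    printf 'first\nsecond' | { cat; echo; } | while read url; do
        printf 'got: %s\n' "$url"
    done
    # now prints both "got: first" and "got: second":
    # the echo supplies the newline that terminates the last line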
Files that should be text files but are missing the final newline are often produced by Windows editors. That usually goes hand in hand with Windows line endings, which are CR LF, as opposed to Unix's LF. CR characters are rarely useful, and in any case can't appear in a URL, so you should remove them.
    { <"/tmp/urlFile" tr -d '\r'; echo; } | …
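If you want to check whether the file really contains CR characters before stripping them, one way (my addition; `cat -A` is GNU-specific) is to make the control characters visible:

    cat -A "/tmp/urlFile"        # GNU coreutils: CR shows up as ^M, end of line as $
    od -c "/tmp/urlFile" | head  # portable alternative: CR shows up as \r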
If the input file is well-formed and does end with a newline, the `echo` adds an extra blank line. Since URLs can't be empty, just ignore blank lines.
Note also that `read` does not read lines in a straightforward way. It ignores leading and trailing whitespace, which for a URL is probably desirable. It treats a backslash at the end of a line as an escape character, causing the next line to be joined with the first minus the backslash-newline sequence, which is definitely not desirable. So you should pass the `-r` option to `read`. It is very, very rare for `read` to be the right thing rather than `read -r`.
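Here is a small sketch of that difference (my own illustration): a trailing backslash makes plain `read` join lines, while `read -r` keeps them separate.

    printf 'http://example.com/a\\\nhttp://example.com/b\n' |
    while read url; do printf '<%s>\n' "$url"; done
    # one joined line: <http://example.com/ahttp://example.com/b>

    printf 'http://example.com/a\\\nhttp://example.com/b\n' |
    while read -r url; do printf '<%s>\n' "$url"; done
    # two lines, kept as they are in the input:
    # <http://example.com/a\> and <http://example.com/b>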
    { <"/tmp/urlFile" tr -d '\r'; echo; } | while read -r url
    do
        if [ -z "$url" ]; then continue; fi
        …
    done
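For completeness, here is a hedged sketch of what the elided body might look like; the original leaves it as `…`, and `curl` is only an assumed example of how each URL could be processed:

    { <"/tmp/urlFile" tr -d '\r'; echo; } | while read -r url
    do
        if [ -z "$url" ]; then continue; fi
        # hypothetical body: fetch each URL (assumes curl is installed)
        curl -fsS -O -- "$url" || printf 'failed: %s\n' "$url" >&2
    done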
`awk 1 /tmp/urlFile` .. so `awk 1 /tmp/urlFile | while ...` – muru Jan 19 '18 at 05:03
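A brief note on that suggestion (mine, not part of the comment): in awk, `1` is an always-true pattern with the default action of printing the record, and awk terminates every record it prints with a newline, so `awk 1` repairs a missing final newline and can stand in for the `cat`/`echo` pair (it does not remove CR characters, though).

    printf 'a\nb' | awk 1 | od -c
    # shows: a \n b \n  -- the missing final newline has been added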