
I have a file named /tmp/urlFile where each line represents a url. I am trying to read from the file as follows:

cat "/tmp/urlFile" | while read url
do
    echo $url
done

If the last line doesn't end with a newline character, that line won't be read. I was wondering why?

Is it possible to read all the lines, regardless if they are ended with a new line or not?

Tim

7 Answers


You'd do:

while IFS= read -r url || [ -n "$url" ]; do
  printf '%s\n' "$url"
done < url.list

(effectively, that loop adds back the missing newline on the last (non-)line).
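A quick way to see the difference, using a throwaway file (the path is illustrative):

```shell
# The last (unterminated) line is lost with a plain read,
# and preserved with the || [ -n "$url" ] guard.
printf 'one\ntwo' > /tmp/demo.list    # note: no final newline

while IFS= read -r url; do
  printf '%s\n' "$url"
done < /tmp/demo.list                 # prints only: one

while IFS= read -r url || [ -n "$url" ]; do
  printf '%s\n' "$url"
done < /tmp/demo.list                 # prints: one, then: two
```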

See also:

  • Thanks. I read the linked articles, and maybe I miss something, why "that loop adds back the missing newline on the last (non-)line"? – Tim Jan 18 '18 at 18:26
  • 1
    @Tim What Stephane seems to mean is that it adds back the missing newline in the output since all printf calls here have \n . – Sergiy Kolodyazhnyy Jan 18 '18 at 22:53
  • This is really clever. If the input does not end in a newline, then it must be the last/only line, so as long as it's not completely empty, we still want to process it. I suspect this could also work with read -r url || : instead, but then (I think) you'd always get an empty string at the very end of a file that does in fact end in \n. – shadowtalker Oct 06 '23 at 17:02
  • 1
    @shadowtalker read -r url || : is always true, so the loop would never end and you'd keep printing empty lines after the end of the input is reached. – Stéphane Chazelas Oct 06 '23 at 18:20
  • Of course, that's silly of me. – shadowtalker Oct 06 '23 at 18:27

Well, read returns a falsy value if it meets end-of-file before a newline, but even if it does, it still assigns the value it read. So we can check whether the final call of read returned something other than an empty line, and process it as normal. That is, only exit the loop after read returns false and the line is empty:

#!/bin/sh
while IFS= read -r line || [ "$line" ]; do 
    echo "line: $line"
done

$ printf 'foo\nbar' | sh ./read.sh 
line: foo
line: bar
$ printf 'foo\nbar\n' | sh ./read.sh 
line: foo
line: bar
ilkkachu

By definition, a text file consists of a sequence of lines. A line ends with a newline character. Thus a text file ends with a newline character, unless it's empty.

The read builtin is only meant to read text files. You aren't passing a text file, so you can't expect it to work seamlessly. The shell does read all the lines; what it skips is the extra characters after the last line.

If you have a potentially malformed input file that may be missing its last line, you could add a newline to it, just to be sure.

{ cat "/tmp/urlFile"; echo; } | …

Files that should be text files but are missing the final newline are often produced by Windows editors. This usually goes in combination with Windows line endings, which are CR LF, as opposed to Unix's LF. CR characters are rarely useful anywhere, and can't appear in URLs in any case, so you should remove them.

{ <"/tmp/urlFile" tr -d '\r'; echo; } | …

In case the input file is well-formed and does end with a newline, the echo adds an extra blank line. Since URLs can't be empty, just ignore blank lines.

Note also that read does not read lines in a straightforward way. It ignores leading and trailing whitespace, which for a URL is probably desirable. It treats backslash at the end of a line as an escape character, causing the next line to be joined with the first minus the backslash-newline sequence, which is definitely not desirable. So you should pass the -r option to read. It is very, very rare for read to be the right thing rather than read -r.

{ <"/tmp/urlFile" tr -d '\r'; echo; } | while read -r url
do
  if [ -z "$url" ]; then continue; fi
  …
done
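An end-to-end check of the pipeline above, using a CRLF file with no final newline (the path and URLs are illustrative):

```shell
# Build a Windows-style input: CR LF line endings, missing final newline.
printf 'http://a.example\r\nhttp://b.example' > /tmp/urlFile_demo

# tr strips the CRs, echo supplies the missing final newline,
# and the loop skips the blank line that echo adds for well-formed input.
{ <"/tmp/urlFile_demo" tr -d '\r'; echo; } | while read -r url
do
  if [ -z "$url" ]; then continue; fi
  printf 'url: %s\n' "$url"
done
# url: http://a.example
# url: http://b.example
```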
Jeff Schaller

This seems to be solved in part with readarray -t:

readarray -t urls < "/tmp/urlFile"
for url in "${urls[@]}"; do
    printf '%s\n' "$url"
done

Note however that while this does work for reasonably-sized files, this solution introduces a potential new problem with very large files - it first reads the file into an array which then must be iterated through. For very large files this could be both time- and memory-consuming, potentially to the point of failure.
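A quick check (readarray, also known as mapfile, is bash-specific, bash 4+) that the final line survives without a trailing newline; the file path here is just for the demo:

```shell
#!/usr/bin/env bash
# readarray/mapfile keeps the final line even when the file
# lacks a trailing newline.
printf 'http://a.example\nhttp://b.example' > /tmp/demo.list   # no final \n
readarray -t urls < /tmp/demo.list
echo "count: ${#urls[@]}"      # count: 2
printf '%s\n' "${urls[@]}"
```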

DopeGhoti

Another way would be like this:

When read reaches end-of-file instead of end-of-line, it does read in the data and assign it to the variables, but it exits with a non-zero status. So if your loop is constructed as "while read; do stuff; done", the body never runs for that final, unterminated line: read's failure ends the loop before the body executes.

So instead of testing read's exit status directly, test a flag, and have the read command set that flag from within the loop body. That way the entire loop body runs regardless of read's exit status, because read is just one of the commands in the loop like any other, not the deciding factor of whether the loop runs at all.

DONE=false
until $DONE; do
    IFS= read -r || DONE=true
    printf '%s\n' "$REPLY"
done < /tmp/urlFile

Referred from here.
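A quick behaviour check of the loop above, fed input that lacks the final newline:

```shell
# The flag-based loop still prints the unterminated last line.
printf 'foo\nbar' | {
  DONE=false
  until $DONE; do
    IFS= read -r || DONE=true
    printf '%s\n' "$REPLY"
  done
}
# prints:
# foo
# bar
```

Caveat: if the input does end in a newline, the final iteration runs once more with an empty $REPLY, so this loop prints one extra blank line at the end.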

cat "/tmp/urlFile" | while read url
do
    echo $url
done

This is a Useless Use of cat.

Ironically, you can replace the cat process here with something actually useful: a tool that POSIX systems have for adding the missing newline, and making the file into a proper POSIX text file.

sed -e '$a\' "/tmp/urlFile" | while read -r url
do
    printf "%s\n" "${url}"
done
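A quick way to verify the effect, assuming GNU sed (as noted in the comment below, POSIX leaves the behaviour unspecified when the input doesn't end in a newline):

```shell
# '$a\' (append nothing after the last line) forces a final newline.
printf 'foo\nbar' | od -c                  # last bytes: b a r (no \n)
printf 'foo\nbar' | sed -e '$a\' | od -c   # last bytes: b a r \n
```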


JdeBP
  • The behaviour of sed is unspecified by POSIX when the input doesn't end in a newline character though; also when there are lines larger than LINE_MAX, while the behaviour of read is specified in those cases. – Stéphane Chazelas Jan 19 '18 at 17:36

To read all the lines, regardless if they are ended with a new line or not:

{ cat "/tmp/urlFile" ; echo ; } | while read -r url; do printf '%s\n' "$url"; done

Source : My open source project https://sourceforge.net/projects/command-output-to-html-table/