This has come up several times already on this site — see Understanding IFS and the linked questions. In this answer, I'm going to summarize what can go wrong and how to avoid it; see the linked threads for details.
`read line` performs the following actions:
- Read from standard input up to the first byte that is either a newline or null, and put the data in the variable called `line`.
- Strip off any backslash that is not at the end of the line. A double backslash `\\` becomes a single backslash. In other words, backslash quotes the next character as long as it isn't a newline.
- If `read` stopped at a newline and the character at the end of the line is a `\`, strip the backslash-newline sequence and continue reading, appending to the variable `line`. Repeat until the first of: a newline that is not preceded by a backslash; a null byte; the end of the input.
- Strip the longest suffix of `line` that is made of characters in `$IFS`. By default, `IFS` contains a tab, a space and a newline, so this strips ASCII whitespace from the end of the value of `line`.
- Strip the longest prefix of `line` that is made of whitespace characters in `$IFS`.
For example, if the input is

     : hello\
    world: :
    wibble

then `read line` results in `line` containing `: helloworld: :` (no initial space) with the default value of `IFS`. If `IFS` has been changed to `:` (just a colon) then `read line` results in ` : helloworld: ` (with a space at the beginning and at the end). If `IFS` contains both `:` and a space then the result is `: helloworld` (no initial or trailing space).
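The default-`IFS` case can be checked by feeding the sample input to `read line` directly. This is only a sketch: the helper name `sample` is made up for illustration, and it assumes a POSIX-style shell (sh, bash, dash) whose `IFS` still has its default value.

```shell
# Hypothetical helper that prints the three-line sample input.
# The first line starts with a space and ends with a backslash.
sample() {
    printf '%s\n' ' : hello\' 'world: :' 'wibble'
}

# read runs inside the braces so that "$line" is still set when printf uses it.
got=$(sample | { read line; printf '%s' "$line"; })

printf '[%s]\n' "$got"    # prints [: helloworld: :]
```

The backslash-newline pair is removed, the two physical lines are joined, and the leading space is stripped.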
To avoid the influence of `IFS`, set it to an empty value (note that this is different from unsetting it). You can set it only for the `read` command by writing `IFS= read` (see Why is `while IFS= read` used so often, instead of `IFS=; while read..`?).
To avoid backslash processing, pass the `-r` option to `read`.
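The effect of the two fixes can be compared side by side. The sketch below assumes a POSIX-style shell with the default `IFS`; the variable names `input`, `plain` and `raw` are chosen for illustration only.

```shell
# Two leading spaces, literal backslashes, two trailing spaces.
input='  one\ttwo\\three  '

# Plain read: whitespace is trimmed and backslashes are processed.
plain=$(printf '%s\n' "$input" | { read line; printf '%s' "$line"; })

# IFS= read -r: the line is stored exactly as it was read.
raw=$(printf '%s\n' "$input" | { IFS= read -r line; printf '%s' "$line"; })

printf '[%s]\n' "$plain"   # prints [onettwo\three]
printf '[%s]\n' "$raw"     # prints [  one\ttwo\\three  ]
```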
Unless the shell is zsh, if there is a null byte in the input, then subsequent characters are lost. Shells are not designed to read binary data.
Thus the idiom for reading one line at a time is:

    while IFS= read -r line; do
      … # process "$line"
    done

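As a minimal, runnable instance of this loop, the sketch below wraps each input line in angle brackets so that any trimming or backslash processing would be visible. The function name `bracket_lines` is hypothetical.

```shell
# Hypothetical helper: echo every line of standard input wrapped in <...>.
bracket_lines() {
    while IFS= read -r line; do
        printf '<%s>\n' "$line"
    done
}

printf '%s\n' '  two leading spaces' 'a\tb stays literal' | bracket_lines
# prints:
# <  two leading spaces>
# <a\tb stays literal>
```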
When you use the variable `line`, make sure to always put double quotes around variable substitutions: `"$line"`. Without double quotes, the shell first expands the value of the variable, then breaks that value into separate words wherever it contains characters from `IFS`, and every word is interpreted as a wildcard pattern and replaced by the list of matching files (if there are no matching files, the pattern is left as is). So `echo 'a* b*' | { IFS= read -r line; echo $line; }` expands to the list of files in the current directory beginning with `a` or `b`; to get the input unchanged, use `echo 'a* b*' | { IFS= read -r line; echo "$line"; }`.
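The difference is easy to reproduce in a scratch directory. The sketch below assumes a POSIX-style shell with the default `IFS` (zsh does not split or glob unquoted variables by default) and that `mktemp -d` is available; the file names are invented for the demonstration.

```shell
# Create a throwaway directory holding three illustrative files.
dir=$(mktemp -d) || exit 1
touch "$dir/apple" "$dir/avocado" "$dir/banana"

line='a* b*'

# Unquoted: split at the space, then each word is expanded as a pattern.
unquoted=$(cd "$dir" && printf '%s\n' $line)

# Quoted: the value is passed through as a single, unmodified word.
quoted=$(cd "$dir" && printf '%s\n' "$line")

rm -rf "$dir"

printf 'unquoted:\n%s\n' "$unquoted"   # apple, avocado, banana on separate lines
printf 'quoted:\n%s\n' "$quoted"       # a* b*
```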
Note also that the `echo` command sometimes modifies the string it prints. The exact way depends on the shell. Some shells process backslash escapes, and some shells recognize options. Using `echo` to output a string verbatim is only sure to work if you know that the string does not contain any backslash and does not start with a dash (`-`). A reliable and portable way to print a string unchanged is

    printf '%s\n' "$line"

This prints a newline after the string, like `echo`. You can omit the newline by omitting `\n` in the command above.
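To illustrate, here is a small sketch that prints a few strings that `echo` may alter, depending on the shell. The wrapper name `print_line` is hypothetical; it only packages the `printf` call.

```shell
# Hypothetical wrapper: print the first argument verbatim, followed by a newline.
print_line() {
    printf '%s\n' "$1"
}

# Each of these could be treated as an option or an escape sequence by some echo.
for line in '-n' 'C:\new\table' '-e a\tb'; do
    print_line "$line"
done
# prints:
# -n
# C:\new\table
# -e a\tb
```

Because the string is passed as an argument to the `%s` format rather than as the format itself, neither leading dashes nor backslashes in it are interpreted.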