
Is there any difference between these three code blocks in bash?

Using IFS= :

#!/usr/bin/env bash
while IFS= read -r item; do
    echo "[$item]"
done </dev/stdin

Using IFS=$'\n':

#!/usr/bin/env bash
while IFS=$'\n' read -r item; do
    echo "[$item]"
done </dev/stdin

Using -d $'\n':

#!/usr/bin/env bash
while read -rd $'\n' item; do
    echo "[$item]"
done </dev/stdin

If there are differences between the two IFS values and the -d delimiter alternative, then under which circumstances would the differences present themselves?

From my testing, they all appear the same:

echo $'one two\nthree\tfour' | test-stdin 
# outputs:
# [one two]
# [three    four]
balupton
  • Explained under "Word Splitting" in man bash. – choroba Nov 10 '21 at 09:57
  • Relevant URL from @choroba's comment: https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Word-Splitting

    If some code samples could be provided, that illustrate what the documentation is trying to communicate, that would be appreciated. For foolish ol' me, the documentation is too obtuse without illustration.

    – balupton Nov 10 '21 at 10:06

2 Answers


IFS= and IFS=$'\n' are identical when it comes to read (assuming the read delimiter is not changed from the default), since the only difference is whether a newline inside a line separates words, but a newline never appears inside a line.

read and read -d $'\n' are identical since $'\n' (newline) is the default delimiter.
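The question's third loop also differs from the first two in a way the question's test input cannot reveal: it drops the IFS= prefix, so it runs with the default IFS, and read trims leading and trailing spaces and tabs. A small comparison sketch (the three function names are made up for this illustration; each wraps one loop from the question):

```bash
#!/usr/bin/env bash
# Hypothetical helper names, one per loop from the question.
ifs_empty()   { while IFS= read -r item; do printf '[%s]\n' "$item"; done; }
ifs_newline() { while IFS=$'\n' read -r item; do printf '[%s]\n' "$item"; done; }
d_newline()   { while read -rd $'\n' item; do printf '[%s]\n' "$item"; done; }

# Input with leading/trailing spaces and tabs, ending in a newline so no line is lost.
input=$'  one two  \n\tthree\tfour\t\n'

for f in ifs_empty ifs_newline d_newline; do
    echo "== $f"
    printf '%s' "$input" | "$f"
done
```

ifs_empty and ifs_newline keep the blanks ([  one two  ] and [<tab>three<tab>four<tab>]), while d_newline, running with the default IFS, prints [one two] and [three<tab>four].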

IFS= and IFS=$'\n' make a difference for field splitting of unquoted expansions: IFS= completely turns off field splitting, whereas IFS=$'\n' splits only on newlines.

IFS=$'\n'
echo $(echo a; echo b)
# prints "a b" on a single line, since $'a\nb' is split at
# the newline and echo therefore receives two arguments, "a" and "b"
IFS=
echo $(echo a; echo b)
# prints "a" and "b" on separate lines, since $'a\nb' is passed
# to echo as a single argument
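Counting the arguments instead of reading echo output makes the settings easier to compare, and also shows that IFS=$'\n' keeps the space inside "a b" intact while the default IFS does not. A sketch (count_args is a hypothetical helper):

```bash
#!/usr/bin/env bash
# Hypothetical helper: report how many arguments word splitting produced.
count_args() { printf '%d\n' "$#"; }

IFS=$'\n'; count_args $(printf 'a b\nc\n')   # 2: split only at the newline
IFS='';    count_args $(printf 'a b\nc\n')   # 1: no splitting at all
unset IFS; count_args $(printf 'a b\nc\n')   # 3: default IFS splits at spaces, tabs and newlines
```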
Jim L.
  • Note that IFS=$'\n' can cause trouble if the script runs under a shell that doesn't support the $' ' ANSI-C quoting mode. It'll work fine under bash, zsh, ksh, etc, but if the script is run under dash or something like that, IFS will consist of the dollar sign, backslash, and "n" characters, which can cause really weird effects. – Gordon Davisson Nov 10 '21 at 19:18
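Following up on the portability comment: in shells without $'...' quoting, a newline-only IFS can be written as a literal newline between single quotes. A minimal POSIX sh sketch (it behaves the same under bash):

```bash
#!/bin/sh
# Portable equivalent of IFS=$'\n': a literal newline inside single quotes.
IFS='
'
# An unquoted command substitution is now split only at newlines.
set -- $(printf 'a b\nc d\n')
printf '%s fields\n' "$#"    # 2 fields
printf '[%s]\n' "$@"         # [a b] then [c d]
```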

Combining the excellent resources mentioned in @Giles's answer, @choroba's comment, and the answers to another question, I've put together the following code examples to illustrate the differences:


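IFS (aka Internal Field Separator) specifies the inline delimiters (multiple characters are accepted, order is irrelevant). It defaults to IFS=$' \t\n'. It mainly matters when read is given multiple variable targets, although IFS whitespace characters are also trimmed from the start and end of the line when there is only one target.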

read's -d argument specifies the line delimiter (only its first character is used). It defaults to -d $'\n'.
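Both defaults can be checked directly. A sketch, assuming a fresh bash (read_to_colon is a hypothetical helper); it also shows that -d looks only at the first character of its argument:

```bash
#!/usr/bin/env bash
# Print the current IFS unambiguously; in a fresh bash this is normally $' \t\n'.
printf 'IFS is %q\n' "$IFS"

# -d uses only the first character of its argument: ':' is the delimiter, ';' is not.
read_to_colon() { IFS= read -rd ':;' field; printf '%s\n' "$field"; }
printf 'a;b:c' | read_to_colon    # prints a;b
```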

As such,

# IFS= (empty), -d $'\t', with tab separated fields, across two lines
echo $'a\tb\tc\nz\tx\ty' | while IFS= read -rd $'\t' a b c; do echo "[$a] [$b] [$c]"; done
# [a] [] []
# [b] [] []
# [c
# z] [] []
# [x] [] []
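In the tab-delimited example above, the final y is never printed: it is followed by a newline rather than a tab, so the last read hits end-of-file and returns non-zero, ending the loop even though the variable was filled. A common idiom keeps that unterminated last field. A sketch (tab_fields is a hypothetical name; printf is used instead of echo so the last field carries no trailing newline):

```bash
#!/usr/bin/env bash
# "|| [[ -n $field ]]" keeps the loop going for a final field that read stored
# but reported as a failure because no closing tab appeared before EOF.
tab_fields() {
    while IFS= read -rd $'\t' field || [[ -n $field ]]; do
        printf '[%s]\n' "$field"
    done
}
printf 'a\tb\tc\nz\tx\ty' | tab_fields
```

This prints [a], [b], a field spanning the newline ([c then z]), [x] and finally [y].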

# IFS=tab, with tab separated fields, across two lines
echo $'a\tb\tc\nz\tx\ty' | while IFS=$'\t' read -r a b c; do echo "[$a] [$b] [$c]"; done
# [a] [b] [c]
# [z] [x] [y]

# IFS=tab, with tab separated fields, across two lines, with only a single variable target
echo $'a\tb\tc\nz\tx\ty' | while IFS=$'\t' read -r a; do echo "[$a]"; done
# [a<tab>b<tab>c]
# [z<tab>x<tab>y]

# IFS=tab, with space and tab separated fields, across two lines
echo $'a b\tc\nz\tx y' | while IFS=$'\t' read -r a b c; do echo "[$a] [$b] [$c]"; done
# [a b] [c] []
# [z] [x y] []

# IFS=tab+space, with space and tab separated fields, across two lines
echo $'a b\tc\nz\tx y' | while IFS=$'\t ' read -r a b c; do echo "[$a] [$b] [$c]"; done
# [a] [b] [c]
# [z] [x] [y]
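Tab and space are "IFS whitespace", so a run of them counts as a single separator, whereas a non-whitespace character such as a comma separates fields one at a time and empty fields survive (the same distinction comes up in the comments below). A sketch:

```bash
#!/usr/bin/env bash
# Two consecutive tabs: tab is IFS whitespace, so they act as one separator.
IFS=$'\t' read -r a b c <<< $'x\t\ty'
printf 'tab:   [%s] [%s] [%s]\n' "$a" "$b" "$c"   # tab:   [x] [y] []

# Two consecutive commas: comma is not whitespace, so an empty field appears between them.
IFS=, read -r a b c <<< 'x,,y'
printf 'comma: [%s] [%s] [%s]\n' "$a" "$b" "$c"   # comma: [x] [] [y]
```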

# IFS=newline, -d '', with space and tab separated fields, across two lines
echo $'a b\tc\nz\tx y' | while IFS=$'\n' read -rd '' a b c; do echo "[$a] [$b] [$c]"; done
# outputs nothing: the input contains no NUL delimiter, so read reaches end-of-file,
# returns a non-zero status and the loop body never runs (even though a, b and c were filled)
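Checking read's exit status directly shows why the while loop in the no-NUL example above never runs. A sketch (read_all is a hypothetical name):

```bash
#!/usr/bin/env bash
# Without a NUL in the input, read -d '' reads up to EOF: the variables are filled,
# but the exit status is non-zero, which is what ends a while loop.
read_all() {
    IFS=$'\n' read -rd '' a b c
    local status=$?
    printf '[%s] [%s] [%s] status=%s\n' "$a" "$b" "$c" "$status"
}
echo $'a b\tc\nz\tx y' | read_all    # [a b<tab>c] [z<tab>x y] [] status=1
```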

# IFS=newline, -d '', with space and tab separated fields, across two lines, with a trailing NUL character
printf 'a b\tc\nz\tx y\0' | while IFS=$'\n' read -rd '' a b c; do echo "[$a] [$b] [$c]"; done
# outputs a single line: the whole input is one NUL-terminated record, split at the newline into two fields
# [a b<tab>c] [z<tab>x y] []

# IFS=newline, -d $'\0', with space and tab separated fields, across two lines, with a trailing NUL character
printf 'a b\tc\nz\tx y\0' | while IFS=$'\n' read -rd $'\0' a b c; do echo "[$a] [$b] [$c]"; done
# same output as with -d '', because $'\0' expands to an empty string:
# [a b<tab>c] [z<tab>x y] []
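Why -d $'\0' and -d '' behave identically, and what NUL-delimited reading is good for, can be seen in a short sketch: bash strings end at the first NUL, and NUL-terminated records may themselves contain newlines.

```bash
#!/usr/bin/env bash
# $'\0' cannot be stored: bash strings are NUL-terminated, so everything from the NUL on is lost.
s=$'foo\0bar'
printf '%s (length %d)\n' "$s" "${#s}"   # foo (length 3)

# Hence -d $'\0' is just -d '', which reads NUL-terminated records;
# a record may itself contain newlines.
records=()
while IFS= read -rd '' rec; do
    records+=("$rec")
done < <(printf 'one\ntwo\0three\0')
printf '%d records\n' "${#records[@]}"   # 2 records
```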

As such,

  • IFS splits "fields" across a "line", it is an "inline" splitter
  • -d splits "lines", it is a "line" splitter
  • customise IFS to customise what separates "fields"
  • customise -d to customise what separates "lines"

One use case where -d is valuable is reading each field individually, in a specific order:

echo $'a b\tc\nz\tx y' | {
    read -rd ' ' a
    echo "a=[$a]"
    read -rd $'\t' b
    echo "b=[$b]"
    read -rd $'\n' c
    echo "c=[$c]"
    read -rd $'\t' z
    echo "z=[$z]"
    read -rd $' ' x
    echo "x=[$x]"
    read -rd $'\n' y
    echo "y=[$y]"
}
# a=[a]
# b=[b]
# c=[c]
# z=[z]
# x=[x]
# y=[y]

As such,

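  • IFS only needs to be set for splitting purposes if your read call has multiple variable targets.
  • If your read call has only a single variable target, no field splitting happens, but IFS is not ignored: leading and trailing IFS whitespace (by default spaces, tabs and newlines) is still trimmed. So IFS= is not merely cosmetic there; it preserves leading and trailing whitespace.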
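With a single variable target, the effect of IFS is on trimming only, as the comments below also point out. A sketch:

```bash
#!/usr/bin/env bash
# Single variable target: nothing is split, but IFS whitespace is still trimmed.
read -r v <<< '  foo  '
printf 'default IFS: <%s>\n' "$v"    # <foo>

IFS= read -r v <<< '  foo  '
printf 'IFS= :       <%s>\n' "$v"    # <  foo  >
```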

@Giles's answer covers IFS outside the context of read.

One such use case is selecting a filename from a directory that contains two files, one with a space in its name and one without:

cd "$(mktemp -d)" || exit 1
touch 'before-space after-space.txt'
touch 'no-space.txt'

Using arrays results in correct fields for selection:

mapfile -t list < <(ls -1)
select node in "${list[@]}"; do
    echo "via mapfile, [$node]"
    break
done
echo

outputs:

1) before-space after-space.txt
2) no-space.txt
#? 1
via mapfile, [before-space after-space.txt]

Using word splitting with the default IFS results in mangled fields for selection:

select node in $(ls -1); do
    echo "IFS=default [$node]"
    break
done
echo

outputs:

1) before-space
2) after-space.txt
3) no-space.txt
#? 1
IFS=default [before-space]

Using word splitting with IFS=$'\n' results in the correct fields for selection:

# IFS goes on its own line: a prefix assignment cannot be attached to a compound command such as select
IFS=$'\n'
select node in $(ls -1); do
    echo "IFS=newline [$node]"
    break
done
echo

outputs:

1) before-space after-space.txt
2) no-space.txt
#? 1
IFS=newline [before-space after-space.txt]

Using word splitting with IFS= results in a single jumbled field for selection:

IFS=
select node in $(ls -1); do
    echo "IFS= [$node]"
    break
done
echo
unset IFS   # restore the default splitting behaviour

outputs:

1) before-space after-space.txt
no-space.txt
#? 1
IFS= [before-space after-space.txt
no-space.txt]
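Parsing ls output is fragile in general, since file names may even contain newlines. A different technique that avoids both ls and IFS is to let a glob fill the array. A sketch, assuming the same two test files:

```bash
#!/usr/bin/env bash
# Build the array from a glob instead of parsing ls, so names with spaces
# (or newlines) stay intact and IFS plays no role.
cd "$(mktemp -d)" || exit 1
touch 'before-space after-space.txt' 'no-space.txt'
files=(*)
printf '[%s]\n' "${files[@]}"   # [before-space after-space.txt] then [no-space.txt]
# The array can then be used like the mapfile one:
#   select node in "${files[@]}"; do echo "via glob, [$node]"; break; done
```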

balupton
  • in that last loop, you're right that read will exit with a failure since it doesn't see the NUL delimiter. But you could add that to the input by using printf 'a b\tc\nz\tx y\0' instead of the echo (Can't use $'\0' since most shells can't handle the NUL in an expansion. But within printf it works.) Or, if you ignore the return value of read, by using e.g. echo $'a b\tc\nz\tx y' | ( IFS=$'\n' read -rd '' a b c; echo "[$a] [$b] [$c]" ) instead of the while loop. Either gives [a b c] [z x y] [], showing how the input is split on the newlines. – ilkkachu Nov 10 '21 at 11:30
  • @ilkkachu thank you! I've incorporated your feedback into the examples. – balupton Nov 10 '21 at 12:04
  • AFAIU, IFS can affect leading/trailing chars (even if only reading into a single variable). e.g. read <<<" foo " -r var && echo "<$var>" prints <foo> – rowboat Nov 10 '21 at 12:10
  • @rowboat, yes, for whitespace separators (space, tab, newline). But non-whitespace separators don't get that, IFS=: read <<<"::foo::" -r var && echo "<$var>" prints <::foo::>. – ilkkachu Nov 10 '21 at 12:23
  • read -rd $'\0' is exactly the same as read -rd ''. The \0 in $'' does create a NUL byte, but the very next moment Bash takes that NUL as ending the string, since that's how C-style strings work. I think I've heard that read -d '' using the NUL as delimiter was something of an accident to begin with, the implementation just used the first byte, which in that case was the terminating NUL. You can see the same with e.g. echo $'foo\0bar' is the same as echo 'foo' since the NUL terminates the string. – ilkkachu Nov 10 '21 at 12:27
  • The whitespace vs non-whitespace trimming is quite peculiar. – balupton Nov 10 '21 at 13:38
  • The whitespace-trimming effect is the reason that many people use IFS= read -r ... as the standard "basic" read command (with -r to avoid backslash weirdness). It's the "just give me what you read, don't mess with it" invocation. – Gordon Davisson Nov 10 '21 at 19:14
  • @ilkkachu Even a non-whitespace char as IFS can affect trimming, as shown in Stéphane's answer – rowboat Nov 11 '21 at 01:44
  • @rowboat, oh, right... IFS=: read <<< "foo:" -r var && echo "<$var>" gives <foo>. sigh. – ilkkachu Nov 11 '21 at 06:29