10

I create a file with tab-delimited fields.

echo foo$'\t'bar$'\t'baz$'\n'foo$'\t'bar$'\t'baz > input

I have the following script named zsh.sh

#!/usr/bin/env zsh
while read line; do
    <<<$line cut -f 2
done < "$1"

I test it.

$ ./zsh.sh input
bar
bar

This works fine. However, when I change the first line to invoke bash instead, it fails.

$ ./bash.sh input
foo bar baz
foo bar baz

Why does this fail with bash and work with zsh?

Additional troubleshooting

  • Using direct paths in the shebang instead of env produces the same behaviour.
  • Piping with echo instead of using the here-string <<<$line also produces the same behaviour. i.e. echo $line | cut -f 2.
  • Using awk instead of cut works for both shells. i.e. <<<$line awk '{print $2}'.
Sparhawk
  • By the way, you can make your test file more simply by doing one of these: echo -e 'foo\tbar\tbaz\n...', echo $'foo\tbar\tbaz\n...', or printf 'foo\tbar\tbaz\n...\n' or variations of these. It saves you from having to individually wrap each tab or newline. – Dennis Williamson Jun 10 '16 at 12:55

4 Answers

17

That's because in <<< $line, bash versions prior to 4.4 performed word splitting (though not globbing) on an unquoted $line and then joined the resulting words with a space character. The result, followed by a newline character, was written to a temporary file, which became the stdin of cut.

$ a=a,b,,c bash-4.3 -c 'IFS=","; sed -n l <<< $a'
a b  c$

tab happens to be in the default value of $IFS:

$ a=$'a\tb'  bash-4.3 -c 'sed -n l <<< $a'
a b$

The solution with bash is to quote the variable.

$ a=$'a\tb' bash -c 'sed -n l <<< "$a"'
a\tb$
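
Applied to the loop in the question, that could look like the sketch below. The function name second_field, the temporary test file and the IFS= read -r form are my own additions for illustration, not part of the original script:

```shell
# Hypothetical helper: print the second tab-separated field of each line of a file.
second_field() {
    # IFS= and -r keep leading/trailing whitespace and backslashes intact in $line
    while IFS= read -r line; do
        # Quoting "$line" stops bash < 4.4 from splitting it on $IFS
        cut -f 2 <<< "$line"
    done < "$1"
}

# Build a small tab-delimited test file like the one in the question
tmp=$(mktemp)
printf 'foo\tbar\tbaz\nfoo\tbar\tbaz\n' > "$tmp"

result=$(second_field "$tmp")
rm -f "$tmp"

printf '%s\n' "$result"
```

With the quotes in place this should print bar twice, whichever bash version runs it.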

Note that bash is the only shell that does that. zsh (where <<< comes from, inspired by Byron Rakitzis's implementation of rc), ksh93, mksh and yash, which also support <<<, don't do it.

When it comes to arrays, mksh, yash and zsh join on the first character of $IFS, bash and ksh93 on space.

$ mksh -c 'a=(1 2); IFS=:; sed -n l <<< "${a[@]}"'
1:2$
$ yash -c 'a=(1 2); IFS=:; sed -n l <<< "${a[@]}"'
1:2$
$ ksh -c 'a=(1 2); IFS=:; sed -n l <<< "${a[@]}"'
1 2$
$ zsh -c 'a=(1 2); IFS=:; sed -n l <<< "${a[@]}"'
1:2$
$ bash -c 'a=(1 2); IFS=:; sed -n l <<< "${a[@]}"'
1 2$

There's a difference between zsh/yash and mksh (version R52 at least) when $IFS is empty:

$ mksh -c 'a=(1 2); IFS=; sed -n l <<< "${a[@]}"'
1 2$
$ zsh -c 'a=(1 2); IFS=; sed -n l <<< "${a[@]}"'
12$

The behaviour is more consistent across shells when you use "${a[*]}" (except that mksh still has a bug when $IFS is empty).
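
If you would rather not depend on how a given shell joins array elements inside a here-string, one option is to join them yourself first and pass a plain scalar. A minimal sketch, assuming bash (or another shell with ksh-style arrays and <<<); the variable name joined is only illustrative:

```shell
a=(1 2 3)

# Join in a subshell so the IFS change does not leak into the rest of the script.
# "${a[*]}" joins the elements on the first character of $IFS.
joined=$(IFS=:; printf '%s' "${a[*]}")

# Pass the already-joined, quoted scalar to the here-string
sed -n l <<< "$joined"
```

In bash this should print 1:2:3$, the same as the "${a[*]}" form.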

With echo $line | ..., it's the usual split+glob operator that applies to unquoted expansions in all Bourne-like shells except zsh (on top of the usual problems associated with echo).
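
For the pipe variant, quoting the variable and using printf instead of echo avoids both problems. A small sketch to compare the two forms; the variable names unquoted and quoted are made up for the illustration, and the result described for the unquoted form applies to Bourne-like shells other than zsh:

```shell
# Build a line containing real tab characters
line=$(printf 'foo\tbar\tbaz')

# Unquoted: split+glob turns the tabs into single spaces, so cut finds no delimiter
unquoted=$(echo $line | cut -f 2)

# Quoted, with printf instead of echo: the tabs survive and cut sees three fields
quoted=$(printf '%s\n' "$line" | cut -f 2)

printf 'unquoted: %s\nquoted: %s\n' "$unquoted" "$quoted"
```

In bash or dash the unquoted form gives foo bar baz and the quoted one gives bar.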

  • Excellent answer! Thank you (+1). I'll accept the lowest rep'ed questioner though, since they answered the question perfectly well enough to reveal my stupidity. – Sparhawk Jun 10 '16 at 12:07
13

What happens is that bash replaces the tabs with spaces. You can avoid this problem by saying "$line" instead, or by explicitly cutting on spaces.

10

The problem is that you're not quoting $line. To investigate, change the two scripts so they simply print $line:

#!/usr/bin/env bash
while read line; do
    echo $line
done < "$1"

and

#!/usr/bin/env zsh
while read line; do
    echo $line
done < "$1"

Now, compare their output:

$ bash.sh input 
foo bar baz
foo bar baz
$ zsh.sh input 
foo    bar    baz
foo    bar    baz

As you can see, because $line is not quoted, bash splits it on the tabs (the split+glob operator) and echo joins the resulting words with single spaces. zsh does not split unquoted parameter expansions by default, so the tabs survive there. Now, cut uses \t as the field delimiter by default. Therefore, since your bash script has eaten the tabs, cut finds no delimiter, treats the whole line as a single field and prints it unchanged. What you are really running is:

$ echo "foo bar baz" | cut -f 2
foo bar baz

So, to get your script to work as expected in both shells, quote your variable:

while read line; do
    <<<"$line" cut -f 2
done < "$1"

Then, both produce the same output:

$ bash.sh input 
bar
bar
$ zsh.sh input 
bar
bar
terdon
  • Excellent answer! Thank you (+1). I'll accept the lowest rep'ed questioner though, since they answered the question perfectly well enough to reveal my stupidity. – Sparhawk Jun 10 '16 at 12:07
  • ^vote for being the only answer (as yet) to actually include the corrected bash.sh – lauir Jun 12 '16 at 03:58
1

As has already been answered, a more portable way to use a variable is to quote it:

$ printf '%s\t%s\t%s\n' foo bar baz
foo    bar    baz
$ l="$(printf '%s\t%s\t%s\n' foo bar baz)"
$ <<<$l     sed -n l
foo bar baz$

$ <<<"$l"   sed -n l
foo\tbar\tbaz$

Bash's implementation differs from that of the other shells. Take the line:

l="$(printf '%s\t%s\t%s\n' foo bar baz)"; <<<$l  sed -n l

These are the results in various shells:

/bin/sh         : foo bar baz$
/bin/b43sh      : foo bar baz$
/bin/bash       : foo bar baz$
/bin/b44sh      : foo\tbar\tbaz$
/bin/y2sh       : foo\tbar\tbaz$
/bin/ksh        : foo\tbar\tbaz$
/bin/ksh93      : foo\tbar\tbaz$
/bin/lksh       : foo\tbar\tbaz$
/bin/mksh       : foo\tbar\tbaz$
/bin/mksh-static: foo\tbar\tbaz$
/usr/bin/ksh    : foo\tbar\tbaz$
/bin/zsh        : foo\tbar\tbaz$
/bin/zsh4       : foo\tbar\tbaz$

Only bash splits the variable on the right of <<< when it is unquoted, which means that the value of $IFS affects the result of <<<.
However, that has been corrected in bash version 4.4.
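
If a script has to run on machines with an unknown bash version, a quick probe can tell whether the running shell still splits unquoted here-strings. This is only a sketch; the variable names probe, splits and quoted are made up for the illustration:

```shell
# A value containing a tab, which is in the default $IFS
probe=$(printf 'a\tb')

# Deliberately unquoted: bash before 4.4 turns the tab into a space here
if [ "$(cat <<< $probe)" = "$probe" ]; then
    splits=no
else
    splits=yes
fi

# The quoted form keeps the tab regardless of the answer above
quoted=$(cat <<< "$probe")

printf 'splits unquoted here-strings: %s\n' "$splits"
```

Either way, quoting the variable remains the simpler fix; the probe is mainly useful for confirming which behaviour you are seeing.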


With the line:

l=(1 2 3); IFS=:; sed -n l <<<"${l[*]}"

All shells use the first character of IFS to join values.

/bin/y2sh       : 1:2:3$
/bin/sh         : 1:2:3$
/bin/b43sh      : 1:2:3$
/bin/b44sh      : 1:2:3$
/bin/bash       : 1:2:3$
/bin/ksh        : 1:2:3$
/bin/ksh93      : 1:2:3$
/bin/lksh       : 1:2:3$
/bin/mksh       : 1:2:3$
/bin/zsh        : 1:2:3$
/bin/zsh4       : 1:2:3$

With "${l[@]}", a space is needed to separate the different arguments, but some shells choose to use the first character of IFS instead (is that correct?).

/bin/y2sh       : 1:2:3$
/bin/sh         : 1 2 3$
/bin/b43sh      : 1 2 3$
/bin/b44sh      : 1 2 3$
/bin/bash       : 1 2 3$
/bin/ksh        : 1 2 3$
/bin/ksh93      : 1 2 3$
/bin/lksh       : 1:2:3$
/bin/mksh       : 1:2:3$
/bin/zsh        : 1:2:3$
/bin/zsh4       : 1:2:3$

With a null IFS, the values should be joined with nothing between them, as with this line:

a=(1 2 3); IFS=''; sed -n l <<<"${a[*]}"

/bin/y2sh       : 123$
/bin/sh         : 123$
/bin/b43sh      : 123$
/bin/b44sh      : 123$
/bin/bash       : 123$
/bin/ksh        : 123$
/bin/ksh93      : 123$
/bin/lksh       : 1 2 3$
/bin/mksh       : 1 2 3$
/bin/zsh        : 123$
/bin/zsh4       : 123$

But both lksh and mksh fail to do so.

If we change to a list of arguments:

l=(1 2 3); IFS=''; sed -n l <<<"${l[@]}"

/bin/y2sh       : 123$
/bin/sh         : 1 2 3$
/bin/b43sh      : 1 2 3$
/bin/b44sh      : 1 2 3$
/bin/bash       : 1 2 3$
/bin/ksh        : 1 2 3$
/bin/ksh93      : 1 2 3$
/bin/lksh       : 1 2 3$
/bin/mksh       : 1 2 3$
/bin/zsh        : 123$
/bin/zsh4       : 123$

Both yash and zsh fail to keep arguments separated. Is that a bug?

  • About zsh/yash and "${l[@]}" in non-list context, that's by design: "${l[@]}" is only special in list contexts. In non-list contexts, there's no separation possible, so you need to join the elements somehow. Joining with the first character of $IFS is more consistent than joining with a space character IMO. dash does it as well (dash -c 'IFS=; a=$@; echo "$a"' x a b). IIRC, POSIX is intending to change that, though. See this (long) discussion – Stéphane Chazelas Jun 11 '16 at 09:23
  • See also http://austingroupbugs.net/view.php?id=888 and http://thread.gmane.org/gmane.comp.shells.dash/1179 – Stéphane Chazelas Jun 11 '16 at 09:27
  • Replying to myself, no, having a second look, POSIX will leave the behaviour for var=$@ unspecified. – Stéphane Chazelas Jun 11 '16 at 09:33