3

It is known that a path could contain newlines in any of its components.

Should we conclude, then, that the environment variable $PATH could contain newlines?

If so, how can $PATH be split into its elements, along the lines of this Bourne-like loop:

    IFS=':' ; set -f
    for var in $PATH
    do
        echo "<$var>"
    done

But if it could be done without changing IFS, even better.

  • Note that in the Bourne shell (contrary to POSIX shells), /bin::/usr/bin would be split into /bin and /usr/bin instead of /bin, "" and /usr/bin. – Stéphane Chazelas Dec 31 '18 at 11:12

2 Answers

4

In POSIX shells, $IFS is a field delimiter, not a separator, so a $PATH value like /bin:/usr/bin: would be split into /bin and /usr/bin instead of /bin, /usr/bin and the empty string (meaning the current directory). You need:

    IFS=:; set -o noglob
    for var in $PATH""; do
      printf '<%s>\n' "$var"
    done

To avoid modifying global settings, you can use a shell with explicit splitting operators like zsh:

    for var in "${(s/:/@)PATH}"; do
      printf '<%s>\n' "$var"
    done

Though in that case, zsh already has the $path array tied to $PATH like in csh/tcsh, so:

    for var in "$path[@]"; do
      printf '<%s>\n' "$var"
    done
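For shells without such splitting operators, one possible sketch avoids both IFS and noglob by peeling elements off the front of the string with standard parameter expansions. The helper name split_colon_list and the variable rest are made up for illustration; since POSIX sh has no local variables, rest is left set in the calling shell.

```shell
# split_colon_list: hypothetical helper, not part of any shell.
# Prints each element of the colon-separated list in $1 as <element>,
# without changing IFS or the noglob option.
split_colon_list() {
  rest=$1                       # global: POSIX sh has no "local"
  while :; do
    case $rest in
      *:*)
        # Everything before the first colon is the next element.
        printf '<%s>\n' "${rest%%:*}"
        # Drop that element and its colon, then continue.
        rest=${rest#*:}
        ;;
      *)
        # No colon left: the remainder (possibly empty) is the last element.
        printf '<%s>\n' "$rest"
        break
        ;;
    esac
  done
}

split_colon_list "/bin:/usr/bin:"
```

The sample call prints </bin>, </usr/bin> and <>, so the trailing empty element is kept, and an empty argument yields a single empty element.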

In any case, yes: in theory $PATH, like any variable, could contain newline characters. The newline character is not special in any way when it comes to file path resolution. I don't expect anyone sensible would put a directory with a newline (or wildcards) in its name in their $PATH, or name a command with a newline in it. It's also hard to imagine a scenario where someone could exploit a script that assumes $PATH won't contain newline characters.

  • I'm saying that set -o noglob is more portable among the shells that can run that code if we want to consider zsh -o shwordsplit -o globsubst in that list. set -f is more portable among ancient Bourne-like shells, but those shells that don't support set -o noglob cannot run that code correctly anyway. When zsh was written in 1990, csh/tcsh were by far the most popular shells at the time. All of ksh/bash/zsh borrowed features from csh (...) – Stéphane Chazelas Jan 02 '19 at 11:11
  • (...) csh had the -f option (for fast start) long before the Bourne shell added its -f to disable glob. So if you want to blame something for breaking compatibility, blame the Bourne (SysV) shell. There would be no reason why one would want to disable glob if it weren't for that bug of the Bourne shell whereby globbing is performed upon expansions. zsh fixed that bug, so set -o noglob is not needed there unless in sh emulation (where set -f works to disable it) or the globsubst option is enabled. – Stéphane Chazelas Jan 02 '19 at 11:13
2

Yes, PATH can contain newlines (even on ancient Unix systems).
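One way to check this on a given system is the sketch below. It assumes mktemp -d is available and that the temporary directory allows executing files. The names demo_newline_in_path and hello_nl are made up for illustration.

```shell
# demo_newline_in_path: hypothetical name. The function body is a subshell,
# so the PATH change does not outlive the call.
demo_newline_in_path() (
  tmp=$(mktemp -d) || exit 1
  # Directory whose name contains a literal newline.
  dir="$tmp/new
line"
  mkdir "$dir" || exit 1
  # hello_nl is a made-up command name that should not exist elsewhere.
  printf '%s\n' '#!/bin/sh' 'echo "found via PATH"' > "$dir/hello_nl"
  chmod +x "$dir/hello_nl"
  # Append the directory to PATH and run the command by bare name.
  PATH="$PATH:$dir"
  hello_nl
  status=$?
  rm -rf "$tmp"
  exit "$status"
)

demo_newline_in_path
```

If the command is found, the lookup went through a PATH element that contains a newline.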

As to splitting any string in shell, the only way you can do it portably is with IFS. You can use IFS=:; set -f; set -- $PATH or pass it to a function instead of looping with for, though.
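As a rough sketch of the function variant, the body can be written as a subshell so that the IFS and noglob changes never reach the caller. The name print_path_elements is made up for illustration. The sketch disables globbing before the expansion and appends "" so that a trailing empty element is kept.

```shell
# print_path_elements: hypothetical name. The parentheses make the body run
# in a subshell, so IFS and noglob are unchanged in the calling shell.
print_path_elements() (
  IFS=:
  set -o noglob        # must be in effect before the unquoted expansion
  # Split $1 on colons; the "" keeps a trailing empty field.
  set -- $1""
  printf '<%s>\n' "$@"
)

print_path_elements "/bin::/usr/bin:"
```

The sample call prints </bin>, <>, </usr/bin> and <>. The cost of this design is one extra process per call, and the split result is only usable inside the function.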

With bash you can also "read" a string into an array:

    xtra=$'some\nother\nplace\n\n'; PATH="$PATH:$xtra"
    mapfile -td: path < <(printf %s "$PATH")
    printf '<%s>\n' "${path[@]}"

But using arrays is usually not a good idea, because they can't be stored transparently in environment variables or passed as a single argument to external commands.

Notice that IFS will terminate fields, not separate them (kind of like \n at the end of the file won't be treated like an empty line by programs reading the file line-by-line); if that's not what's expected, and you really want to create an extra empty field at the end when splitting a string that ends in a character from IFS, you should join an empty string after the variable that is subject to word splitting:

    (P=/bin:; IFS=:; printf '<%s>\n' $P"")
    </bin>
    <>

The word splitting algorithm will also ignore whitespace characters at the beginning of the string, if those whitespace characters are part of IFS. If you want an extra field for the leading whitespace, you should also join an empty string before the variable:

    (P='   foo : bar  '; IFS=': '; set -f; set -- $P; printf '<%s>\n' "$@")
    <foo>
    <bar>

    (P='   foo : bar  '; IFS=': '; set -f; set -- ""$P""; printf '<%s>\n' "$@")
    <>
    <foo>
    <bar>
    <>
  • Using arrays is often an excellent idea, as a number of answers here on unix.SE show. It's almost impossible to handle lists of strings with arbitrary data without using an array. You only need lists of paths with whitespace, or a list of command arguments to get the issue. Of course you can use the positional parameters instead of an array, but those aren't any better regarding the points you mention: they can't be sanely pushed through the environment, nor passed as a single argument to external commands. – ilkkachu Dec 31 '18 at 10:32
  • No, you need the set -f to take effect before the $PATH expansion. So it should be set -o noglob; set -- $PATH"" – Stéphane Chazelas Dec 31 '18 at 10:41
  • @ilkkachu fwiw, quoting the here-string variable is not needed: x='a b'; mapfile -td: <<< $x y; printf '<%s>\n' "$y"; but the added trailing newline is a problem, really. –  Dec 31 '18 at 12:14
  • @ilkkachu and that's documented in the bash manual, under "Here Strings": "Pathname expansion and word splitting are not performed" –  Dec 31 '18 at 12:25
  • @StéphaneChazelas thanks, I've changed it to use a process substitution instead. –  Dec 31 '18 at 12:36
  • @ilkkachu As long as you don't get out of bash, zsh, etc. arrays are, of course, terrific (there's a reason why all sensible languages have them ;-)). But you couldn't blame some newbie for trying to store them as such in the environment, or pass them via cmd "$array" (and expecting "$1" to be an array in cmd), and conclude (very reasonably) that the whole thing sucks ;-). This problem (just like being able to pass a var from the child to the parent) was fixed in plan9; but it can't be done in unix, and any shell workaround is bound to be limited and deceptive. –  Dec 31 '18 at 12:55
  • @pizdelect, right, the manual does say that word splitting and pathname expansion aren't performed on here-strings. I'm a bit wary about that, since < does apply pathname expansion. – ilkkachu Dec 31 '18 at 13:30
  • Like the OP's, it still doesn't do the right thing if $PATH is empty or ends in : characters. – Stéphane Chazelas Dec 31 '18 at 14:32
  • @ilkkachu, you needed the quotes in earlier versions of bash where in <<< $var, the content of $var would be split on $IFS and then joined back with SPC characters. That was fixed in 4.4. The <<< operator itself comes from zsh and the Unix variant of rc which never had that problem (the rc variant also doesn't add the trailing newline). – Stéphane Chazelas Dec 31 '18 at 14:35
  • fish and the Unix variant of rc and its derivatives do support exporting arrays using some form of encoding. yash's exported arrays end up being joined with :. – Stéphane Chazelas Dec 31 '18 at 14:36
  • @StéphaneChazelas, I sort of assumed that would happen, yes. – ilkkachu Dec 31 '18 at 15:47
  • The first two solutions do not work in sh-like shells (such as dash). The last solution also splits on spaces, breaking a b into <a> and <b>. –  Jan 09 '19 at 11:47
  • The first inline example (IFS=:; set -f; set -- $PATH) will work in any POSIX shell (stripping a trailing empty field, as explained in the 2nd part). The 1st code block is bash-specific, as clearly told ("In bash you can also read a string into an array..."). The last 2 code blocks work as shown and explained in the text in /bin/dash or any standard shell. –  Jan 09 '19 at 13:35