
I have seen Bash scripting guides suggesting the use of arrays for working with filenames that contain whitespace. DashAsBinSh, however, says that arrays are not portable, so I am looking for a POSIX-compliant way of working with lists of filenames that may contain whitespace.

I want to modify the example script below so that it echoes

foo/target/a.jar
foo/target/b.jar
bar/target/lol whitespace.jar

Here is the script

#!/usr/bin/env sh

INPUT="foo/target/a.jar
foo/target/b.jar
bar/target/b.jar
bar/target/lol whitespace.jar"
# this would be produced by an 'ls' command
# We can execute the ls within the script, if it helps

dostuffwith() { echo $1; };

F_LOCATIONS=$INPUT
ALL_FILES=$(for f in $F_LOCATIONS; do echo `basename $f`; done)
ALL_FILES=$(echo "$ALL_FILES" | sort | uniq)

for f in $ALL_FILES
do
    fpath=$(echo "$F_LOCATIONS" | grep -m1 $f)
    dostuffwith $fpath
done
Eero Aaltonen

2 Answers


POSIX shells have one array: the positional parameters ($1, $2, etc., collectively referred to as "$@").

set -- 'foo/target/a.jar' 'foo/target/b.jar' 'bar/target/b.jar' 'bar/target/lol whitespace.jar'
set -- "$@" '/another/one at the end.jar'
…
for jar do
  dostuffwith "$jar"
done

This is inconvenient because there's only one such array, and using it overwrites the positional parameters for any other purpose. Positional parameters are local to a function, which is sometimes a blessing and sometimes a curse.
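Because each function gets its own positional parameters, one way to keep the script's own arguments intact is to do the list work inside a function. This is a minimal sketch; process_jars is a hypothetical helper name, and the 'script argument' value only stands in for whatever the script was called with.

```sh
#!/bin/sh
# Sketch: use a function's own positional parameters as the "array",
# so the script-level "$@" is left untouched.
# process_jars is a hypothetical helper name chosen for illustration.

dostuffwith() { printf '%s\n' "$1"; }

process_jars() {
  set -- "$@" '/another/one at the end.jar'   # appending changes only the function's copy
  for jar do                                   # iterates over the function's "$@"
    dostuffwith "$jar"
  done
}

set -- 'script argument'                       # stand-in for the script's own arguments
process_jars 'foo/target/a.jar' 'bar/target/lol whitespace.jar'
printf 'script still sees: %s (%d argument)\n' "$1" "$#"
```

When process_jars returns, the shell restores the caller's positional parameters, so the last line still reports the single original argument.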

If your file names are guaranteed not to contain newlines, you can use newlines as the separator. When you expand the variable, first turn off globbing with set -f and set the list of field splitting characters IFS to contain only a newline.

INPUT="foo/target/a.jar
foo/target/b.jar
bar/target/b.jar
bar/target/lol whitespace.jar"
…
set -f; IFS='
'                           # disable globbing; split unquoted expansions only at newlines
for jar in $INPUT; do
  set +f; unset IFS         # restore globbing and field splitting at all whitespace
  dostuffwith "$jar"
done
set +f; unset IFS           # do it again in case $INPUT was empty
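Combining this newline-splitting loop with the deduplication the question asks for, here is a minimal sketch that keeps only the first path for each basename using shell builtins alone. It swaps the basename/sort/uniq/grep pipeline for a newline-delimited list of names already seen; first_per_basename and nl are names made up for this example, and like the loop above it assumes no path contains a newline.

```sh
#!/bin/sh
# Sketch: split a newline-separated list with set -f and IFS=<newline>,
# printing only the first path seen for each basename.
# Assumes no path contains a newline character.

INPUT="foo/target/a.jar
foo/target/b.jar
bar/target/b.jar
bar/target/lol whitespace.jar"

nl='
'

dostuffwith() { printf '%s\n' "$1"; }

# first_per_basename is a hypothetical helper name used for illustration.
first_per_basename() {
  seen=$nl                      # newline-delimited basenames handled so far
  set -f; IFS=$nl               # no globbing; split only at newlines
  for jar in $1; do
    base=${jar##*/}             # basename via parameter expansion, no subprocess
    case $seen in
      *"$nl$base$nl"*) continue ;;   # basename already seen: skip this path
    esac
    seen=$seen$base$nl
    dostuffwith "$jar"          # IFS is still a newline here; harmless because "$1" is quoted
  done
  set +f; unset IFS             # restore defaults (also reached if $1 is empty)
}

first_per_basename "$INPUT"
```

Run as-is, it prints foo/target/a.jar, foo/target/b.jar and bar/target/lol whitespace.jar, in input order.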

With the items in your list separated by newlines, you can process the list with many line-oriented text processing commands, in particular sort.
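For example, the question's basename-then-sort | uniq step works on the newline-separated list without breaking names at spaces. A sketch, assuming a sed that accepts | as the s command delimiter (POSIX sed does) and using sort -u in place of sort | uniq; unique_basenames is a hypothetical helper name.

```sh
#!/bin/sh
INPUT="foo/target/a.jar
foo/target/b.jar
bar/target/b.jar
bar/target/lol whitespace.jar"

# Strip everything up to the last "/" on each line, then sort and drop duplicates.
# LC_ALL=C pins the collation order so the output is predictable.
unique_basenames() {
  printf '%s\n' "$1" | sed 's|.*/||' | LC_ALL=C sort -u
}

unique_basenames "$INPUT"
```

Each basename, including lol whitespace.jar, comes out as one line, ready for the newline-splitting loop above.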

Remember to always put double quotes around variable substitutions, except when you explicitly want field splitting to happen (as well as globbing, unless you've turned that off).
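The difference is easy to see by counting the words an expansion produces. A small sketch; count_words is a hypothetical helper that only reports how many arguments it received.

```sh
#!/bin/sh
# Sketch: the same variable expanded unquoted and quoted.
f='bar/target/lol whitespace.jar'
count_words() { printf '%s\n' "$#"; }

count_words $f      # 2: the unquoted expansion is split at the space
count_words "$f"    # 1: the quoted expansion stays a single word
```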

  • Good answer and explanation. I'm going to mark this as accepted because this makes the original sort | uniq step work as intended. – Eero Aaltonen Dec 09 '13 at 09:52

Since your $INPUT variable uses newlines as separators, I'm going to assume that your files will not have newlines in the names. As such, yes, there is a simple way of iterating over the files and preserving whitespace.

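The idea is to use the read shell builtin to read the list one line at a time. Given a single variable name, read does not split the line, but it does strip leading and trailing whitespace found in IFS, and without -r it treats backslashes as escape characters. Writing IFS= read -r line, the canonical form for reading a line, turns both of those off, so each line in your list comes back exactly as written, spaces included.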
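The effect of IFS= and -r on read is easiest to see side by side. In this sketch the sample line is made up to include leading and trailing spaces and a backslash; read_plain and read_raw are hypothetical helper names.

```sh
#!/bin/sh
# Sketch: compare a plain read with IFS= read -r on one awkward line.
line='  dir\name/lol whitespace.jar  '

# Plain read: strips leading/trailing IFS whitespace and treats "\" as an escape.
read_plain() { printf '%s\n' "$1" | { read v; printf '[%s]\n' "$v"; }; }
# IFS= read -r: keeps the line byte for byte.
read_raw()   { printf '%s\n' "$1" | { IFS= read -r v; printf '[%s]\n' "$v"; }; }

read_plain "$line"   # [dirname/lol whitespace.jar]
read_raw "$line"     # [  dir\name/lol whitespace.jar  ]
```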

Here's the smallest solution I could come up with:

INPUT="foo/target/a.jar
foo/target/b.jar
bar/target/b.jar
bar/target/lol whitespace.jar"

dostuffwith() {
    echo "$1"
}

printf '%s\n' "$INPUT" | awk -F/ '{if (!seen[$NF]++) print }' | \
while IFS= read -r file; do
  dostuffwith "$file"
done

Basically it sends "$INPUT" to awk which deduplicates based on the file name (it splits on / and then prints the line if the last item hasn't been seen before). Then once awk has generated the list of file paths, we use while read to iterate through the list.
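Because the loop sits on the right-hand side of a pipe, shells such as dash and bash run it in a subshell, so any variables it sets are gone once it finishes. Feeding the loop from a here-document keeps it in the current shell. A sketch of that variant; count and paths are variables added only for this example, to show that the loop's changes survive.

```sh
#!/bin/sh
INPUT="foo/target/a.jar
foo/target/b.jar
bar/target/b.jar
bar/target/lol whitespace.jar"

dostuffwith() { printf '%s\n' "$1"; }

# Same awk deduplication as above: keep a line if its last "/" field is new.
paths=$(printf '%s\n' "$INPUT" | awk -F/ '!seen[$NF]++')

count=0
while IFS= read -r file; do
  [ -n "$file" ] || continue      # skip the single empty line produced when $paths is empty
  dostuffwith "$file"
  count=$((count + 1))            # survives the loop: no pipe, so no subshell
done <<EOF
$paths
EOF

printf 'processed %d files\n' "$count"
```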

phemmer
  • $ checkbashisms bar.sh possible bashism in bar.sh line 14 (<<< here string) – Eero Aaltonen Nov 28 '13 at 12:12
    @EeroAaltonen Changed it to not use the herestring. Note though that with this change, the while loop, and thus dostuffwith is executed in a subshell. So any variables or changes made to the running shell will be lost when the loop completes. The only alternative is to use a full heredoc, which isn't that unpleasant, but I thought this would be preferable. – phemmer Nov 28 '13 at 22:28
  • I'm awarding points based more on readability than smallness. This certainly works and already +1 for that. – Eero Aaltonen Nov 29 '13 at 09:21
  • IFS="\n" splits on backslash and n characters. But in read file, there's no splitting. IFS="\n" is still useful in that it removes the blank characters from $IFS which otherwise would have been stripped at the beginning and end of the input. To read a line, the canonical syntax is IFS= read -r line, though IFS=anything read -r line (provided anything doesn't contain blanks) will work as well. – Stéphane Chazelas Nov 18 '14 at 08:31
  • oops. Not sure how I managed that one. Fixed. – phemmer Nov 18 '14 at 13:21