5

I want to get a list of the files in the current directory and its sub-directories, using a one-liner script:

IFS=$(echo -en "\n\b");
for FILE in $(find -type f); do echo "$FILE"; done

Usually, it works as expected, but recently, with my list of files:

file_.doc
file_0.doc
file_[2006_02_25].doc
file_[2016_06_16].odt
file_[2016_06_16].pdf
file_[16-6-2006].doc
file_.pdf
file_ 4-4-2006.doc

the output is:

./file_.doc
./file_0.doc
./file_0.doc
./file_[2016_06_16].odt
./file_[2016_06_16].pdf
./file_0.doc
./file_.pdf
./file_ 4-4-2006.doc

If I change the variable IFS to:

IFS=$(echo -en "\n");

then the output will be (corrected):

./file_.doc
./file_0.doc
./file_[2006_02_25].doc
./file_[2016_06_16].odt
./file_[2016_06_16].pdf
./file_[16-6-2006].doc
./file_.pdf
./file_ 4-4-2006.doc

I have read that '\b' is necessary, and I have found a solution that uses printf instead of echo.

My questions are:

1) Could you explain what made those outputs different?

2) Could a solution using printf be an alternative to echo -en "\n\b"?

Kusalananda
  • 333,661
duqu
  • 73

2 Answers

4

Don't do $( find ... ). It will invoke filename generation (globbing), and some of your filenames will be interpreted as globbing patterns that match other filenames. For example, the patterns file_[2006_02_25].doc and file_[16-6-2006].doc both match file_0.doc, which is why that filename occurs in the output instead of those two patterns.
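A quick sketch of this effect, in a throwaway directory (the filenames are taken from the question):

```shell
#!/bin/sh
# Sketch: the result of an unquoted command substitution is globbed.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch file_0.doc 'file_[2006_02_25].doc'

# The pattern file_[2006_02_25].doc matches file_0.doc (the bracket
# expression matches the single character 0), but it does NOT match
# the literal file_[2006_02_25].doc, which has twelve characters
# between file_ and .doc.
for f in $(echo 'file_[2006_02_25].doc'); do
    echo "$f"       # → file_0.doc
done
```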

Your loop will furthermore not start iterating until the find command in the command substitution has generated all its pathnames, which could, in the general case, use up quite a lot of memory, and is not particularly elegant.

Instead, just use find (and don't modify IFS):

find . -type f -print

If you want to do other things with these files, then you may do so in an -exec:

find . -type f -exec sh -c 'printf "Found the file %s\n" "$@"' sh {} +

If you just want to process the files in the current directory, you may simply do

for name in *; do
    printf 'Found the name %s\n' "$name"
done


Kusalananda
  • 333,661
  • To recursively get the files inside sub-directories (I want to get the file names as variables), I managed to use a loop that tests whether each entry is a directory and descends into each directory. I could do this, but I want to know if it is "safe"? Thanks! – duqu Feb 28 '18 at 08:36
  • 1
    @duqu You could do that (if you always remember to quote variable expansions), but it's awkward. It would be better if you used -exec to execute the needed code on each file, or group of files. I wasn't able to write anything more specific about this, because the question is not about how to do that but about why you get different results with your code. – Kusalananda Feb 28 '18 at 08:50
4

The output of a command substitution is subject to word splitting (which you took care of by setting IFS), and to filename globbing. The [abc] construct means "match any one of the characters a, b, c", as usual, so [2006_02_25].doc matches 0.doc.


In Bash/ksh/zsh, you can use the double-star to get all files in the directory tree (recursively, not just the current directory). This should find the same files as your example:

shopt -s globstar      # in Bash
# set -o globstar      # in ksh
for file in **/* ; do
    [[ -f $file ]] || continue     # check it's a regular file, like find -type f
    ...
done

Of course, find is powerful, so using it might be easier if you have lots of conditions. If you do, you should disable filename globbing with set -f in addition to fixing IFS:

set -f
IFS=$'\n'
for file in $(find -type f -some -other -conditions) ; do
    ...
done

or use a while read loop with process substitution instead:

while IFS= read -r file ; do 
    ...
done < <(find -type f -some -other -conditions)

(The above is similar to find ... | while ..., but it bypasses the issue of the last part of a pipeline being executed in a subshell.)

Both of those assume the filenames don't contain newlines, since newline is used as the separator in the output of find (and because $(..) eats final newlines).
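That trailing-newline stripping is easy to verify directly; a minimal sketch:

```shell
#!/bin/sh
# Command substitution removes every trailing newline from the output.
s=$(printf 'one\n\n\n')
echo "${#s}"    # 3 -- only the three characters "one" remain
```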

In Bash, at least, there's actually a way to make newlines in filenames work too. Setting the read delimiter to the empty string makes it effectively use the NUL byte as delimiter. So:

while IFS= read -d '' -r file ; do 
    ...
done < <(find -type f -some -other -conditions -print0)

Though that's starting to get icky to write, as you do need all those options to read to make it work without mangling the input.


As for the IFS... Setting IFS=$(echo -en "\n"); sets IFS to the empty string (because command substitution eats the trailing newline), resulting in no splitting. The output in this case seems correct since you get all the output of find in one go, not line-by-line. This also masks the issue with filename globbing, since the full, multiline string doesn't match any filenames and is passed as-is.
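A minimal check of both points (using printf here, which produces the same output as the question's echo -en):

```shell
#!/bin/sh
# The single newline is stripped by $(...), so IFS ends up empty,
# and an empty IFS disables word splitting entirely.
IFS=$(printf '\n')
echo "${#IFS}"          # 0 -- IFS is the empty string
set -- $(printf 'a\nb')
echo "$#"               # 1 -- the two lines stayed as one single word
```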

You can see the difference if you do anything else than just print the loop value. Try adding some separators:

IFS=$(echo -en "\n")       # same as IFS=
for FILE in $(find -type f); do echo "<$FILE>"; done

By far the easiest way to set IFS to a newline in anything but the barest standard shells is IFS=$'\n'.

ilkkachu
  • 138,973