12

Given these file names:

$ ls -1
file
file name
otherfile

bash itself does perfectly fine with embedded whitespace:

$ for file in *; do echo "$file"; done
file
file name
otherfile
$ select file in *; do echo "$file"; done
1) file
2) file name
3) otherfile
#?

However, sometimes I might not want to work with every file, or even strictly in $PWD, which is where find comes in. It also handles whitespace without trouble:

$ find -type f -name file\*
./file
./file name
./directory/file
./directory/file name

I'm trying to concoct a whitespace-safe version of this scriptlet, which takes the output of find and presents it to select:

$ select file in $(find -type f -name file); do echo $file; break; done
1) ./file
2) ./directory/file

However, this explodes with whitespace in the filenames:

$ select file in $(find -type f -name file\*); do echo $file; break; done
1) ./file        3) name          5) ./directory/file
2) ./file        4) ./directory/file  6) name

Ordinarily, I would get around this by messing around with IFS. However:

$ IFS=$'\n' select file in $(find -type f -name file\*); do echo $file; break; done
-bash: syntax error near unexpected token `do'
$ IFS='\n' select file in $(find -type f -name file\*); do echo $file; break; done
-bash: syntax error near unexpected token `do'

What is the solution to this?

DopeGhoti
  • 76,081

4 Answers

15

If you only need to handle spaces and tabs (not embedded newlines), then you can use mapfile (or its synonym, readarray) to read into an array. For example, given

$ ls -1
file
other file
somefile

then

$ IFS= mapfile -t files < <(find . -type f)
$ select f in "${files[@]}"; do ls "$f"; break; done
1) ./file
2) ./somefile
3) ./other file
#? 3
./other file

If you do need to handle newlines, and your bash version provides a null-delimited mapfile [1], then you can modify that to IFS= mapfile -t -d '' files < <(find . -type f -print0). Otherwise, assemble an equivalent array from null-delimited find output using a read loop:

$ touch $'filename\nwith\nnewlines'
$ 
$ files=()
$ while IFS= read -r -d '' f; do files+=("$f"); done < <(find . -type f -print0)
$ 
$ select f in "${files[@]}"; do ls "$f"; break; done
1) ./file
2) ./somefile
3) ./other file
4) ./filename
with
newlines
#? 4
./filename?with?newlines

[1] The -d option was added to mapfile in bash version 4.4.
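
The two variants above can be folded into one helper. This is only a sketch: the function name collect_files and the global array files are made up for illustration, and it assumes a find that supports -print0 (GNU and BSD find do).

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill the global array "files" with every regular
# file under the directory given as $1, whatever characters the names hold.
collect_files() {
    local dir=$1 f
    files=()
    if (( BASH_VERSINFO[0] > 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] >= 4) )); then
        # bash 4.4+: mapfile can split on NUL directly
        mapfile -t -d '' files < <(find "$dir" -type f -print0)
    else
        # older bash: read one NUL-terminated name at a time
        while IFS= read -r -d '' f; do
            files+=("$f")
        done < <(find "$dir" -type f -print0)
    fi
}

# Throwaway directory with a space and a newline in the names
demo=$(mktemp -d)
touch "$demo/file" "$demo/other file" "$demo/"$'with\nnewline'

collect_files "$demo"
echo "${#files[@]} files collected"
```

After the call, select f in "${files[@]}" works exactly as in the examples above.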

steeldriver
  • 81,074
  • 2
    +1 for another verb I've not used before – Chris Davies Jul 13 '17 at 17:33
  • Indeed, mapfile is a new one to me also. Kudos. – DopeGhoti Jul 13 '17 at 17:48
  • The while IFS= read version works back in bash v3 (which is important for those of us using macOS). – Gordon Davisson Jul 13 '17 at 19:39
  • 4
    +1 for the find -print0 variant; grumble for putting it after a known-incorrect version, and describing it only for use if one knows that they need to handle newlines. If one only handles the unexpected in places where it's expected, one will never be handling the unexpected at all. – Charles Duffy Jul 13 '17 at 20:35
8

This answer has solutions for any kind of file name, with newlines or spaces.
There are solutions for recent bash as well as ancient bash and even old POSIX shells.

The tree listed down below in this answer[1] is used for the tests.

select

It is easy to get select to work either with an array:

$ dir='deep/inside/a/dir'
$ arr=( "$dir"/* )
$ select var in "${arr[@]}"; do echo "$var"; break; done

Or with the positional parameters:

$ set -- "$dir"/*
$ select var; do echo "$var"; break; done

So, the only real problem is to get the "list of files" (correctly delimited) inside an array or inside the Positional Parameters. Keep reading.
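
As a minimal sketch of the positional-parameter form, here is a small wrapper. The name choose is hypothetical, not part of bash. Inside a function, select without an "in" list walks the function's own arguments, and an invalid answer leaves the variable empty, so the menu is simply shown again.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show a menu of its arguments and print the chosen one.
choose() {
    local item
    select item; do                    # no "in" list: uses "$@"
        if [[ -n $item ]]; then
            printf '%s\n' "$item"
            return 0
        fi
        echo "invalid choice: $REPLY" >&2
    done
    return 1                           # end of input without a valid choice
}

# The menu and prompt go to stderr, so stdout carries only the answer.
# The here-string stands in for a user typing "2".
picked=$(choose "file" "file name" "otherfile" <<< 2 2>/dev/null)
printf '<%s>\n' "$picked"
```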

bash

I don't see the problem you report with bash. Bash is able to search inside a given directory:

$ dir='deep/inside/a/dir'
$ printf '<%s>\n' "$dir"/*
<deep/inside/a/dir/directory>
<deep/inside/a/dir/file>
<deep/inside/a/dir/file name>
<deep/inside/a/dir/file with a
newline>
<deep/inside/a/dir/zz last file>

Or, if you like a loop:

$ set -- "$dir"/*
$ for f; do printf '<%s>\n' "$f"; done
<deep/inside/a/dir/directory>
<deep/inside/a/dir/file>
<deep/inside/a/dir/file name>
<deep/inside/a/dir/file with a
newline>
<deep/inside/a/dir/zz last file>

Note that the syntax above will work correctly with any (reasonable) shell (not csh, at least).

The only limitation of the syntax above is that it does not descend into subdirectories.
But bash can do that:

$ shopt -s globstar
$ set -- "$dir"/**/*
$ for f; do printf '<%s>\n' "$f"; done
<deep/inside/a/dir/directory>
<deep/inside/a/dir/directory/file>
<deep/inside/a/dir/directory/file name>
<deep/inside/a/dir/directory/file with a
newline>
<deep/inside/a/dir/directory/zz last file>
<deep/inside/a/dir/file>
<deep/inside/a/dir/file name>
<deep/inside/a/dir/file with a
newline>
<deep/inside/a/dir/zz last file>

To select only some files (like the ones that end in file) just replace the *:

$ set -- "$dir"/**/*file
$ printf '<%s>\n' "$@"
<deep/inside/a/dir/directory/file>
<deep/inside/a/dir/directory/zz last file>
<deep/inside/a/dir/file>
<deep/inside/a/dir/zz last file>
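
A glob also matches directories and, by default, stays as a literal pattern when nothing matches. A sketch of one way to get closer to find -type f -name 'file*' with globs alone, assuming bash 4.0 or later for globstar (the temporary directory is only for demonstration):

```shell
#!/usr/bin/env bash
shopt -s globstar nullglob     # recurse with **, expand to nothing on no match

demo=$(mktemp -d)
mkdir -p "$demo/sub dir"
touch "$demo/file" "$demo/file name" "$demo/sub dir/file name" "$demo/otherfile"

matches=()
for f in "$demo"/**/file*; do
    [[ -f $f ]] && matches+=("$f")   # keep regular files only
done

printf '<%s>\n' "${matches[@]}"
```

The resulting array can go straight into select f in "${matches[@]}".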

robust

When you put "space-safe" in the title, I am going to assume that what you meant was "robust".

The simplest way to be robust about spaces (or newlines) is to refuse to process input that has spaces (or newlines). A very simple way to do this in the shell is to exit with an error if any file name expands with a space. There are several ways to do this, but the most compact (and POSIX) one is shown below. It is limited to the contents of one directory, includes subdirectory names, and skips dot-files:

$ set -- "$dir"/file*                            # read the directory
$ a="$(printf '%s' "$@" x)"                      # make it a long string
$ [ "$a" = "${a%% *}" ] || echo "exit on space"  # if $a has a space.
$ nl='
'                    # define a new line in the usual posix way.  

$ [ "$a" = "${a%%"$nl"*}" ] || echo "exit on newline"  # if $a has a newline.

If the solution used is already robust against one of those characters, remove the corresponding test.
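
The same rejection test can be written per name with case, which also tells you which name is at fault. A sketch only; the function name names_are_plain is made up for this example:

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if no argument contains a space or newline.
names_are_plain() {
    local name nl='
'
    for name do
        case $name in
            *" "*|*"$nl"*)
                printf 'rejected: %s\n' "$name" >&2
                return 1 ;;
        esac
    done
    return 0
}

names_are_plain file otherfile && echo "plain names: ok"
names_are_plain file "file name" 2>/dev/null || echo "exit on space"
names_are_plain $'file with a\nnewline' 2>/dev/null || echo "exit on newline"
```

Apart from local, nothing here is specific to bash.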

In bash, subdirectories could be tested at once with the ** explained above.

There are a couple of ways to include dot files. The POSIX solution is:

set -- "$dir"/* "$dir"/.[!.]* "$dir"/..?*
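
In bash (not POSIX sh), another of those ways is the dotglob option, which lets a plain * match names that start with a dot while still leaving out . and .. themselves. A small sketch using a throwaway directory:

```shell
#!/usr/bin/env bash
shopt -s dotglob nullglob      # * also matches dot files; no match gives nothing

demo=$(mktemp -d)
touch "$demo/.hidden" "$demo/..odd" "$demo/visible"

entries=( "$demo"/* )
printf '<%s>\n' "${entries[@]##*/}"   # print only the base names
```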

find

If find must be used for some reason, replace the delimiter with a NUL (0x00).

bash 4.4+

$ readarray -t -d '' arr < <(find "$dir" -type f -name file\* -print0)
$ printf '<%s>\n' "${arr[@]}"
<deep/inside/a/dir/file name>
<deep/inside/a/dir/file with a
newline>
<deep/inside/a/dir/directory/file name>
<deep/inside/a/dir/directory/file with a
newline>
<deep/inside/a/dir/directory/file>
<deep/inside/a/dir/file>

bash 2.05+

i=1  # let's start on 1 so it works also in zsh.
while IFS='' read -r -d '' val; do 
    arr[i++]="$val";
done < <(find "$dir" -type f -name \*file -print0)
printf '<%s>\n' "${arr[@]}"

POSIXLY

To make a valid POSIX solution, where find does not have a NUL delimiter and there is no -d (nor -a) for read, we need an entirely different approach.

We need to use a complex -exec from find with a call to a shell:

find "$dir" -type f -exec sh -c '
    for f do
        printf "<%s>\n" "$f"
    done
    ' sh {} +
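
To see how the arguments arrive in the inner shell, here is a small sketch with a throwaway directory. The word sh after the script fills $0, and find appends the file names after it, so they become "$@" with no splitting at all:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
touch "$demo/file name" "$demo/"$'file with a\nnewline'

# The inner script prints how many arguments it received. With {} + find
# passes many names per invocation, so one call handles both files here.
count=$(find "$demo" -type f -exec sh -c '
    # $0 is the placeholder word "sh"; the file names arrive as "$@"
    echo "$#"
' sh {} +)

echo "inner shell received $count file names"
```

With a very large number of files, find may run the inner shell more than once, each time with a different batch of names.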

Or, if what is needed is a select (select is part of bash, not sh):

$ find "$dir" -type f -exec bash -c '
      select f; do echo "<$f>"; break; done ' bash {} +

1) deep/inside/a/dir/file name
2) deep/inside/a/dir/zz last file
3) deep/inside/a/dir/file with a
newline
4) deep/inside/a/dir/directory/file name
5) deep/inside/a/dir/directory/zz last file
6) deep/inside/a/dir/directory/file with a
newline
7) deep/inside/a/dir/directory/file
8) deep/inside/a/dir/file
#? 3
<deep/inside/a/dir/file with a
newline>

[1] This tree (the \012 are newlines):

$ tree
.
└── deep
    └── inside
        └── a
            └── dir
                ├── directory
                │   ├── file
                │   ├── file name
                │   └── file with a \012newline
                ├── file
                ├── file name
                ├── otherfile
                ├── with a\012newline
                └── zz last file

It can be built with these two commands:

$ mkdir -p deep/inside/a/dir/directory/
$ touch deep/inside/a/dir/{,directory/}{file{,\ {name,with\ a$'\n'newline}},zz\ last\ file}
6

You can't set a variable in front of a looping construct, but you can set it in front of the condition. Here's the segment from the man page:

The environment for any simple command or function may be augmented temporarily by prefixing it with parameter assignments, as described above in PARAMETERS.

(A loop isn't a simple command.)

Here's a commonly used construct demonstrating the failure and success scenarios:

IFS=$'\n' while read -r x; do ...; done </tmp/file     # Failure
while IFS=$'\n' read -r x; do ...; done </tmp/file     # Success
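
A quick way to convince yourself that the prefix assignment is temporary (the sample string is arbitrary):

```shell
#!/usr/bin/env bash
saved=$IFS

# The assignment prefixes a simple command (read), so it only lasts
# for the duration of that command.
IFS=: read -r first second <<< "file name:otherfile"
printf '<%s> <%s>\n' "$first" "$second"

# IFS in the calling shell is untouched.
[[ $IFS == "$saved" ]] && echo "IFS unchanged"
```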

Unfortunately I cannot see a way to embed a changed IFS into the select construct while having it affect the processing of an associated $(...). However, there's nothing to prevent IFS being set outside the loop:

IFS=$'\n'; while read -r x; do ...; done </tmp/file    # Also success

and it's this construct that I can see works with select:

IFS=$'\n'; select file in $(find -type f -name 'file*'); do echo "$file"; break; done

When writing defensive code I'd recommend that the clause either be run in a subshell, or IFS and SHELLOPTS saved and restored around the block:

OIFS="$IFS" IFS=$'\n'                     # Split on newline only
OSHELLOPTS="$SHELLOPTS"; set -o noglob    # Wildcards must not expand twice

select file in $(find -type f -name 'file*'); do echo "$file"; break; done

IFS="$OIFS"
[[ "$OSHELLOPTS" =~ noglob ]] || set +o noglob
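
The subshell variant could look like the sketch below. Everything changed inside $( ... ) disappears when it exits, so there is nothing to restore. The temporary directory and the here-string standing in for the user's answer are only for demonstration, and this still assumes no newlines in the file names.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
touch "$demo/file" "$demo/file name"
before=$IFS

choice=$(
    IFS=$'\n'     # split find's output on newlines only
    set -f        # do not glob-expand the resulting words
    select file in $(find "$demo" -type f -name 'file*' | sort); do
        printf '%s\n' "$file"
        break
    done <<< 2 2>/dev/null
)

printf 'picked: <%s>\n' "$choice"
[[ $IFS == "$before" ]] && echo "IFS in the parent shell is unchanged"
```

Because select prints its menu on stderr, the command substitution captures only the chosen name. In real use, drop the here-string and the 2>/dev/null so the user can see the menu and answer it.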
Chris Davies
  • 116,213
  • 16
  • 160
  • 287
  • The semicolon after setting IFS was the missing sauce! (Also, I do work similarly, but since I generally use the default IFS, I just unset it when I'm done using a tweaked version of it, as bash will use the default value if it's unset.) – DopeGhoti Jul 13 '17 at 16:20
  • and remember set -f – ilkkachu Jul 13 '17 at 16:45
  • @roaima, if the $(find...) outputs filenames with glob characters, the shell will expand the globs which may lead to duplicated names in the list, perhaps even listing files that should not have been matched. (of course the -name filter will limit that, but you might still get, say directories even though -type f was specified.) – ilkkachu Jul 13 '17 at 19:17
  • @ilkkachu ah, yes. I think I've added that correctly now – Chris Davies Jul 13 '17 at 20:26
  • 5
    Assuming that IFS=$'\n' is safe is unfounded. Filenames are perfectly able to contain newline literals. – Charles Duffy Jul 13 '17 at 20:34
  • @CharlesDuffy yes you're right to pick that up. My answer was following on from a different question that gave rise to this, where files containing newlines had previously been explicitly excluded from the dataset. But I forgot to mention that here. – Chris Davies Jul 13 '17 at 20:39
  • 4
    I'm frankly hesitant to accept such assertions about one's possible dataset at face value, even when present. The worst data loss event I've been present for was a case where a maintenance script responsible for cleanup of old backups tried to remove a file which had been created by a Python script using a C module with a bad pointer dereference which dumped random garbage -- including a whitespace-separated wildcard -- into the name. – Charles Duffy Jul 13 '17 at 20:43
  • 2
    The folks building the shell script doing cleanup of those files didn't bother to quote because names "couldn't possibly" fail to match [0-9a-f]{24}. TB of backups of data used to support customer billing were lost. – Charles Duffy Jul 13 '17 at 20:44
  • 4
    Agree with @CharlesDuffy completely. Not handling edge cases is only fine when you're working interactively and can see what you're doing. select by its very design is for scripted solutions, so it should always be designed to handle edge cases. – Wildcard Jul 13 '17 at 21:27
  • 1
    @Wildcard, select? scripted? It displays a list and expects input...? Worse, the format of the list seems to depend on the shell and the number of choices, Bash and zsh sometimes use a multi-column format on the display, Bash and ksh93 sometimes don't... – ilkkachu Jul 14 '17 at 08:01
  • 2
    @ilkkachu, of course -- you wouldn't ever call select from a shell where you're typing in the commands to run, but only at a script, where you're answering a prompt provided by that script, and where that script is executing predefined logic (built without knowledge of the filenames being operated on) based on that input. – Charles Duffy Jul 14 '17 at 15:33
  • 1
    @CharlesDuffy, yes, of course. But with select it's not fully automated, as the user gets to see the filename that gets processed. Though stuff like newlines and carriage returns would mess the output, in which case one hopes the user is awake enough to choose ^C instead – ilkkachu Jul 14 '17 at 15:49
  • 1
    @ilkkachu, if one includes terminal control codes in the output (and the specific implementation of select in use doesn't do its own escaping -- being explicitly called out as an allowed reserved word but not at all defined by POSIX, what select does or doesn't do is hard to predict), then the user can be prevented from seeing what's really there, so their awakeness is moot. (Unless using something like printf '%q' to escape them into printability). – Charles Duffy Jul 14 '17 at 16:02
4

I may be out of my jurisdiction here, but maybe you can start with something like this; at least it doesn't have any trouble with whitespace:

find -maxdepth 1 -type f -printf '%f\000' | {
    while read -r -d $'\000'; do
        echo "$REPLY"
        echo
    done
}

To avoid any potential false assumptions, as noted in the comments, be aware that the above code is equivalent to:

find -maxdepth 1 -type f -printf '%f\0' | {
    while read -r -d ''; do
        echo "$REPLY"
        echo
    done
}
GAD3R
  • 66,769
flerb
  • 963
  • read -d is a clever solution; thanks for this. – DopeGhoti Jul 13 '17 at 16:45
  • 2
    read -d $'\000' is exactly identical to read -d '', but for misleading folks about bash's capabilities (implying, incorrectly, that it's able to represent literal NULs within strings). Run s1=$'foo\000bar'; s2='foo', and then try to find a way to distinguish between the two values. (A future version may normalize with command substitution behavior by making the stored value equivalent to foobar, but that's not the case today). – Charles Duffy Jul 13 '17 at 20:37