62

I need to expand a glob pattern (like ../smth*/*, or /etc/cron*/) into a list of files, programmatically. What would be the best way to do it?

vadipp
  • 208
Rogach
  • 6,303

6 Answers

79

Just let it expand inside an array declaration's right side:

list=(../smth*/)          # grab the list
echo "${#list[@]}"        # print array length
echo "${list[@]}"         # print array elements
for file in "${list[@]}"; do echo "$file"; done  # loop over the array

Note that the shell option nullglob needs to be set.
It is not set by default.
It causes globs with no match to expand to nothing, instead of causing an error (in zsh or bash -O failglob) or being passed through literally (all other Bourne-like shells).

Set it in bash with

shopt -s nullglob

or in zsh or yash with

set -o nullglob

though in zsh (where the nullglob initially came from), you'd rather use the (N) glob qualifier to avoid having to change a global setting:

list=( ../smth*/(N) )

The ksh93 equivalent:

list=( ~(N)../smth*/ )
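
As a rough illustration of the nullglob behaviour described above, here is a minimal bash sketch. The scratch directory and the file names (a.txt, b.txt, *.nomatch) are made up for the demo and are not from the question.

```shell
#!/usr/bin/env bash
# Scratch directory with two files; nothing in it matches *.nomatch.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt"

shopt -u nullglob
without=( "$demo_dir"/*.nomatch )   # no match: the literal pattern is kept

shopt -s nullglob
with=( "$demo_dir"/*.nomatch )      # no match: expands to nothing
matched=( "$demo_dir"/*.txt )       # matches both demo files
shopt -u nullglob                   # back to the default setting

echo "without nullglob: ${#without[@]} element(s)"
echo "with nullglob:    ${#with[@]} element(s)"
echo "matching glob:    ${#matched[@]} element(s)"

rm -rf "$demo_dir"
```

Turning the option off again afterwards keeps the rest of a script on the default behaviour; this assumes nullglob was off to begin with.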
manatwork
  • 31,277
8

compgen is a Bash built-in that you can pass an escaped(!) pattern to, and it outputs matches, returning true or false based on whether there were any. This is especially useful if you need to pass the glob pattern from a variable/script argument.

glob_pattern='../smth*/*'
while IFS= read -r file; do
    # your thing
    echo "read $file"
done < <(compgen -G "$glob_pattern" || true)

Adding the || true prevents a false return from compgen from causing any problems. This method avoids issues with no matches and does not require changing nullglob options.

If you need the items in an array, just initialise one with files=() before the loop, and files+=("$file") inside the loop. You can then see if there were any matches by simply checking the length of the array with if [[ ${#files[@]} -gt 0 ]]; then.
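
The array variant described above could be sketched like this. The helper name collect_matches and the demo file names are hypothetical, and because the loop reads one name per line, file names containing newlines would not survive it.

```shell
#!/usr/bin/env bash
# Scratch directory with two hypothetical demo files.
demo_dir=$(mktemp -d)
touch "$demo_dir/one.log" "$demo_dir/two.log"

# Collect the matches of the glob pattern in $1 into the global array "files".
collect_matches() {
    local file
    files=()
    while IFS= read -r file; do
        files+=("$file")
    done < <(compgen -G "$1" || true)
}

collect_matches "$demo_dir/*.log"
log_count=${#files[@]}
echo "*.log matched $log_count file(s)"

collect_matches "$demo_dir/*.missing"
missing_count=${#files[@]}
if [[ ${#files[@]} -gt 0 ]]; then
    echo "*.missing matched $missing_count file(s)"
else
    echo "*.missing matched nothing"
fi

rm -rf "$demo_dir"
```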

Walf
  • 1,321
  • I used to think this was a satisfactory solution, but it turned out not to be. For instance, compgen does not work properly for files with "composed extensions"; i.e. if you have a file file.txt.bin, with two extensions, expanding **/*.bin will weirdly not match them. In my case I was trying to expand the glob in the context of a Git script. Luckily, git ls-files "$glob" works just as I'd expect. While not a general solution outside of Git trees, I thought I might as well point it out here. – resolritter Feb 24 '21 at 11:40
  • 2
    @resolritter I just ran compgen -G '/**/*.gz' (on bash v4.2.46) and it found several files with composed extensions. I also have the shell option globstar off. Are you saying compgen -G '**/*.bin' produces no output where echo **/*.bin does? – Walf Feb 25 '21 at 03:26
2

I wanted to use a standard input (pipe) in case a resulting command exceeds a command line length limit. The following command worked for me:

echo "../smth*/*" "/etc/cron*/" | xargs -n1 -I{} bash -O nullglob -c "echo {}" | xargs -n1

or for a list of globs:

cat huge_glob_list.txt | xargs -n1 -I{} bash -O nullglob -c "echo {}" | xargs -n1
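
A possible variation, sketched here under assumptions: read the patterns from standard input inside bash itself instead of substituting them into bash -c "echo {}", where the pattern text is parsed as shell code. The function name expand_patterns and the demo files are hypothetical. printf is a builtin, so the expanded list is not passed through an exec call.

```shell
#!/usr/bin/env bash
# Read glob patterns from stdin, one per line, and print each match on its
# own line. The function body is a subshell, so the nullglob and IFS
# changes do not leak into the caller.
expand_patterns() (
    shopt -s nullglob
    IFS=''                        # no word splitting of the unquoted pattern
    while read -r pattern; do
        matches=( $pattern )      # pathname expansion happens here
        if (( ${#matches[@]} > 0 )); then
            printf '%s\n' "${matches[@]}"
        fi
    done
)

# Demo with hypothetical files; the second pattern matches nothing.
demo_dir=$(mktemp -d)
touch "$demo_dir/a 1.txt" "$demo_dir/b.txt"
expanded=$(printf '%s\n' "$demo_dir/*.txt" "$demo_dir/*.none" | expand_patterns)
printf '%s\n' "$expanded"
rm -rf "$demo_dir"
```

For a list of globs kept in a file, the same function could be fed with expand_patterns < huge_glob_list.txt.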
1

No need to overcomplicate things:

echo your/stuff*
1

I recently had the same question, and found that the solution is very simple (and POSIX compliant):

  • Set $IFS to the empty string, which disables word splitting.
  • Then leave the variable unquoted so that its globs are expanded.

Example, illustrated with a for loop:

pattern='some * dir/my file *'

# Save IFS (only if it is set), then disable word splitting.
unset old_IFS ; [ -n "${IFS+x}" ] && old_IFS=${IFS} ; IFS=''

for f in ${pattern} ; do
    # Restore IFS; if it was originally unset, unset it again.
    IFS=${old_IFS} ; [ -z "${old_IFS+x}" ] && unset IFS
    printf 'Filenames: %s \n' "${f}"
done

Please note that I do not set nullglob with shopt -s nullglob, as shopt is not defined in POSIX. If the glob pattern matches nothing, the pattern expands to itself, so the code above prints Filenames: some * dir/my file *. It is easy to add an if [ -e "${f}" ]; then ... check if necessary.
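
One way the existence check might look, as a sketch only: it swaps the save-and-restore of $IFS for a subshell function body, and adds a -L test so that dangling symlinks still count as matches. The function name expand_existing and the demo files are hypothetical, and mktemp is widely available but not specified by POSIX.

```shell
# Print each existing match of the glob pattern in $1, one per line.
# The function body is a subshell, so IFS is untouched for the caller.
expand_existing() (
    IFS=''                        # disable word splitting
    for f in $1; do               # unquoted on purpose: globs expand here
        if [ -e "$f" ] || [ -L "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
)

# Demo with hypothetical files whose names contain spaces.
demo_dir=$(mktemp -d)
: > "$demo_dir/my file 1"
: > "$demo_dir/my file 2"

found=$(expand_existing "$demo_dir/my file *")
missing=$(expand_existing "$demo_dir/no such *")

printf 'found:\n%s\n' "$found"
printf 'missing: [%s]\n' "$missing"

rm -rf "$demo_dir"
```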

The same approach can be used to set the positional parameters also.

pattern='some * dir/my file *'

unset old_IFS ; [ -n "${IFS+x}" ] && old_IFS=${IFS} ; IFS=''
set -- ${pattern}
IFS=${old_IFS} ; [ -z "${old_IFS+x}" ] && unset IFS
unset old_IFS

printf '[%s]\n' "$@"

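Note that this cannot be condensed into the one-liner IFS='' command set -- ${pattern}. The shell expands ${pattern} using the current $IFS before the temporary assignment takes effect, so the one-liner does not disable word splitting.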
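
The difference can be observed with a small sketch. It assumes $IFS has its default value when the script starts, and uses a made-up pattern without glob characters so that only word splitting is visible.

```shell
pattern='one two'    # no glob characters: only splitting is observed

# One-liner: $pattern is split before the temporary IFS='' applies.
IFS='' command set -- $pattern
one_liner_count=$#

# Separate statements, run in a subshell so IFS stays unchanged here.
two_step_count=$(IFS='' ; set -- $pattern ; echo "$#")

echo "one-liner: $one_liner_count field(s)"   # 2: word splitting happened
echo "two-step:  $two_step_count field(s)"    # 1: word splitting disabled
```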

It can also be used for function parameters, but this is not recommended. The restore of $IFS has to be the first statement of the function, which is asymmetric in style and easily forgotten.

func() {
    IFS=${old_IFS} ; [ -z "${old_IFS+x}" ] && unset IFS
    unset old_IFS
    printf '[%s]\n' "$@"
}

pattern='some * dir/my file *'

unset old_IFS ; [ -n "${IFS+x}" ] && old_IFS=${IFS} ; IFS=''
func $pattern

Personally, I would prefer passing $pattern (quoted) into the function and then running set -- $1 inside it. But this is not always possible if the function also carries other positional parameters.

func() {
    unset old_IFS ; [ -n "${IFS+x}" ] && old_IFS=${IFS} ; IFS=''
    set -- $1
    IFS=${old_IFS} ; [ -z "${old_IFS+x}" ] && unset IFS
    unset old_IFS
    printf '[%s]\n' "$@"
}

pattern='some * dir/my file *'

func "$pattern"

This approach handles patterns and file paths:

  • that contain whitespace characters (indeed, any characters);
  • that contain glob characters (escape as \* to match a literal *); and
  • whose glob characters are in the directory path components and/or in the filename component. (If a glob in the directory path is necessary, it cannot be easily implemented using find.)
midnite
  • 423
  • 1
    [ -e "${f}" ] fails for a file that is a symlink to an inaccessible file. [ -e "$f" ] || [ -L "$f" ] is better, but note that to expand dir/* glob, you only need to be able to read dir while for [ -e dir/file ] you need search access to the directory. – Stéphane Chazelas Jan 09 '24 at 21:07
  • 1
    Beware IFS=${old_IFS} doesn't restore $IFS properly if $IFS was previously unset (an unset IFS doesn't mean the same thing as an IFS set to the empty string) – Stéphane Chazelas Jan 09 '24 at 21:08
  • See Avoiding errors due to unexpanded asterisk for a common technique to work around the misfeature introduced by the Bourne shell (and reverted by several modern shells including zsh and fish) whereby a non-matching glob expands to itself. See also Why is nullglob not default? – Stéphane Chazelas Jan 09 '24 at 21:12
  • Using var='foo[*]bar*.txt' is slightly more portable than var='foo\*bar*.txt' to escape * even if both are meant to be POSIX. – Stéphane Chazelas Jan 09 '24 at 21:14
  • @StéphaneChazelas - Thank you for the heads-up. I will be careful with symlinks, and with directories that lack r or x permission. In most cases, a directory with only x (where a file can still be read) is more applicable than a directory with only r (where we can list the files but do nothing else). So I think a check of [ -e dir/file ] serves most scenarios. – midnite Jan 10 '24 at 08:44
  • I know unset IFS means IFS=<space><tab><newline> while IFS='' means split on nothing. But the code above did not unset IFS previously. – midnite Jan 10 '24 at 08:45
  • May I know the reason why var='foo[*]bar*.txt' is more portable? Is it because matched literal [*] results into *, on the other hand, matched literal \* results into \*? Just like replacing [*] with \* does not work in the code here - https://unix.stackexchange.com/a/56087/150246 . – midnite Jan 10 '24 at 09:13
  • 1
    Depending on the shell implementation, var='foo\*bar*.txt' sh -c 'IFS=; echo $var' matches either on foo*barWHATEVER.txt or foo\WHATEVERbarWHATEVER.txt (like in Ubuntu 20.04's mksh), while var='foo[*]bar*.txt' matches the former consistently across shells. – Stéphane Chazelas Jan 10 '24 at 09:39
  • I meant that that code doesn't restore $IFS properly if called in a context where $IFS was unset. See also What's a safe and portable way to split a string in shell programming? – Stéphane Chazelas Jan 10 '24 at 09:44
  • @StéphaneChazelas - Regarding [*] is more preferable than \*, I am just facing this problem. You spotted it before I notice. I got this question in Bash: https://unix.stackexchange.com/questions/767124/backslash-in-unquoted-variable-for-glob-expansion . And I notice in your example, if there is no second asterisk, the first asterisk will become literal. This is weird. var='foo\*bar.txt' sh -c 'IFS=; ls $var' matches only the file foo\*bar.txt literally. – midnite Jan 17 '24 at 13:26
  • @StéphaneChazelas - The issue of if $IFS was previously unset is fixed. – midnite Jan 17 '24 at 13:38
0

Nowadays most Linux distributions include Python, so you can just run the following command in the shell:

python -c 'from glob import glob; print(glob("*"))'

You are free to modify the Python script to meet your requirements, for example to dump the list as a JSON string:

python -c 'from glob import glob; from json import dumps; print(dumps(glob("*")))'
link89
  • 101