I need to expand a glob pattern (like ../smth*/*, or /etc/cron*/) into a list of files, programmatically. What would be the best way to do it?
6 Answers
Just let it expand inside an array declaration's right side:
list=(../smth*/) # grab the list
echo "${#list[@]}" # print array length
echo "${list[@]}" # print array elements
for file in "${list[@]}"; do echo "$file"; done # loop over the array
Note that the shell option nullglob needs to be set.
It is not set by default.
It causes globs with no match to expand to nothing instead of causing an error (in zsh or bash -O failglob) or being passed literally (all other Bourne-like shells).
Set it in bash with
shopt -s nullglob
or in zsh or yash with
set -o nullglob
though in zsh (where the nullglob initially came from), you'd rather use the (N) glob qualifier to avoid having to change a global setting:
list=( ../smth*/(N) )
The ksh93 equivalent:
list=( ~(N)../smth*/ )
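Because nullglob is a global setting in bash, one option is to enable it only around the expansion and then put it back the way it was. The following is a minimal bash sketch of that idea, not a definitive implementation: the function name expand_glob and the global array name matches are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# expand_glob is a hypothetical helper name, not a bash built-in.
# It stores the matches of the pattern in $1 in the global array "matches".
expand_glob() {
    local restore
    # "shopt -p" prints the command that recreates the current state;
    # it returns non-zero when the option is off, hence "|| true".
    restore=$(shopt -p nullglob) || true
    shopt -s nullglob
    local IFS=''        # no word splitting, so whitespace in the pattern survives
    matches=( $1 )      # deliberately unquoted so that the glob is expanded
    eval "$restore"     # put nullglob back the way it was
}

expand_glob '/etc/cron*/'
printf '%d match(es)\n' "${#matches[@]}"
for file in "${matches[@]}"; do printf '%s\n' "$file"; done
```

With no match, the array is simply empty, and the caller's nullglob setting is left untouched.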
It is just a regular array. You can do whatever you can with any array. Added some examples. – manatwork Mar 12 '12 at 13:44
There is a problem. If the pattern matches no files, it prints itself, which is not very good. – Rogach Mar 12 '12 at 14:21
Having the pattern print itself is probably what is wanted in some cases. Using the option failglob might be more appropriate in others. See this answer for an in-depth discussion. – SpinUp __ A Davis May 06 '22 at 20:41
compgen is a Bash built-in that you can pass an escaped(!) pattern to, and it outputs matches, returning true or false based on whether there were any. This is especially useful if you need to pass the glob pattern from a variable/script argument.
glob_pattern='../smth*/*'
while IFS= read -r file; do
# your thing
echo "read $file"
done < <(compgen -G "$glob_pattern" || true)
Adding the || true prevents a false return from compgen from causing any problems. This method avoids issues with no matches and does not require changing nullglob options.
If you need the items in an array, just initialise one with files=() before the loop, and files+=("$file") inside the loop. You can then see if there were any matches by simply checking the length of the array with if [[ ${#files[@]} -gt 0 ]]; then.
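The array variant described above could look like the following bash sketch. The function name collect_matches is hypothetical, and the sketch assumes that no matched file name contains a newline, since compgen prints one match per line.

```shell
#!/usr/bin/env bash
# collect_matches is a hypothetical helper name; it fills the global array "files"
# with the matches of the glob pattern given in $1.
collect_matches() {
    local file
    files=()
    # IFS= keeps leading and trailing whitespace in file names intact.
    while IFS= read -r file; do
        files+=("$file")
    done < <(compgen -G "$1" || true)
}

collect_matches '../smth*/*'
if [[ ${#files[@]} -gt 0 ]]; then
    printf 'matched: %s\n' "${files[@]}"
else
    echo 'no matches'
fi
```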
I used to think this was a satisfactory solution, but it turned out not to be. For instance, compgen does not work properly for files with "composed extensions"; i.e. if you have a file file.txt.bin, with two extensions, expanding **/*.bin will weirdly not match them. In my case I was trying to expand the glob in the context of a Git script. Luckily, git ls-files "$glob" works just as I'd expect. While not a general solution outside of Git trees, I thought I might as well point it out here. – resolritter Feb 24 '21 at 11:40
@resolritter I just ran compgen -G '/**/*.gz' (on bash v4.2.46) and it found several files with composed extensions. I also have the shell option globstar off. Are you saying compgen -G '**/*.bin' produces no output where echo **/*.bin does? – Walf Feb 25 '21 at 03:26
I wanted to use a standard input (pipe) in case a resulting command exceeds a command line length limit. The following command worked for me:
echo "../smth*/*" "/etc/cron*/" | xargs -n1 -I{} bash -O nullglob -c "echo {}" | xargs -n1
or for a list of globs:
cat huge_glob_list.txt | xargs -n1 -I{} bash -O nullglob -c "echo {}" | xargs -n1
No need to overcomplicate things:
echo your/stuff*
This doesn't work. For example: TEST=$(echo your/stuff*) && eval \"$TEST\" will output: your/stuff*: No such file or directory – Sebastian Jun 27 '19 at 17:41
No, it's not a nullglob issue. Using escape characters makes the TEST variable evaluate as a string that includes *, which is not expanded. – Sebastian Jul 05 '19 at 20:14
Recently I had the same question, and I found that the solution is very simple (and it is POSIX compliant):
- Set $IFS to the empty string, which disables word splitting on whitespace characters.
- Then just leave the variable unquoted to let it expand the globs.
Example code illustrated in for-loop:
pattern='some * dir/my file *'
unset old_IFS ; [ -n "${IFS+x}" ] && old_IFS=${IFS} ; IFS=''
for f in ${pattern} ; do
IFS=${old_IFS} ; [ -z "${old_IFS+x}" ] && unset IFS
printf 'Filenames: %s \n' "${f}"
done
Please note that I do not set nullglob with shopt -s nullglob, as shopt is not defined in POSIX. If the glob pattern matches nothing, the pattern expands to itself: Filenames: some * dir/my file * is printed by the above code. It is easy to add an if [ -e "${f}" ]; then ... check if necessary.
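The existence check mentioned above could be sketched as follows in POSIX sh. The function name print_matches is hypothetical; the extra [ -L ] test, which also keeps symlinks whose target is missing or inaccessible, is an addition beyond the plain [ -e ] check.

```shell
# print_matches is a hypothetical helper name. It prints, one per line,
# the paths matching the glob pattern in $1, and nothing when there is no match.
print_matches() {
    unset old_IFS ; [ -n "${IFS+x}" ] && old_IFS=${IFS} ; IFS=''
    for f in $1 ; do
        # The word list was expanded when the loop started, so IFS can be restored here.
        IFS=${old_IFS} ; [ -z "${old_IFS+x}" ] && unset IFS
        # An unmatched pattern is passed through literally, so keep only real paths.
        if [ -e "${f}" ] || [ -L "${f}" ] ; then
            printf '%s\n' "${f}"
        fi
    done
    # Restore again in case the loop body never ran (empty pattern).
    IFS=${old_IFS} ; [ -z "${old_IFS+x}" ] && unset IFS
    unset old_IFS
}

print_matches 'some * dir/my file *'
```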
The same approach can also be used to set the positional parameters.
pattern='some * dir/my file *'
unset old_IFS ; [ -n "${IFS+x}" ] && old_IFS=${IFS} ; IFS=''
set -- ${pattern}
IFS=${old_IFS} ; [ -z "${old_IFS+x}" ] && unset IFS
unset old_IFS
printf '[%s]\n' "$@"
Note that we cannot turn it into the one-liner IFS='' command set -- ${pattern}. This one-liner does not disable word splitting, because ${pattern} is expanded before the IFS assignment takes effect.
It may be used for function parameters, but this is not recommended. The restoration of $IFS has to be the first statement of the function, which is asymmetric in style and easily forgotten.
func() {
IFS=${old_IFS} ; [ -z "${old_IFS+x}" ] && unset IFS
unset old_IFS
printf '[%s]\n' "$@"
}
pattern='some * dir/my file *'
unset old_IFS ; [ -n "${IFS+x}" ] && old_IFS=${IFS} ; IFS=''
func $pattern
Personally, I would prefer passing the pattern into the function as a quoted argument and running set -- on it inside the function. But this is not always possible if the function also takes other positional parameters.
func() {
unset old_IFS ; [ -n "${IFS+x}" ] && old_IFS=${IFS} ; IFS=''
set -- $1
IFS=${old_IFS} ; [ -z "${old_IFS+x}" ] && unset IFS
unset old_IFS
printf '[%s]\n' "$@"
}
pattern='some * dir/my file *'
func "${pattern}"
This approach works for both the pattern and the file paths:
- if they contain whitespace characters (indeed, any characters), and
- if they contain glob characters (use the escape \* to match a literal *), and
- when the glob characters are located in the directory path components and/or in the filename components. (If a glob in the directory path is necessary, it cannot be easily implemented using find.)
[ -e "${f}" ] fails for a file that is a symlink to an inaccessible file. [ -e "$f" ] || [ -L "$f" ] is better, but note that to expand the dir/* glob, you only need to be able to read dir, while for [ -e dir/file ] you need search access to the directory. – Stéphane Chazelas Jan 09 '24 at 21:07
Beware: IFS=${old_IFS} doesn't restore $IFS properly if $IFS was previously unset (an unset IFS doesn't mean the same thing as an IFS set to the empty string) – Stéphane Chazelas Jan 09 '24 at 21:08
See Avoiding errors due to unexpanded asterisk for a common technique to work around the misfeature introduced by the Bourne shell (and reverted by several modern shells including zsh and fish) whereby a non-matching glob expands to itself. See also Why is nullglob not default? – Stéphane Chazelas Jan 09 '24 at 21:12
Using var='foo[*]bar*.txt' is slightly more portable than var='foo\*bar*.txt' to escape *, even if both are meant to be POSIX. – Stéphane Chazelas Jan 09 '24 at 21:14
@StéphaneChazelas - Thank you for the heads up. I will be careful with symlinks, and with directories that lack r or x. In most cases, a directory with only x, where we then read the file, is more applicable than a directory with only r, where we can only list the files but nothing else can be done. So I think a check of [ -e dir/file ] serves most scenarios. – midnite Jan 10 '24 at 08:44
I know unset IFS means IFS=<space><tab><newline> while IFS='' means split on nothing. But the code above did not unset IFS previously. – midnite Jan 10 '24 at 08:45
May I know the reason why var='foo[*]bar*.txt' is more portable? Is it because a matched literal [*] results in *, while a matched literal \* results in \*? Just like replacing [*] with \* does not work in the code here - https://unix.stackexchange.com/a/56087/150246 . – midnite Jan 10 '24 at 09:13
Depending on the shell implementation, var='foo\*bar*.txt' sh -c 'IFS=; echo $var' matches either on foo*barWHATEVER.txt or foo\WHATEVERbarWHATEVER.txt (like in Ubuntu 20.04's mksh), while var='foo[*]bar*.txt' matches the former consistently across shells. – Stéphane Chazelas Jan 10 '24 at 09:39
I meant that that code doesn't restore $IFS properly if called in a context where $IFS was unset. See also What's a safe and portable way to split a string in shell programming? – Stéphane Chazelas Jan 10 '24 at 09:44
@StéphaneChazelas - Regarding [*] being preferable to \*, I am just facing this problem. You spotted it before I noticed. I got this question in Bash: https://unix.stackexchange.com/questions/767124/backslash-in-unquoted-variable-for-glob-expansion . And I notice in your example, if there is no second asterisk, the first asterisk will become literal. This is weird. var='foo\*bar.txt' sh -c 'IFS=; ls $var' matches only the file foo\*bar.txt literally. – midnite Jan 17 '24 at 13:26
@StéphaneChazelas - The issue that occurred when $IFS was previously unset is fixed. – midnite Jan 17 '24 at 13:38
Nowadays most Linux distributions include Python, so you can just run the following command in the shell:
python -c 'from glob import glob; print(glob("*"))'
You are free to modify the Python script to meet your requirements, for example, to dump the result as a JSON-formatted string.
python -c 'from glob import glob; from json import dumps; print(dumps(glob("*")))'
*. – Kevin Mar 12 '12 at 14:32