
In bash, if we activate the extglob shell option, we can do fancy stuff like negating a glob (from man bash):

   If  the  extglob shell option is enabled using the shopt builtin, sev‐
   eral extended pattern matching operators are recognized.  In the  fol‐
   lowing  description,  a pattern-list is a list of one or more patterns
   separated by a |.  Composite patterns may be formed using one or  more
   of the following sub-patterns:
      ?(pattern-list)
             Matches zero or one occurrence of the given patterns
      *(pattern-list)
             Matches zero or more occurrences of the given patterns
      +(pattern-list)
             Matches one or more occurrences of the given patterns
      @(pattern-list)
             Matches one of the given patterns
      !(pattern-list)
             Matches anything except one of the given patterns

For example:

$ ls
a1file  a2file  b1file  b2file

$ shopt -s extglob
$ ls !(a*)
b1file  b2file

Now, if I want to do something to directories only, I can use */:

$ ls -F
a1file  a2file  adir/  b1file  b2file  bdir/

$ ls -d */
adir/  bdir/

However, the glob */ apparently cannot be negated:

$ ls -Fd !(*/)
a1file  a2file  adir/  b1file  b2file  bdir/

What gives? Why doesn't !(*/) exclude directories when */ correctly includes only directories? And is there a way to exclude directories using globs in bash?

The above commands were tested using GNU bash, version 5.1.8(1)-release on an Arch Linux system.

terdon

1 Answer


Because globs don't cross / boundaries. Except for the special case of **/¹ (originally from zsh, now also found in a few other shells, often after setting an option such as shopt -s globstar for bash), a glob operator cannot match something that contains a /, as glob operators are only ever applied to the entries of a directory listing.

The shell splits a glob such as x/y/z on the /s. For each component, if the component contains glob operators, the shell lists the parent directory and matches the pattern against each entry; if not, it just looks for that file with lstat()².

You'll see a*b/c won't match on a/b/c. The shell is only matching a*b against the entries in the current directory. Even [a/b]* is treated as [a and b]* separated by a /.
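A quick way to check this for yourself is sketched below. It works in a scratch directory, and the names a, axb, b and c are made up for the demonstration:

```bash
# Work in a scratch directory so that nothing else interferes with the glob.
cd -- "$(mktemp -d)" || exit

mkdir -p a/b axb
touch a/b/c axb/c

shopt -s nullglob          # a glob with no match expands to nothing
matches=(a*b/c)            # a*b is only matched against the entries of .
printf '%s\n' "${matches[@]}"
```

Only axb/c is printed: a*b is matched against the names a and axb found in the current directory, never against the path a/b.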

*/ is * and nothing, separated by a /. It's a special case of */x, where the shell first finds all the files that match * in the listing of the current directory and then, for each of them, checks whether a file called file/x exists (using lstat() in that case, not by listing directories, as x does not contain a glob operator). With */ it's the same, except that it checks whether file/ exists (which is only true if file is a directory or a symlink to a directory).
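As an illustration of that last point, here is a small sketch (again in a scratch directory, with made-up file names) showing that */ also picks up symlinks to directories but not symlinks to regular files:

```bash
cd -- "$(mktemp -d)" || exit

mkdir adir
touch afile
ln -s adir dirlink         # symlink to a directory
ln -s afile filelink       # symlink to a regular file

shopt -s nullglob
dirs=(*/)                  # every name matching * for which name/ exists
printf '%s\n' "${dirs[@]}"
```

This prints adir/ and dirlink/; afile and filelink are dropped because afile/ and filelink/ do not exist.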

If you use / inside the ksh-style @(...), !(...)... extended operators (a subset of which is available in bash -O extglob or zsh -o kshglob), the behaviour varies between shells, but will generally not do what you want as patterns in a glob are only ever matched against file names in a directory listing. In bash, !(*/) matches every (non-hidden) filename, likely because here that glob wasn't split on /, and the */ is checked in reverse against each directory entry name, and a directory entry name can't contain a /. That doesn't really explain why !(*[a/b]*) still includes filenames that contain as or bs or why !(*[a")"/b]) excludes filenames containing as but not those containing )s or bs.

If you want files that are not determined to be of type directory after symlink resolution, that's not something you can do with globs alone. You'd need to use zsh and its glob qualifiers, which can truly select files based on attributes other than their name:

print -rC1 -- *(-^/)

Here, zsh matches the glob, and then applies the qualifiers as an extra step after globbing. Here - specifies that the following qualifiers are to be applied after symlink resolution (stat() instead of lstat()), ^ negates the following qualifiers, / selects files of type directory.
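If you have to stay in bash, the usual fallback is to let the glob produce the names and do the type test in a loop afterwards. The following is only a sketch of that idea, not a glob-only solution; the function name non_dirs and the array name files are arbitrary choices for the example:

```bash
shopt -s nullglob          # a glob with no match expands to nothing

# non_dirs: hypothetical helper that stores in the global array "files"
# every non-hidden name of the current directory that is not a directory
# (nor a symlink to one).
non_dirs() {
  local f
  files=()
  for f in *; do
    # [[ -d ]] resolves symlinks, like the "-" in zsh's *(-^/)
    [[ -d $f ]] || files+=("$f")
  done
}

non_dirs
if (( ${#files[@]} )); then
  ls -Fd -- "${files[@]}"
fi
```

Like the zsh glob above, this skips hidden files, since * does not match them unless dotglob is set.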

With bash 4.4+, you can always outsource the job to something else that prints the result NUL-delimited and use readarray -td '' to get the result, like:

readarray -td '' files < <(zsh -c 'print -rNC1 -- *(N^-/)')
(( ${#files[@]} )) && ls -Fd -- "${files[@]}"

Or with GNU find and sort:

readarray -td '' files < <(
  LC_ALL=C find . -mindepth 1 -maxdepth 1 \
    ! -name '.*' ! -xtype d -printf '%P\0' | sort -z)
(( ${#files[@]} )) && ls -Fd -- "${files[@]}"

(The output is sorted with sort here so as to get the same list as with zsh, though for the special case of passing that list to ls it's redundant, as ls does its own sorting.)

Since you already have a NUL-delimited list, you might as well skip the array step and pipe the output to xargs -r0 ls -Fd -- instead. That avoids having to treat the empty-list case specially and works around the "Argument list too long" limit.
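Put together, that could look like the sketch below. It assumes the GNU implementations of find, sort and xargs, and the wrapper name ls_non_dirs is made up for the example:

```bash
# ls_non_dirs: hypothetical wrapper; lists the non-hidden, non-directory
# entries of the current directory. Assumes GNU find, sort and xargs.
# xargs -r runs nothing on empty input; -0 reads NUL-delimited names.
ls_non_dirs() {
  LC_ALL=C find . -mindepth 1 -maxdepth 1 \
    ! -name '.*' ! -xtype d -printf '%P\0' |
    sort -z |
    xargs -r0 ls -Fd --
}
```

Running ls_non_dirs in an empty directory prints nothing, because -r stops xargs from calling ls without arguments.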


¹ Though see also the ~ extendedglob operator in zsh, which can be applied as an extra step after the full glob to filter out paths and so match across /s. In a*/b*/c*~*e*, the filename generation algorithm is performed for the a*/b*/c* glob, and then the resulting pathnames that match the *e* pattern are filtered out.

² Case-insensitive globbing can alter that, though, as with zsh -o nocaseglob.

  • Why zsh's **/ specifically? Ksh and bash's **/ also matches recursively. You might quibble as to whether that means matching multiple components or matching “something that contains a /” but I don't understand what distinction you're making. You might also want to mention zsh's ~ which kind of doesn't-match on multiple components. – Gilles 'SO- stop being evil' Nov 24 '21 at 10:28
  • @Gilles'SO-stopbeingevil', thanks for the feedback. See edit. – Stéphane Chazelas Nov 24 '21 at 10:39
  • If */ means "the shell first looks for all the files that match *" and "then checks if a file/ exists", then why can't it do the opposite and check if a file/ does not exist? The negated !(*/) doesn't need to cross a / boundary any more than the */ does. – terdon Nov 24 '21 at 10:42
  • And is there really no way of excluding directories using globs only in bash? – terdon Nov 24 '21 at 10:43
  • @terdon I believe the point is that, in !(*/), */ is not a globbing pattern, but a filepath including the * pattern. extglob's expressions expect just a globbing pattern (list) inside the parentheses: ?(*/), *(*/) etc. match nothing too. – fra-san Nov 24 '21 at 10:54
  • The shells don't seem to split on every /, as just giving echo !(*[a/ waits for a continuation line. On the face of it, I would have assumed !(*/) compares filename components on that level against the pattern */, which they won't match. But the way !(*[a")"/b]) works doesn't really fit that. Looks to me !([ax/b]) also matches b and c, but not a. (??) – ilkkachu Nov 24 '21 at 12:00
  • @ilkkachu, yes, the thing to take from that is that a / there is unsupported and unexpected, and triggers unspecified behaviour. Like I said, you'll find much variation between shells (and likely between shell versions), but in any case, what the OP wants to do cannot work unless the filename generation algorithm is reworked to come up with a different way to find files when / is found inside those extended X(...) operators. – Stéphane Chazelas Nov 24 '21 at 12:04