
Bash has two types of pattern matching: glob and regex. The general rule of usage seems to be that 1) the simpler glob is used to match filenames and 2) regex is used to search text.

Glob puts the metacharacter in front of the pattern, while regex puts it after the pattern:

Glob          Regex
?(pattern)    (pattern)?
*(pattern)    (pattern)*     
+(pattern)    (pattern)+    
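One way to check that correspondence is to test a few strings against each pair in bash. This is a minimal sketch, assuming bash with `extglob` enabled; the helper `glob_vs_regex` and the sample strings are hypothetical. The regexes are anchored with `^...$` because a glob has to match the whole string, whereas `=~` succeeds on a match anywhere in it.

```bash
#!/usr/bin/env bash
# Compare each extended-glob operator with its (anchored) ERE counterpart.
shopt -s extglob

# glob_vs_regex STRING GLOB ERE (hypothetical helper):
# prints whether STRING matches each one (1 = match, 0 = no match).
# $glob and $ere are left unquoted on the right-hand side so bash treats
# them as a pattern and a regex rather than as literal strings.
glob_vs_regex() {
    local s=$1 glob=$2 ere=$3 g=0 r=0
    [[ $s == $glob ]] && g=1
    [[ $s =~ $ere ]] && r=1
    printf '%q: glob=%d regex=%d\n' "$s" "$g" "$r"
}

for s in "" ab abab x; do
    glob_vs_regex "$s" '?(ab)' '^(ab)?$'
    glob_vs_regex "$s" '*(ab)' '^(ab)*$'
    glob_vs_regex "$s" '+(ab)' '^(ab)+$'
done
```

For every sample string, the glob and regex columns agree; for example `abab` fails `?(ab)` and `^(ab)?$` but matches both `+(ab)` and `^(ab)+$`.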

I therefore have difficulty understanding how file matching with wildcards (e.g. *.sh) works. Are wildcards something different from glob patterns? From what I see, the search pattern *.sh does not include a metacharacter to match any character after *.

Kusalananda
Vera

1 Answer


A wildcard is part of a glob pattern. At their simplest, * and ? are wildcards that are also glob patterns in their own right. Here are some simple globs:

*.sh              # could match "fred.sh" and also ".sh" (as a filename, a leading dot is only matched by * if dotglob is set)
matchme           # would match "matchme"
file[0-9]*.txt    # could match "file12.txt" but also "file12yz.txt"
?ile              # could match "mile" or "file" but not "smile"
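These examples can be tried directly with bash's `[[ string == pattern ]]` test, which applies glob matching to a string rather than to filenames. A minimal sketch; the helper `glob_test` is hypothetical, and the pattern is kept in an unquoted variable on the right-hand side, since a quoted right-hand side would be compared literally.

```bash
#!/usr/bin/env bash
# glob_test PATTERN NAME... (hypothetical helper):
# prints "<name>: yes" or "<name>: no" depending on whether NAME matches PATTERN.
glob_test() {
    local pattern=$1 name
    shift
    for name in "$@"; do
        if [[ $name == $pattern ]]; then   # unquoted $pattern => glob match
            printf '%s: yes\n' "$name"
        else
            printf '%s: no\n' "$name"
        fi
    done
}

glob_test '*.sh' fred.sh .sh fred.shx
glob_test 'matchme' matchme matchmex
glob_test 'file[0-9]*.txt' file12.txt file12yz.txt filex.txt
glob_test '?ile' mile file smile
```

Inside `[[ ]]` the leading-dot rule for filenames does not apply, so `.sh` matches `*.sh` here.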

The glob patterns you have listed in your question are extended glob patterns. In bash they work inside [[ "$var" == pattern ]] tests, but for filename matching they are available only if the extglob option is enabled:

shopt -s extglob        # Enable extended globs (bash)
ls -d +([[:digit:]])    # List files/directories whose names are just digits
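To see the extended glob at work on real filenames without touching an existing directory, one can create a scratch directory with a few made-up names. This is a sketch under that assumption (the file names are hypothetical); `nullglob` is added only so that a pattern with no matches expands to nothing instead of to itself.

```bash
#!/usr/bin/env bash
# extglob must be on before any line using +(...) is parsed.
shopt -s extglob nullglob

dir=$(mktemp -d) || exit 1
trap 'rm -rf -- "$dir"' EXIT      # clean up the scratch directory on exit
cd -- "$dir" || exit 1

touch 1 42 2023 a1 12b notes.txt  # hypothetical sample names

# +([[:digit:]]) matches names consisting only of one or more digits;
# `ls -d +([[:digit:]])` would list the same names.
digits=( +([[:digit:]]) )
printf '%s\n' "${digits[@]}"
```

Only `1`, `2023` and `42` are expected to be listed; `a1` and `12b` contain non-digits, and `notes.txt` does not match at all.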

> From what I see, the search pattern *.sh does not include a metacharacter to match any character after *

The pattern *.sh uses the wildcard *, which matches zero or more characters. Remember that globs are not Regular Expressions (ordinary or Extended), but each pair below is equivalent:

Glob            RE                Description
*.sh            ^.*\.sh$          Match anything ending with the three characters ".sh"
?ile            ^.ile$            Match any single character then the text "ile"
+([[:digit:]])  ^[[:digit:]]+$    Match one or more digits (potentially any alphabet, not just 0-9)
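The regex for *.sh needs both the anchors and the escaped dot to be equivalent to the glob. A small sketch comparing the glob with the full regex and with two careless variants (unescaped dot, no anchors); the helper `check` and the sample strings are hypothetical.

```bash
#!/usr/bin/env bash
# check STRING PATTERN KIND (hypothetical helper): prints 1 on a match, 0 otherwise.
# KIND "glob" uses [[ == ]], anything else uses [[ =~ ]] (ERE, unanchored by default).
check() {
    local s=$1 p=$2 kind=$3
    if [[ $kind == glob ]]; then
        [[ $s == $p ]] && echo 1 || echo 0
    else
        [[ $s =~ $p ]] && echo 1 || echo 0
    fi
}

for s in fred.sh fresh a.shx; do
    printf '%s: glob *.sh=%s  ^.*\\.sh$=%s  ^.*.sh$=%s  \\.sh=%s\n' "$s" \
        "$(check "$s" '*.sh' glob)" \
        "$(check "$s" '^.*\.sh$' re)" \
        "$(check "$s" '^.*.sh$' re)" \
        "$(check "$s" '\.sh' re)"
done
```

With an unescaped dot, `^.*.sh$` also accepts `fresh` (the `.` matches the `e`), and the unanchored `\.sh` accepts `a.shx`; only `^.*\.sh$` agrees with the glob on all three strings.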

Extended Glob   ERE               Description
@(one|two)      ^(one|two)$       Match either "one" or "two"

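Note that an ordinary (basic) RE matching "one" or "two" would need to escape the parentheses and the separator, i.e. ^\(one\|two\)$, because unescaped they are just literal characters in a basic RE (and \| itself is a GNU extension; strictly POSIX basic REs have no alternation at all). In contrast, an Extended Regular Expression treats these characters as operators directly, i.e. ^(one|two)$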
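The difference shows up clearly with `grep`, which uses basic REs by default and extended REs with `-E`. A minimal sketch, assuming GNU grep (needed for `\|` in the basic RE); the `sample` function and its input lines are hypothetical, and the last input line is the literal text `(one|two)`.

```bash
#!/usr/bin/env bash
# Hypothetical sample input; the last line is literal punctuation.
sample() { printf '%s\n' one two three twone '(one|two)'; }

echo '-- BRE (GNU grep): \( \) and \| are the operators'
sample | grep '^\(one\|two\)$'

echo '-- ERE (grep -E): ( ) and | are the operators'
sample | grep -E '^(one|two)$'

echo '-- BRE without escapes: ( ) and | are literal characters'
sample | grep '^(one|two)$'
```

The first two searches both select `one` and `two` (the anchors exclude `twone`), while the unescaped basic RE only selects the literal line `(one|two)`.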

Chris Davies
  • The *.sh example is still unclear to me. Does it mean that the glob construct *(pattern), when it does not include a pattern in parentheses, is assumed to represent any character as the pattern? – Vera Jan 29 '23 at 23:41
  • 1
    @Veak In a glob pattern, * normally means "any sequence of characters", and ? normally means "any single character". But if extended globs are enabled and one of those is followed by something in parentheses, its meaning shifts to "any number of these:" (*) or "maybe one of these:" (?). These secondary meanings are not part of the original design of glob patterns; they were added to extend the standard glob syntax to allow more complex (regex-like) patterns, and frankly their syntax is an ugly and inconsistent kluge. – Gordon Davisson Jan 30 '23 at 00:13
  • @GordonDavisson Thank you for your cogent answer. – Vera Jan 30 '23 at 00:50
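The distinction drawn in the comment above, between a bare * and the extglob forms *(...) and ?(...), can be seen side by side. A minimal sketch, assuming bash with `extglob` enabled; the helper `describe` and the test strings are hypothetical.

```bash
#!/usr/bin/env bash
# extglob on before the function containing *(ab) and ?(ab) is parsed.
shopt -s extglob

# describe STRING (hypothetical helper): reports which patterns match STRING.
describe() {
    local s=$1 plain=no star=no opt=no label=$1
    [[ -z $s ]] && label='(empty)'
    [[ $s == * ]]     && plain=yes   # bare *: any sequence of characters
    [[ $s == *(ab) ]] && star=yes    # *(ab): zero or more repetitions of "ab"
    [[ $s == ?(ab) ]] && opt=yes     # ?(ab): zero or one "ab"
    printf '%s: *=%s *(ab)=%s ?(ab)=%s\n' "$label" "$plain" "$star" "$opt"
}

for s in "" ab abab aba xyz; do
    describe "$s"
done
```

The bare `*` matches every string, `*(ab)` matches only the empty string, `ab` and `abab`, and `?(ab)` matches only the empty string and `ab`.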