0

I often confuse Bash 3.x shell globs:

?      # Match any single character.
*      # Match any string of characters, including the empty string.
[set]  # Match any single character in set.
[!set] # Match any character not in set.

with regex (especially PCRE).

My question is: why not treat these as "Bash regex" (just as we have "JavaScript regex", for example)?

Why not see them as just another "dialect" of regex?

Of course it would be unorthodox, but I'm not sure there is any formal-logical reason not to.

Rui F Ribeiro

2 Answers

3

Filename globbing patterns and regular expressions have a syntax overlap, to some degree, but they work in fundamentally different ways.

The regular expression e will match (in) the string hello, while the filename globbing pattern e would not. Globbing patterns are implicitly anchored at both ends, so the glob pattern e is roughly equivalent to the regular expression ^e$. Their application may still differ: the regular expression would typically be matched against a complete line in a text, whereas the glob pattern would typically be matched against a single filename.
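To make the anchoring difference concrete, here is a minimal sketch. The helper names glob_match and regex_match are made up for this illustration; the first uses case for glob matching and the second uses grep -E for an unanchored extended regular expression match.

```shell
#!/usr/bin/env bash
# glob_match PATTERN STRING (hypothetical helper):
# succeeds only if the glob PATTERN matches the whole STRING.
glob_match() {
    case $2 in
        $1) return 0 ;;   # $1 is unquoted on purpose so it acts as a pattern
        *)  return 1 ;;
    esac
}

# regex_match REGEX STRING (hypothetical helper):
# succeeds if the extended regular expression matches anywhere in STRING.
regex_match() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

report() {  # report KIND PATTERN STRING
    if "$1"_match "$2" "$3"; then
        echo "$1 '$2' matches '$3'"
    else
        echo "$1 '$2' does not match '$3'"
    fi
}

report glob  'e'    hello   # does not match: the glob must cover the whole string
report regex 'e'    hello   # matches: the regex may match a substring
report glob  '*e*'  hello   # matches: the glob equivalent of the unanchored regex
report regex '^e$'  hello   # does not match: the regex equivalent of the bare glob
report glob  'a*b'  abba    # does not match: the string must end in b
report regex 'a.*b' abba    # matches: "abb" is a substring
```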

Standard filename globbing patterns also don't have any special characters that qualify the previous expression, as * and ? do in a regular expression. Nor do they have any facility to group parts of an expression, as (...) does in a regular expression, or to alternate between possible sub-patterns, as | does. Some shells add some of this as an extension, as bash does when shopt -s extglob is enabled, for example.
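As a rough illustration of what bash's extglob extension adds, the sketch below uses three of its operators in a case statement. The classify function and the sample names are invented for this example.

```shell
#!/usr/bin/env bash
shopt -s extglob   # must be enabled before the patterns below are parsed

# classify NAME (hypothetical helper): report which extended pattern matched.
classify() {
    case $1 in
        @(foo|bar).txt) echo "alternation" ;;  # exactly one of foo or bar
        +([0-9]).log)   echo "repetition"  ;;  # one or more digits
        ?(un)zip)       echo "optional"    ;;  # zero or one occurrence of "un"
        *)              echo "no match"    ;;
    esac
}

classify foo.txt      # prints: alternation
classify 2024.log     # prints: repetition
classify unzip        # prints: optional
classify foobar.txt   # prints: no match
```

Even with extglob enabled, the patterns remain anchored to the whole string, which is why foobar.txt is rejected.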

Globbing patterns also have a different use from regular expressions. Regular expressions are primarily used for selecting/matching strings from texts, whereas filename globbing patterns are primarily (but not exclusively) used for matching filenames or generating lists of existing names from a directory. Globbing patterns are also used to match strings in e.g. case ... esac, but a POSIX shell never uses regular expressions for generating lists of names from a directory, unless extended with that capability.
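The two uses of globbing patterns can be sketched as follows. The function names list_txt and kind, and the file names in the scratch directory, are assumptions made up for this example.

```shell
#!/usr/bin/env bash
# list_txt DIR (hypothetical helper): use a glob to generate the list of
# existing names in DIR that end in .txt.
list_txt() {
    local f
    for f in "$1"/*.txt; do
        [ -e "$f" ] || continue      # an unmatched glob is left as literal text
        printf '%s\n' "${f##*/}"     # strip the directory part
    done
}

# kind NAME (hypothetical helper): use the same pattern purely as a string
# test; the filesystem is never consulted.
kind() {
    case $1 in
        *.txt) echo text  ;;
        *)     echo other ;;
    esac
}

demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt" "$demo_dir/c.log"
list_txt "$demo_dir"     # lists a.txt and b.txt, which exist
kind "missing.txt"       # prints "text" even though no such file exists
rm -rf "$demo_dir"
```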

Both types of patterns are defined by the POSIX standard:

Of globbing patterns, the standard starts out by saying:

The pattern matching notation described in this section is used to specify patterns for matching strings in the shell. Historically, pattern matching notation is related to, but slightly different from, the regular expression notation described in XBD Regular Expressions. For this reason, the description of the rules for this pattern matching notation are based on the description of regular expression notation, modified to account for the differences.

There are several "dialects" of regular expressions, such as PCRE which you mention, but filename globbing patterns cannot really be said to be one of them.

There are several pattern languages that are similar to the shell's filename globbing patterns, such as the patterns used in SQL queries with LIKE. These are all quite simple and generally provided as a convenient way of matching bits of strings. Regular expressions are, in comparison, a whole lot more complex.


You mention "bash regular expressions". The bash shell does support regular expressions, but not for filename matching. Within [[ ... ]], the =~ operator performs a regular expression match of the string on the left-hand side against the regular expression on the right. The type of regular expression that the bash shell supports in this way is the standard (POSIX) extended regular expression. See the bash manual on your system for further information about this.
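A small sketch of the =~ operator follows. The parse_version function and the version-string format are invented for illustration. The expression is kept in a variable because quoting the right-hand side of =~ makes bash (3.2 and later) treat it as a literal string.

```shell
#!/usr/bin/env bash
# parse_version STRING (hypothetical helper): match STRING against an
# extended regular expression and print the captured groups.
parse_version() {
    local re='^v([0-9]+)\.([0-9]+)$'   # explicit anchors; a regex needs them
    if [[ $1 =~ $re ]]; then
        # BASH_REMATCH[0] is the whole match, [1] and [2] are the groups.
        echo "major=${BASH_REMATCH[1]} minor=${BASH_REMATCH[2]}"
    else
        echo "no match"
    fi
}

parse_version v3.2   # prints: major=3 minor=2
parse_version 3.2    # prints: no match
```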

Kusalananda
  • Kusalananda, what is implicitly anchored? Encapsulates a behavior? I guess. – Arcticooling Apr 25 '18 at 08:45
  • 1
    @user9303970 Regular expressions may be anchored to the start or end of a line using ^ and $ respectively ("explicitly"). A filename globbing pattern is always anchored to the start and end of a string. The pattern a*b will not match the filename abba, for example, because the b in a*b has to occur at the end (it is "implicitly" anchored to the end). In contrast, the regular expression a.*b will match abba, because the b isn't anchored to the end. – Kusalananda Apr 25 '18 at 08:48
2

Globs and regular expressions are two distinct pattern languages. The fact that there is some overlap in the semantics of certain patterns doesn't mean that one is necessarily a "dialect" of the other. Everybody who has used both recognizes the similarity, but keeping the names distinct reduces the possibility of confusion.

On a related note I find it unfortunate that so many types of regular expressions (at least basic, extended, and Perl with variations) have very similar names. Most people in my experience do not qualify which one they mean, sometimes causing unnecessary confusion. If these flavours had been given much more distinct names it might have been easier to talk about them without misunderstandings.

l0b0