5

I was reading about "Glob" and "Globbing Pathnames", and I found this strange (to me) part in man pages:

"[--0]" matches the three characters '-', '.', '0', since '/' cannot be matched.

I am confused! How do two dashes and a 0 match .? What is the role of the / character here? Is this a bug in the man page?

Rui F Ribeiro

2 Answers

6

As explained at the beginning of that paragraph in the man page, a '-' character placed between two characters denotes a range of characters, while a '-' placed as the first or last character between the brackets has its literal meaning. So the first dash really means a '-' character, and the second dash is a range specifier. The whole pattern therefore consists of all the characters between '-' and '0', which in the C/POSIX locale (but generally not in others) are:

-
.
/
0

and since '/' cannot be matched, the pattern matches three characters '-', '.', '0'.
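To see where those four characters come from, you can list the code points from '-' to '0'. This is only a small Python sketch, assuming plain ASCII ordering (which is what the C/POSIX locale uses); the variable names are just for illustration.

```python
# Enumerate every character from '-' to '0' by code point
# (ASCII order, as used by the C/POSIX locale).
in_range = [chr(c) for c in range(ord('-'), ord('0') + 1)]
print(in_range)    # ['-', '.', '/', '0']

# A single pathname component can never contain '/', so drop it.
matchable = [ch for ch in in_range if ch != '/']
print(matchable)   # ['-', '.', '0']
```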

1

This has nothing to do with man pages themselves: it's a description of the syntax of glob patterns, which the man page you're looking at is about.

In a glob pattern, brackets delimit a character set. For example [abc] matches any of the characters a, b or c. The pattern fo[abc] matches foa, fob and foc (but not e.g. foo, or fo, or foab).
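As a quick check of the fo[abc] example, here is a sketch using Python's fnmatch module, which implements shell-style patterns on plain strings. It stands in for the shell here; the names list and results dictionary are only illustrative.

```python
from fnmatch import fnmatchcase

pattern = 'fo[abc]'
names = ['foa', 'fob', 'foc', 'foo', 'fo', 'foab']

# Map each candidate name to whether the pattern matches it entirely.
results = {name: fnmatchcase(name, pattern) for name in names}
for name, matched in results.items():
    print(name, matched)
```

Only foa, fob and foc come out as matches: the bracket expression stands for exactly one character, so fo is too short and foab is too long.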

Inside the brackets, the character - has a special meaning: it is used to form a character range. So rather than matching 0, - or 9, the pattern [0-9] matches any digit. Ranges can be combined with other ranges and lone characters; for example [A-Za-z_] in the ASCII encoding matches any letter or an underscore.
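The two ranges above can be tried out the same way. This is a sketch with Python's fnmatch module, which compares by code point, so it behaves like the ASCII case described here; the candidate strings are arbitrary.

```python
from fnmatch import fnmatchcase

# [0-9] is a range of digits, not the three characters '0', '-' and '9'.
digit_hits = [ch for ch in '0579-a' if fnmatchcase(ch, '[0-9]')]
print(digit_hits)   # ['0', '5', '7', '9']

# Two ranges plus a lone underscore in one set.
ident_hits = [ch for ch in 'aZ_5-' if fnmatchcase(ch, '[A-Za-z_]')]
print(ident_hits)   # ['a', 'Z', '_']
```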

The minus sign is interpreted as a range indicator only where this would be syntactically sensible. If it's the first or last character inside the brackets, or if it comes immediately after another range, - stands for itself. So in [--0], the first - stands for itself and the second - is a range indicator, therefore this pattern matches any character that is between - and 0 in the current locale.
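Here is a sketch of those cases, again using Python's fnmatch module as a stand-in for the shell. The matched helper is hypothetical, and other glob implementations may differ in corner cases such as a dash right after a range.

```python
from fnmatch import fnmatchcase

def matched(pattern, candidates):
    """Return the single characters from candidates that the pattern matches."""
    return [ch for ch in candidates if fnmatchcase(ch, pattern)]

leading = matched('[-a]', '-ab')           # dash first: literal
trailing = matched('[a-]', '-ab')          # dash last: literal
after_range = matched('[a-c-e]', 'abcde-') # dash right after the range a-c: literal
ranged = matched('[--0]', '-.0,1')         # literal dash, then a range up to '0'

print(leading)      # ['-', 'a']
print(trailing)     # ['-', 'a']
print(after_range)  # ['a', 'b', 'c', 'e', '-']
print(ranged)       # ['-', '.', '0']
```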

In the ASCII encoding, this range covers the following 4 characters: -, ., /, 0. The character / cannot appear inside a file name, because it is always interpreted as a directory separator; therefore the pattern [--0] matches only the 3 characters -, . and 0.
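The exclusion of / comes from pathname matching, not from the range itself. As an illustration, Python's fnmatch module matches plain strings and does not treat / specially, so there all four characters match; this is a sketch of that contrast, not of what a shell does when expanding file names.

```python
from fnmatch import fnmatchcase

# fnmatch works on plain strings, so '/' is an ordinary character here.
plain_hits = [ch for ch in '-./0' if fnmatchcase(ch, '[--0]')]
print(plain_hits)   # ['-', '.', '/', '0']
```

When the same pattern is used to expand file names, / can only ever separate path components, which leaves the three characters from the man page.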

Note that in locales other than the C/POSIX locale, the pattern could match a different set of characters. The effect of locale setting on character ranges is somewhat variable between systems and applications.

Most regular expression engines use the same syntax for character ranges as shell glob patterns, with two differences:

  • In a glob pattern, if the first character after the opening bracket is a !, the pattern matches all characters that are not in the set. In a regular expression, the character ^ plays the same role. Some shells support ^ as well as !.
  • Some regular expression variants allow \ to make the next character lose its special meaning, e.g. [\[\]\-a] matches [, ], - or a. In other regular expression variants and in glob patterns, a backslash in character sets has no special meaning. If ] is in the set, it must come first. As a consequence it's impossible to specify an empty set: [] is an incomplete pattern, because the ] is taken as a member of the set, which then also includes whatever characters come afterwards, up to the next ].
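Both differences can be seen side by side in Python, whose re module is one of the regular expression variants that accepts backslash escapes inside a set, and whose fnmatch module follows the glob rules. This is only a sketch; the variable names are illustrative.

```python
import re
from fnmatch import fnmatchcase

# Negation: '!' in a glob pattern, '^' in a regular expression.
glob_negated = [ch for ch in 'abcd' if fnmatchcase(ch, '[!abc]')]
regex_negated = [ch for ch in 'abcd' if re.fullmatch(r'[^abc]', ch)]
print(glob_negated, regex_negated)   # ['d'] ['d']

# Backslash escapes inside a set (accepted by Python's re).
regex_escaped = [ch for ch in '[]-ab' if re.fullmatch(r'[\[\]\-a]', ch)]
print(regex_escaped)                 # ['[', ']', '-', 'a']

# A ']' placed first in a glob set is an ordinary member of the set.
glob_bracket_first = [ch for ch in ']ab' if fnmatchcase(ch, '[]a]')]
print(glob_bracket_first)            # [']', 'a']
```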