
Given these six files:

$ touch  'sec*et'  'sec\*et'  'sec\et'   secet   secret  'sec\ xxx et'

Why does a backslash in an unquoted variable make glob expansion match only the sec\*et file?

$ v="sec\*et" ; ls $v
'sec\*et'
$ v='sec\*et' ; ls $v
'sec\*et'

Related to this SO answer, the POSIX definition:

The <backslash> shall retain its special meaning as an escape character ... only when followed by one of the following characters when considered special:

$ ` " \ <newline>

and the Bash manual:

The backslash retains its special meaning only when followed by one of the following characters: ‘$’, ‘`’, ‘"’, ‘\’, or newline. Within double quotes, backslashes that are followed by one of these characters are removed. Backslashes preceding characters without a special meaning are left unmodified.

I understand that the backslash (before an asterisk) in the variable is a literal backslash:

$ v='sec\*et' ; printf '%s' "$v" | hexdump -C
00000000  73 65 63 5c 2a 65 74                              |sec\*et|
00000007

But I do not understand why the wildcard character * loses its special meaning after a literal backslash.


Three things I understand:

(A) An asterisk * in an unquoted variable has its special meaning in glob expansion. The following two commands are equivalent:

$ v='sec*et' ; ls $v
'sec*et'  'sec\*et'  'sec\et'   secet   secret  'sec\ xxx et'
$ ls sec*et
'sec*et'  'sec\*et'  'sec\et'   secet   secret  'sec\ xxx et'

(B) An asterisk * loses its special meaning after a backslash:

$ ls sec\*et
'sec*et'
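
A quick way to look at (B) in isolation, as a sketch that should behave the same in any POSIX-style shell: on the command line the backslash quotes the asterisk, so the word is never considered a pattern, and quote removal then drops the backslash. The three spellings below are only illustrative.

```shell
# Three ways of quoting the asterisk directly on the command line.
# None of them is treated as a glob; quote removal strips the quoting characters.
set -- sec\*et sec"*"et 'sec*et'
printf '%s\n' "$@"

# All three words are the same six characters: s e c * e t
[ "$1" = "$2" ] && [ "$2" = "$3" ] && echo "same word: $1"
```

This only covers quoting written directly in the command; it says nothing about backslashes that arrive through an unquoted variable expansion.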

(C) A literal backslash cannot make the asterisk * lose its special meaning:

$ v='sec\\*et' ; ls $v
'sec\*et'  'sec\et'  'sec\ xxx et'
$ ls sec\\*et
'sec\*et'  'sec\et'  'sec\ xxx et'

This is what I do not understand:

Oddly, in this case the asterisk loses its special meaning, yet the backslash is not discarded:

$ v='sec\*et' ; ls $v
'sec\*et'

Somehow it is equivalent to a literal backslash followed by a literal asterisk:

$ ls sec\\\*et
'sec\*et'

But why? Consider:

  1. If special characters in quotes become literal, (A) does not hold.

  2. If the asterisk becomes literal because it follows a backslash, why is the backslash not discarded, so that the pattern matches the file sec*et as in (B)?


In practice, other than using the bracket expression [*], how can I define a string variable that matches a literal asterisk when used in glob expansion?

$ v='sec < what what what > et' ; ls $v
'sec*et'
midnite
  • I think this is because unquoted text undergoes Word Expansions while a variable undergoes Parameter Expansion. If a variable is unquoted, it then undergoes Pathname Expansion. However, I am still not sure whether Quote Removal is performed. It is in case (C), but not in the case in question. – midnite Jan 17 '24 at 13:17
  • That varies widely with the shell and the version thereof. The POSIX spec has changed (and/or will change) in that regard. You'll find extensive and heated discussions about it in the archives of the POSIX (austin-group-l) mailing list. – Stéphane Chazelas Jan 17 '24 at 13:22
  • Use [*] to match a literal * portably when in a glob that is a result of an unquoted expansion. – Stéphane Chazelas Jan 17 '24 at 13:23
  • See also https://www.austingroupbugs.net/view.php?id=1234#c4564 – Stéphane Chazelas Jan 17 '24 at 13:28
  • the first two quotes you have, the ones above the horizontal line, discuss how quotes work. That's not relevant for the behavior of the glob, since in both cases (with single and with double quotes), the variable ends up containing sec\*et. And when the glob is expanded later, it doesn't matter how the variable got the value it had. (You could have filled it with e.g. read instead of a regular assignment.) – ilkkachu Jan 17 '24 at 18:29

1 Answer

$ v="sec\*et"
$ v='sec\*et'

Either of these sets the variable to sec\*et, and the way the variable got its value doesn't affect how globbing behaves.
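
A small check of that claim, as a sketch for any POSIX-style shell. The names v1 and v2 are just illustrative.

```shell
# Inside double quotes a backslash is only special before $ ` " \ and newline,
# so "\*" keeps its backslash, exactly like the single-quoted form.
v1="sec\*et"
v2='sec\*et'

# Quoted expansions: no word splitting, no globbing, values printed as stored.
printf '%s\n' "$v1" "$v2"

if [ "$v1" = "$v2" ]; then
    echo "identical, length ${#v1}"
fi
```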

Bash appears to treat the subsequent unquoted expansion like this: since the string sec\*et contains no unescaped glob characters, it is not treated as a glob at all. It doesn't trigger e.g. failglob, and the backslash is not removed; the string is simply passed through as-is. (It is not a pattern that happens to match itself: you get the string back even if there is no file of that name.)

This is unlike sec[*]et, which is a glob.
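
As a sketch of the bracket form in use, the snippet below recreates some of the question's files in a scratch directory and expands the variable unquoted. The helper name expand_unquoted is made up for this example, and mktemp is assumed to be available.

```shell
# Hypothetical helper: prints, one per line, the words produced by the
# unquoted expansion of the pattern given as $1.
expand_unquoted() {
    (
        dir=$(mktemp -d) || exit 1
        cd "$dir" || exit 1
        touch 'sec*et' 'sec\*et' 'sec\et' secet secret
        set -- $1            # unquoted on purpose so pathname expansion runs
        printf '%s\n' "$@"
        cd / && rm -rf "$dir"
    )
}

# [*] is a bracket expression that matches exactly one literal asterisk.
expand_unquoted 'sec[*]et'
```

Running everything in a subshell keeps the caller's working directory and positional parameters untouched.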

Also, if you were to have sec\*et* instead, it would be a glob and would match filenames that start with sec*et (that is, with the backslash removed).
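
A sketch of that case follows. Bash is invoked explicitly because other shells may treat the backslash differently. The helper name expand_in_bash and the extra file sec*etc are made up for this example.

```shell
# Hypothetical helper: lets Bash expand the pattern given as $1 unquoted,
# inside a scratch directory, and prints the resulting words one per line.
expand_in_bash() {
    (
        dir=$(mktemp -d) || exit 1
        cd "$dir" || exit 1
        touch 'sec*et' 'sec*etc' 'sec\*et' secret
        bash -c 'v=$1; printf "%s\n" $v' _ "$1"
        cd / && rm -rf "$dir"
    )
}

# The trailing unescaped * makes the word a glob; the backslash then escapes
# the first *, so only names starting with the literal text sec*et match.
expand_in_bash 'sec\*et*'
```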

As far as I can tell, that's how it works in most versions of Bash. The exception is Bash 5.0, where sec\*et is taken as a glob and triggers e.g. failglob.

$ ./bash-4.4/bash    -c 'shopt -s failglob; v="sec\*et"; echo $v'
sec\*et

$ ./bash-5.0/bash    -c 'shopt -s failglob; v="sec\*et"; echo $v'
./bash-5.0/bash: no match: sec\*et

$ ./bash-5.1.16/bash -c 'shopt -s failglob; v="sec\*et"; echo $v'
sec\*et

$ ./bash-5.2.15/bash -c 'shopt -s failglob; v="sec\*et"; echo $v'
sec\*et

(I didn't test all shell versions for all the cases.)

ilkkachu
  • Thank you for the answer. Isn't it somewhat inconsistent of Bash that, when a pattern in an unquoted variable is not treated as a glob, the backslash is not discarded? After all, when the pattern is not in a variable, ls sec\*et matches sec*et literally. Do any docs in POSIX or Bash mention this? As in my first comment on the question, I still wonder whether it is because a variable and a plain pattern undergo different expansions. This behaviour is so specific that I need more explanation. I am afraid I might face other corner cases in the future. – midnite Jan 18 '24 at 04:55
  • @midnite ls sec\*et doesn't "match" anything, because sec\*et is not a glob, so there's nothing to match. Again, try with failglob set and without a matching file, it will not complain. Here, the resulting argument to ls is sec*et because the rules for unquoted words in the command line say that the backslash quotes the following character, it's exactly the same as sec"*"et. I'm not sure why it's inconsistent that the backslash isn't discarded? It's not like the shell discards any other characters from values that result from expansions, either, right? – ilkkachu Jan 18 '24 at 07:17
  • The reference manual does say that "After word splitting, unless the -f option has been set (see The Set Builtin), Bash scans each word for the characters *, ?, and [. If one of these characters appears, and is not quoted, then the word is regarded as a pattern, and replaced with an alphabetically sorted list of filenames matching the pattern (see Pattern Matching)." (I expect escaping with a backslash counts as being quoted here, too, but I suppose you'll need to ask the maintainer if that's a question.) – ilkkachu Jan 18 '24 at 07:18
  • Anyway, as Stéphane mentioned, the behavior has varied between shells and shell versions (and IMO, it's not like the shell languages are exactly straightforward in other sense anyway), so it might be easier to just avoid cases like that. If [*] works for what you want, being unquestionably a glob that only matches a literal asterisk, why not use that? – ilkkachu Jan 18 '24 at 07:22