
Normally, bash globbing is case sensitive:

$ echo c*
casefix.pike cdless chalices.py charconv.py chocolate.pike circum.py clip.pike cpustats.pike crop.pike cwk2txt.py
$ echo C*
CarePackage.md ChocRippleCake.md Clips

Using square brackets doesn't seem to change this:

$ echo [c]*
casefix.pike cdless chalices.py charconv.py chocolate.pike circum.py clip.pike cpustats.pike crop.pike cwk2txt.py
$ echo [C]*
CarePackage.md ChocRippleCake.md Clips

Nor does a single-letter range written with a hyphen:

$ echo [c-c]*
casefix.pike cdless chalices.py charconv.py chocolate.pike circum.py clip.pike cpustats.pike crop.pike cwk2txt.py
$ echo [C-C]*
CarePackage.md ChocRippleCake.md Clips

But with a range spanning two letters, upper- and lowercase names are interspersed:

$ echo [B-C]*
CarePackage.md casefix.pike cdless chalices.py charconv.py chocolate.pike ChocRippleCake.md circum.py clip.pike Clips cpustats.pike crop.pike cwk2txt.py
$ echo [b-c]*
beehive-anthem.txt bluray2mkv.pike branch branchcleanup.pike burdayim.pike casefix.pike cdless chalices.py charconv.py chocolate.pike circum.py clip.pike cpustats.pike crop.pike cwk2txt.py

This suggests that the range follows the locale's collation order, "AaBbCcDd". So: is there any way to glob for all files that begin with an uppercase letter?

rosuav

5 Answers

In bash version 4.3 and later, there is a shopt option called globasciiranges.

According to the GNU Bash manual's description of the shopt builtin:

globasciiranges
If set, range expressions used in pattern matching bracket expressions (see Pattern Matching) behave as if in the traditional C locale when performing comparisons. That is, the current locale’s collating sequence is not taken into account, so ‘b’ will not collate between ‘A’ and ‘B’, and upper-case and lower-case ASCII characters will collate together.

As a result, you can do:

$ shopt -s globasciiranges
$ echo [A-Z]*

Use shopt -u globasciiranges to disable it again.
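A minimal sketch of the globasciiranges approach, assuming bash 4.3 or later. The helper name upper_ascii_glob is hypothetical. It runs in a subshell so the cd and the shopt changes do not leak into the calling shell, and it also enables nullglob so that a directory with no matching names prints nothing rather than the literal pattern.

```bash
#!/usr/bin/env bash
# Sketch, assuming bash 4.3+ (where globasciiranges exists).
# upper_ascii_glob is a hypothetical helper name, not part of bash.
# It prints the entries of directory $1 whose names start with A-Z.
upper_ascii_glob() {
  (   # subshell: cd and shopt changes stay local to this function call
    cd -- "$1" || exit 1
    shopt -s globasciiranges nullglob  # ASCII-order ranges; unmatched globs vanish
    for f in [A-Z]*; do
      printf '%s\n' "$f"
    done
  )
}
```

For example, upper_ascii_glob ~/bin would print only the names starting with A to Z, with no lowercase names mixed in.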

Another way is to change the locale to C. You can do this temporarily using a subshell:

$ ( LC_ALL=C ; printf '%s\n' [A-Z]*; )

This gives the results you need, and when the subshell exits, the locale of your main shell is unchanged.
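The subshell version can be wrapped the same way. This is a sketch with a hypothetical helper name, list_upper_c_locale; it relies on bash applying an assignment to LC_ALL as soon as it is made, so the glob on the following line is already expanded under the C locale.

```bash
#!/usr/bin/env bash
# Sketch of the subshell approach; list_upper_c_locale is a hypothetical name.
# Assigning LC_ALL inside ( ... ) changes the locale for that subshell only.
list_upper_c_locale() {
  (
    cd -- "$1" || exit 1
    LC_ALL=C            # bash applies this immediately, so the glob below uses C ordering
    shopt -s nullglob   # expand to nothing, not the literal pattern, when nothing matches
    for f in [A-Z]*; do
      printf '%s\n' "$f"
    done
  )
}
```

This also explains the result reported in the comments for LC_ALL=C printf '%s\n' [A-Z]*: a prefix assignment only applies to the environment of the command being run, and the shell expands the glob before that assignment takes effect, so the glob still uses the old locale.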

Another alternative is to use the brace expansion {A..Z} instead of [A-Z], together with bash's nullglob shopt option.

With nullglob enabled, a pattern that matches no files expands to nothing instead of being left as the literal pattern. Since brace expansion turns {A..Z}* into the separate patterns A* B* ... Z*, nullglob simply drops the letters that match nothing, so this works as expected:

$ shopt -s nullglob; printf '%s\n' {A..Z}*
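A sketch of the brace-expansion approach, using a hypothetical helper list_upper_brace. Brace expansion runs before pathname expansion, and each resulting pattern begins with a literal letter, so no range (and no collation order) is involved. The sketch runs in a subshell so nullglob stays off in the calling shell.

```bash
#!/usr/bin/env bash
# Sketch of the {A..Z}* approach; list_upper_brace is a hypothetical name.
# {A..Z}* becomes the 26 patterns A* B* ... Z*, each globbed separately.
list_upper_brace() {
  (
    cd -- "$1" || exit 1
    shopt -s nullglob          # letters with no matching file disappear
    set -- {A..Z}*             # positional parameters now hold only real matches
    if (( $# > 0 )); then      # printf '%s\n' with no arguments would print a blank line
      printf '%s\n' "$@"
    fi
  )
}
```

The matches come out grouped by letter (all A* names, then all B* names, and so on). Leaving nullglob switched on in an interactive shell has side effects: for example, ls *.foo with no matching files runs plain ls and lists the whole directory.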
rivy
  • Perfect, thanks. I can't use [[:upper:]] because I actually want just part of the alphabet, but this works. – rosuav Jun 17 '17 at 22:00
  • @rosuav Welcome. Check also the sub shell alternative. – George Vasiliou Jun 17 '17 at 22:11
  • “if enabled equals to C locale” -- do you mean it affects the locale used for globbing and nothing else? (A reference link would have been helpful -- the best I can find is https://www.gnu.org/software/bash/manual/html_node/Pattern-Matching.html, but I would have preferred a list of all shell options, but globasciiranges is missing from https://www.gnu.org/software/bash/manual/html_node/The-Set-Builtin.html#The-Set-Builtin; also the question https://unix.stackexchange.com/questions/227070/why-does-a-z-match-lowercase-letters-in-bash handles this issue extensively.) Also from version 4.3. – PJTraill Jun 18 '17 at 09:26
  • @PjTrail See my edit with a reference link to all shopt options. Also, you can run man bash in your terminal and search (using /) for globasciiranges. – George Vasiliou Jun 18 '17 at 10:05
  • Wouldn't LC_ALL=C printf '%s\n' [A-Z]* work for your second solution - without a subshell? BTW: there's a typo: nullblog, but it's too few characters for me to correct it. – Joe Jun 25 '17 at 08:00
  • @joe No, this syntax did not work in my tests. You can try it yourself. – George Vasiliou Jun 25 '17 at 10:02
  • The bash shell is still undergoing changes, and the character ranges have been returned back to the LC_COLLATE=C behavior (at some point prior to Nov 2020). The man page documentation (on Ubuntu 20.04) has not been updated, and still refers to locale for ranges. The LC_COLLATE="en_US.UTF-8"'s "broken" lower case ranges which returned upper case results except for the end of the range ([a-c] matches aAbBc), now just return the abc results, and the broken workarounds like [a-C] are now illegal syntax. – ubfan1 Apr 10 '22 at 17:14

You can write out all the uppercase letters explicitly:

[ABCDEFGHIJKLMNOPQRSTUVWXYZ]*

or you can use the named character class [:upper:], which represents all uppercase letters in your current locale:

[[:upper:]]*

As you have noticed, when you use a range like [B-C], the upper- and lowercase forms of each letter sort next to each other (following the locale's collation order), so the range picks up lowercase letters too.
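A sketch of both forms, with hypothetical helper names. The explicit list also covers the case from the comments where only part of the alphabet is wanted (here B, C and D). Note that [[:upper:]] follows the locale's character classification, so in a UTF-8 locale it may also match non-ASCII uppercase letters, while an explicit list matches only the letters written out.

```bash
#!/usr/bin/env bash
# Sketch of the two forms from this answer; the helper names are hypothetical.
# Neither uses a range, so collation order does not pull in lowercase names.
list_upper_class() {      # [[:upper:]] follows the locale's LC_CTYPE classification
  (
    cd -- "$1" || exit 1
    shopt -s nullglob
    for f in [[:upper:]]*; do printf '%s\n' "$f"; done
  )
}

list_upper_subset() {     # an explicit list also works for part of the alphabet
  (
    cd -- "$1" || exit 1
    shopt -s nullglob
    for f in [BCD]*; do printf '%s\n' "$f"; done
  )
}
```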

heemayl

Character ranges that include “unintuitive” characters, such as lowercase letters in a range whose endpoints are uppercase letters, are due to the LC_COLLATE locale setting. LC_COLLATE is supposed to define sorting order, but it does a poor job of it (sorting strings is more complex than what locales can do) and you're better off without it. I recommend removing LC_COLLATE from your locale settings. If you're setting LANG or LANGUAGE, don't; instead set only the categories you need: LC_CTYPE, LC_MESSAGES, LC_TIME.

For more background about locales, see “What should I set my locale to and what are the implications of doing so?” and “Set LC_* but not LC_ALL”.

To get reliable results in a script regardless of the user's settings, set LC_ALL=C.
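A minimal sketch of pinning the locale near the top of a script. The helper list_upper is hypothetical. Because LC_ALL overrides LANG and every LC_* category, it also affects sorting, error messages and other locale-aware behavior of the commands the script runs.

```bash
#!/usr/bin/env bash
# Pin the locale for the whole script so that [A-Z] means ASCII A..Z
# regardless of the user's LANG / LC_* settings.
export LC_ALL=C
shopt -s nullglob   # unmatched patterns expand to nothing

# Hypothetical helper: print names in directory $1 that start with A-Z.
list_upper() {
  local f
  for f in "$1"/[A-Z]*; do      # directory part quoted, glob part unquoted
    printf '%s\n' "${f##*/}"    # strip the directory prefix
  done
}
```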

Set:

shopt -u nocaseglob

From the bash man page:

>     nocaseglob
>         If set, bash matches filenames in a case-insensitive
>         fashion when performing pathname expansion (see Pathname
>         Expansion above).
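A sketch showing what nocaseglob changes, using a hypothetical helper glob_c that takes a directory and either -s or -u. With the option off (the default), c* matches only names beginning with a lowercase c; with it on, names beginning with C match as well.

```bash
#!/usr/bin/env bash
# Sketch: glob_c is a hypothetical helper.
# $1 = directory, $2 = -s (turn nocaseglob on) or -u (turn it off).
glob_c() {
  (
    cd -- "$1" || exit 1
    shopt -s nullglob
    shopt "$2" nocaseglob   # only affects this subshell
    for f in c*; do printf '%s\n' "$f"; done
  )
}
```

To check the current state in your own shell, run shopt nocaseglob, which prints whether the option is on or off.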

If you set globasciiranges, I do not know what will happen to non-ASCII characters such as UTF-8 letters.

Udi

echo [cC]* should do what you want; similarly, [A-Za-z]*.

I'm here because globbing on my system has just stopped being case sensitive, so loads of my scripts no longer work as they should :-(