
I have a directory filled with files with names like logXX where XX is a two-character, zero-padded, uppercase hex number such as:

log00
log01
log02
...
log0A
log0B
log0C
...
log4E
log4F
log50
...

Generally there will be fewer than, say, 20 or 30 files in total. The date and time on my particular system cannot be relied upon (it is an embedded system with no reliable NTP or GPS time source). However, the filenames will reliably increment as shown above.

I wish to grep through all the files for the single most recent log entry of a certain type. I was hoping to cat the files together, such as:

cat /tmp/logs/log* | grep 'WARNING 07 -' | tail -n1

However it occurred to me that different versions of bash or sh or zsh etc. might have different ideas about how the * is expanded.

The bash man page doesn't say whether or not the expansion of * is guaranteed to be an ascending alphabetical list of matching filenames. It does seem to be ascending every time I've tried it on all the systems I have available to me, but is it DEFINED behaviour or just implementation-specific?

In other words can I absolutely rely on cat /tmp/logs/log* to concatenate all my log files together in alphabetical order?

  • You could add | sort which would make an ascending list. – ADDB May 31 '17 at 13:43
  • @ADDB The default sort order for sort is the same as that for the shell when it's expanding a filename globbing pattern. – Kusalananda May 31 '17 at 14:11
  • That's terrible file naming practice. Why do you start your run with log(0)=-infty? – E.P. May 31 '17 at 19:58
  • @E.P. Our filesystem is a complex 7-dimensional hyper-toroid with surreal numbering of the inodes. It was grandfathered in with some obscure branch of busybox and we're stuck with it now :) –  Jun 01 '17 at 07:21
  • You can avoid cat with grep -h pattern /tmp/logs/log* to suppress prepending filenames to the matches. (At least with GNU grep; I didn't check POSIX or busybox.) – Peter Cordes Jun 02 '17 at 01:59
  • @Kusalananda You've heard of useless use of cat, this is useless use of sort – cat Jun 02 '17 at 02:06
  • @PdC, thank you for tidying up my formatting. Is there a "style guide" of some sort for the U&L stackexchange? –  Jun 04 '17 at 10:18
  • @Wossname, happy to help. Well, there are some key rules, I simply apply the formatting available as shown when writing a question. Also, here is an interesting read for more information about styling: https://meta.stackexchange.com/questions/18614/style-guide-for-questions-and-answers – Paul-Beyond Jun 04 '17 at 17:32
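Following Peter Cordes's comment, here is a minimal sketch of the cat-free variant. It runs in a throwaway directory created with mktemp -d (standing in for /tmp/logs; the file names and contents are made up for illustration). -h is accepted by GNU grep and, as far as I know, by BSD and BusyBox grep, but it may not exist in every grep.

```shell
#!/bin/sh
# Throwaway directory standing in for /tmp/logs (illustrative contents).
dir=$(mktemp -d)
printf '%s\n' 'WARNING 07 - first'  > "$dir/log00"
printf '%s\n' 'INFO other'          > "$dir/log0A"
printf '%s\n' 'WARNING 07 - latest' > "$dir/log0B"

# With several files, grep normally prefixes each match with "filename:";
# -h suppresses that, so the output matches the cat | grep pipeline.
latest=$(grep -h 'WARNING 07 -' "$dir"/log* | tail -n 1)
printf '%s\n' "$latest"

rm -r "$dir"
```

The result still depends on the glob expanding log00 log0A log0B in that order, which is exactly what the question asks about.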

4 Answers


In all shells, globs are sorted by default. They already were by the /etc/glob helper that Ken Thompson's shell called to expand globs in the first version of Unix in the early 70s (and which gave globs their name).

For sh, POSIX does require them to be sorted by way of strcoll(), that is, using the sorting order of the user's locale (as ls does), though some shells still sort them via strcmp(), that is, based on byte values only.

$ dash -c 'echo *'
Log01B log-0D log00 log01 log02 log0A log0B log0C log4E log4F log50 log① log② lóg01
$ bash -c 'echo *'
log① log② log00 log01 lóg01 Log01B log02 log0A log0B log0C log-0D log4E log4F log50
$ zsh -c 'echo *'
log① log② log00 log01 lóg01 Log01B log02 log0A log0B log0C log-0D log4E log4F log50
$ ls
log②  log①  log00  log01  lóg01  Log01B  log02  log0A  log0B  log0C  log-0D  log4E  log4F  log50
$ ls | sort
log②
log①
log00
log01
lóg01
Log01B
log02
log0A
log0B
log0C
log-0D
log4E
log4F
log50

You may notice above that, for those shells that sort based on the locale (here on a GNU system with an en_GB.UTF-8 locale), the - in the file names is ignored for sorting (as most punctuation characters would be). The ó is sorted in a more expected way (at least to British people), and case is ignored (except when it comes to breaking ties).

However, you'll notice some inconsistencies for log① log②. That's because the sorting order of ① and ② is not defined in GNU locales (currently; hopefully it will be fixed some day). They sort the same, so you get random results.

Changing the locale will affect the sorting order. You can set the locale to C to get a strcmp()-like sort:

$ bash -c 'echo *'
log① log② log00 log01 lóg01 Log01B log02 log0.2 log0A log0B log0C log-0D log4E log4F log50
$ bash -c 'LC_ALL=C; echo *'
Log01B log-0D log0.2 log00 log01 log02 log0A log0B log0C log4E log4F log50 log① log② lóg01
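Applied to the question, a minimal sketch under these assumptions: pin LC_ALL=C only around the glob, so the order is byte order whatever locale the caller uses. latest_warning is a hypothetical helper name, and the throwaway directory stands in for /tmp/logs. For zero-padded uppercase hex names, byte order is also numeric order, since 0-9 come before A-F in ASCII.

```shell
#!/bin/sh
# Hypothetical helper: the function body is a subshell, so the LC_ALL
# change does not leak into the rest of the script.
latest_warning() (
  LC_ALL=C
  export LC_ALL
  # Explicit two-hex-digit pattern; ranges like [0-9A-F] are only
  # well defined in the C/POSIX locale.
  set -- "$1"/log[0-9A-F][0-9A-F]
  [ -e "$1" ] || exit 1   # no match: the pattern was left unexpanded
  cat "$@" | grep 'WARNING 07 -' | tail -n 1
)

# Demo in a throwaway directory standing in for /tmp/logs.
dir=$(mktemp -d)
printf 'WARNING 07 - old\n' > "$dir/log09"
printf 'WARNING 07 - new\n' > "$dir/log0A"
printf 'INFO unrelated\n'   > "$dir/log0B"
latest=$(latest_warning "$dir")
printf '%s\n' "$latest"
rm -r "$dir"
```

Using a subshell function keeps the locale change local without having to save and restore the caller's LC_ALL.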

Note that some locales can cause confusion even for all-ASCII, all-alphanumeric strings, such as Czech ones (on GNU systems at least), where ch is a collating element that sorts after h:

$ LC_ALL=cs_CZ.UTF-8 bash -c 'echo *'
log0Ah log0Bh log0Dh log0Ch

Or, as pointed out by @ninjalj, even weirder ones in Hungarian locales:

$ LC_ALL=hu_HU.UTF-8 bash -c 'echo *'
logX LOGx LOGX logZ LOGz LOGZ logY LOGY LOGy

In zsh, you can choose the sorting with glob qualifiers. For instance:

echo *(om) # to sort by modification time
echo *(oL) # to sort by size
echo *(On) # for a *reverse* sort by name
echo *(o+myfunction) # sort using a user-defined function
echo *(oN) # to NOT sort
echo *(n)  # sort by name, but numerically, and so on.

The numeric sort of echo *(n) can also be enabled globally with the numericglobsort option:

$ zsh -c 'echo *'
log① log② log00 log01 lóg01 Log01B log02 log0.2 log0A log0B log0C log-0D log4E log4F log50
$ zsh -o numericglobsort -c 'echo *'
log① log② log00 lóg01 Log01B log0.2 log0A log0B log0C log01 log02 log-0D log4E log4F log50

If you (as I was) are confused by that order in that particular instance (here using my British locale), see here for details.

  • The 'ch' case can be even weirder: some locales can decide that 'ch', 'Ch' and 'CH' are one collating element each, while 'cH' is two collating elements. See: http://unicode.org/cldr/trac/ticket/889 Current CLDR doesn't seem to be entirely consistent: current Hungarian (http://unicode.org/cldr/trac/browser/trunk/common/collation/hu.xml) has rules like &C<cs<<<Cs<<<CS, while &C<cs<<<cS<<<Cs<<<CS is marked as a proposed experimental draft. Judging from some older data imported into CLDR, older AIX and MS seemed to prefer the "lowercase then uppercase are 2 different collation elements" view. – ninjalj Jun 01 '17 at 18:32
  • And I've seen systems where it didn't work anyway. :( – Joshua Jun 02 '17 at 01:11

The man page for bash does specify:

Pathname Expansion

After word splitting, unless the -f option has been set, bash scans each word for the characters *, ?, and [. If one of these characters appears, then the word is regarded as a pattern, and replaced with an alphabetically sorted list of filenames matching the pattern […].
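Both behaviours in that paragraph can be seen with a minimal POSIX sh sketch (the temporary directory and file names are only for illustration): the glob expands to a sorted list, and after set -f (the -f option mentioned above, also known as noglob) it is not expanded at all.

```shell
#!/bin/sh
# Throwaway directory with a few sample names.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch log0B log00 log0A

sorted=$(echo log*)    # expanded and sorted: log00 log0A log0B
set -f                 # disable pathname expansion
literal=$(echo log*)   # left as the literal word: log*
set +f                 # re-enable pathname expansion

printf '%s\n' "$sorted" "$literal"
cd / && rm -r "$dir"
```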

user4556274
  • Just found an interesting bug in either PuTTY's or man's text rendering... if the text I'm searching for gets "word wrapped", then a /search command won't find it. Just maximised my terminal and there it is :) –  May 31 '17 at 14:24
  • You covered bash. The OP was also interested in "zsh etc." – Kusalananda Jun 01 '17 at 04:28

Unless you trigger some very specific shell options in some shells, the output is guaranteed to be the same.

The order is specified in the POSIX standard:

If the pattern matches any existing filenames or pathnames, the pattern shall be replaced with those filenames and pathnames, sorted according to the collating sequence in effect in the current locale. If this collating sequence does not have a total ordering of all characters (see XBD LC_COLLATE), any filenames or pathnames that collate equally should be further compared byte-by-byte using the collating sequence for the POSIX locale.

See also LC_COLLATE Category in the POSIX Locale, which in short says that if LC_COLLATE=C, then things are ordered in ASCII order.


The bash manual mentions

LC_COLLATE

This variable determines the collation order used when sorting the results of pathname expansion, and determines the behavior of range expressions, equivalence classes, and collating sequences within pathname expansion and pattern matching.

ksh93 and zsh have similar wording, which leads me to believe that they follow the POSIX standard in this regard.

Other shells, like pdksh and dash, do not say anything about the sorting of the filenames resulting from filename globbing. I'm tempted to believe that this means that they still adhere to the same standard, at least when using the POSIX locale. In my experience, I have not come across a shell that does any overtly "strange" sorting of ASCII filenames.
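If you want to check a particular shell rather than trust the documentation, a hypothetical self-check along these lines compares the glob order with sort(1), both under the C locale, on sample names similar to the OP's. It only tests this one case; it is a sanity check, not a proof of conformance.

```shell
#!/bin/sh
# Throwaway directory with mixed-case and punctuated sample names.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch log50 log4F log0A log00 log01 Log01B log-0D

# Byte-order collation for both the shell's glob and sort.
LC_ALL=C
export LC_ALL

glob_order=$(printf '%s\n' *)
sort_order=$(printf '%s\n' * | sort)

if [ "$glob_order" = "$sort_order" ]; then
  echo "glob order matches sort"
else
  echo "glob order differs from sort"
fi
cd / && rm -r "$dir"
```

In the C locale, Log01B comes first because uppercase L (0x4C) sorts before lowercase l (0x6C), and log-0D comes before log00 because - (0x2D) sorts before 0 (0x30).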

Kusalananda
  • See the numericglobsort option in zsh that would affect the sorting. Though I'd rather enable it on a per-glob basis like echo *(n) than turn the option globally on. – Stéphane Chazelas May 31 '17 at 14:01
  • A nitpick. Bash, in default mode, is NOT Posix-compliant. – fpmurphy Jun 01 '17 at 04:24
  • @fpmurphy1 Say more. – Kusalananda Jun 01 '17 at 04:25
  • @Kusalananda. Bash has never been certified as POSIX-compliant. To get "POSIX compliance" in Bash, you must invoke Bash with the --posix command-line option or execute set -o posix – fpmurphy Jun 01 '17 at 04:33
  • @fpmurphy1 Yes, but the sorting of the expansion of filename globbing characters isn't affected by Bash's posix mode. See https://www.gnu.org/software/bash/manual/html_node/Bash-POSIX-Mode.html This leads me to believe (hope, rather) that the sorting is POSIX compliant. – Kusalananda Jun 01 '17 at 04:38
  • @Kusalananda, correct. Hope is the appropriate word. – fpmurphy Jun 01 '17 at 04:41
  • @fpmurphy1 It's actually doing the sorting POSIXly. See update. – Kusalananda Jun 01 '17 at 04:56
  • @fpmurphy1, on the contrary, bash is the only FLOSS shell to my knowledge that has been certified (when built using conformance options as part of Apple macOS or Inspur K-UX at least). – Stéphane Chazelas Jun 01 '17 at 09:06
  • @StéphaneChazelas. I presume you are referring to the UNIX03 certification of macOS 10.12 Sierra. I could be wrong, but AFAIR, /bin/sh was different from /bin/bash and was a custom build of a slightly older version of Bash. It was this custom shell that was certified as part of the OS, not the standard version of Bash. – fpmurphy Jun 01 '17 at 11:32
  • @fpmurphy1, yes, they made a few minor changes and they never upgraded to 4 (presumably because of GPLv3). You'll always see differences between versions and builds on different OSes anyway. See https://opensource.apple.com/source/bash/. In any case, they didn't make any change in that area though the fact that bash will use macOS libraries will make a difference. The point is that bash is one of the main targets for the OpenGroup even more so now that David Korn no longer maintains ksh (historically the reference implementation) and bash is becoming a de-facto standard. – Stéphane Chazelas Jun 01 '17 at 12:01

If the primary goal is to sort input files by their age, oldest first, you could write

(cd /tmp/logs; cat `ls -rt log*`) | grep whatever

And if rotated and compressed logs are also involved:

(cd /tmp/logs; zcat -f `ls -rt log*`) | grep whatever
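Here is a minimal sketch of that oldest-first approach in a throwaway directory; the explicit mtimes set with touch -t are hypothetical. The newer entry is deliberately in the file whose name sorts first, so the result differs from plain name order. Note that the question says the system clock is unreliable, so on that system this is only as good as the recorded mtimes.

```shell
#!/bin/sh
# Throwaway directory standing in for /tmp/logs.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'WARNING 07 - old\n' > log01 && touch -t 202001010000 log01
printf 'WARNING 07 - new\n' > log00 && touch -t 202001020000 log00

# ls -rt lists oldest first; $(...) is the modern spelling of `...`.
# Splitting the ls output is only safe because these names contain no
# whitespace or glob characters.
latest=$(cat $(ls -rt log*) | grep 'WARNING 07 -' | tail -n 1)
printf '%s\n' "$latest"

cd / && rm -r "$dir"
```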