Contrary to ksh or zsh, bash has no builtin support for sorting arrays or lists of arbitrary strings. It can sort globs or the output of alias, set or typeset (though those last 3 not in the user's locale sorting order), but that can't be used practically here.
There's nothing in the POSIX toolchest that can readily sort arbitrary lists of strings either¹ (sort sorts lines, so only short sequences of characters other than NUL and newline (LINE_MAX is often shorter than PATH_MAX), while file paths are non-empty sequences of bytes other than 0).
So while you could implement your own sorting algorithm in awk (using the < string comparison operator) or even bash (using [[ < ]]), for arbitrary paths in bash, portably, the easiest may be to resort to perl.
With bash 4.4+, you could do:
readarray -td '' sorted_filearray < <(perl -MFile::Basename -l0 -e '
print for sort {basename($a) cmp basename($b)} @ARGV' -- "${filearray[@]}")
That gives a strcmp()-like order. For an order based on the locale's collation rules like in globs or the output of ls, add a -Mlocale argument to perl. For numeric sort (more like GNU sort -g as it supports numbers like +3, 1.2e-5 and not thousand separators, though not hexadecimals), use <=> instead of cmp (and again -Mlocale for the user's decimal mark to be honoured like for the sort command).
You'll be limited by the maximum size of arguments to a command. To avoid that, you could pass the list of files to perl on its stdin instead of via arguments:
readarray -td '' sorted_filearray < <(
printf '%s\0' "${filearray[@]}" | perl -MFile::Basename -0le '
chomp(@files = <STDIN>);
print for sort {basename($a) cmp basename($b)} @files')
With older versions of bash, you could use a while IFS= read -rd '' loop instead of readarray -d '' or get perl to output the list of paths properly quoted so you can pass it to eval "array=($(perl...))".
With zsh, you can fake a glob expansion for which you can define a sort order:
sorted_filearray=(/(e{'reply=($filearray)'}oe{'REPLY=$REPLY:t'}))
With reply=($filearray) we actually force the glob expansion (which initially was just /) to be the elements of the array. Then we define the sort order to be based on the tail of the filename.
For a strcmp()-like order, fix the locale to C. For numeric sort (similar to GNU sort -V, not sort -n, which makes a significant difference when comparing 1.4 and 1.23 (in locales where . is the decimal mark) for instance), add the n glob qualifier.
Instead of oe{expression}, you can also use a function to define a sorting order like:
by_tail() REPLY=$REPLY:t
or more advanced ones like:
by_numbers_in_tail() REPLY=${(j:,:)${(s:,:)${REPLY:t}//[^0-9]/,}}
(so a/foo2bar3.pdf (2,3 numbers) sorts after b/bar1foo3.pdf (1,3) but before c/baz2zzz10.pdf (2,10))
and use as:
sorted_filearray=(/(e{'reply=($filearray)'}no+by_numbers_in_tail))
Of course, those can be applied on real globs as that's what they're primarily intended for. For instance, for a list of pdf files in any directory, sorted by basename/tail:
pdfs=(**/*.pdf(N.o+by_tail))
¹ If a strcmp()-based sorting is acceptable, and for short strings, you could transform the strings to their hex-encoding with awk before passing to sort and transform back after sorting.
dir1 dir2 are just made up, and they are actually arbitrary pathnames. – Tim Sep 23 '17 at 13:21