When sorting file names, ls ignores characters like -,_. I expected it to use those characters in sorting as well.
An example:
touch a1 a2 a-1 a-2 a_1 a_2 a.1 a.2 a,1 a,2
Now display these files with ls -1:
a1
a_1
a-1
a,1
a.1
a2
a_2
a-2
a,2
a.2
What I expected was something like this:
a1
a2
a,1
a,2
a.1
a.2
a_1
a_2
a-1
a-2
i.e. I expected the non-alphanumeric characters to be taken into account when sorting.
Can anyone explain this behaviour? Is it mandated by a standard? Or is it due to the encoding being UTF-8?
Update: It seems that this is related to UTF-8 sorting:
$ LC_COLLATE=C ls -1
a,1
a,2
a-1
a-2
a.1
a.2
a1
a2
a_1
a_2
LC_COLLATE=C ls? – Alexios Apr 01 '12 at 09:56
[_-,.] are being grouped and somehow semi-ignored. I don't know exactly how or where such collation is defined, but it must be a collation issue, because simply, and only, changing the collation to C (via LC_COLLATE=C ls -l) is enough to give you the sort order you expected (assuming that LC_ALL is not overriding LC_COLLATE). This holds true for the entire range of characters in the Unicode Basic Multilingual Plane... I've edited my answer to include an example script which bears this out... – Peter.O Apr 01 '12 at 14:29
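For anyone who wants to reproduce the C-collation ordering without creating files, here is a minimal sketch. It assumes a POSIX shell and the standard sort utility, and it sets LC_ALL rather than LC_COLLATE so that an LC_ALL already present in the environment cannot override the setting. The variable name c_order is only illustrative.

```shell
# Sort the sample names from the question by raw byte value.
# LC_ALL=C takes precedence over LC_COLLATE and LANG, so the result
# does not depend on the locale the shell was started with.
c_order=$(printf '%s\n' a1 a2 a-1 a-2 a_1 a_2 a.1 a.2 a,1 a,2 | LC_ALL=C sort)

# Print one name per line, like ls -1 would.
printf '%s\n' "$c_order"
```

In the C locale the comparison is byte by byte, so the order follows the ASCII codes: "," (0x2C), "-" (0x2D), "." (0x2E), then the digits, then "_" (0x5F). Running the same pipeline without LC_ALL=C shows whatever ordering your current locale defines.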