GNU coreutils `sort` behaves differently

Question

I want to sort a list of data by its first column, which is an IP address:

192.168.1.100
192.168.1.101
192.168.1.110
192.168.1.119
192.168.1.20
192.168.1.30
192.168.1.33
192.168.1.54
192.168.1.64
192.168.1.6
192.168.1.91

On my first machine, I tested `sort -n` and it worked as I expected:

# coreutils, version: 8.31, release: 23

192.168.1.6
192.168.1.20
192.168.1.30
192.168.1.33
192.168.1.54
192.168.1.64
192.168.1.91
192.168.1.100
192.168.1.101
192.168.1.110
192.168.1.119

But on my second machine, it doesn't sort properly:

# coreutils, version: 8.4

192.168.1.100
192.168.1.101
192.168.1.110
192.168.1.119
192.168.1.20
192.168.1.30
192.168.1.33
192.168.1.54
192.168.1.6
192.168.1.64
192.168.1.91

Both machines have the same locale, en_US.UTF-8.

Why is this happening? How can I resolve it?

Hypothesis: the first machine uses a locale where `locale thousands_sep` returns `.`. Probably it's not en_US.UTF-8 (at least not as LC_NUMERIC). The second machine doesn't use `.` as the thousands separator. — Kamil Maciorowski, Feb 10 '20 at 08:54
@KamilMaciorowski Turns out, the second machine uses `,` instead of `.`. Thanks for your info. — annahri, Feb 10 '20 at 09:18
I think `,` is the right thousands separator in en_US.UTF-8, so I would say it's the other way around: the first one uses `.` instead of `,`. — Kamil Maciorowski, Feb 10 '20 at 09:24
@KamilMaciorowski I just checked that my first machine uses different LC_NUMERIC. — annahri, Feb 10 '20 at 09:44
The second result is what you should expect in the C locale. So check your locale setup. — schily, Feb 11 '20 at 13:39
FYI, because you're using GNU sort, you can use `-V` (aka `--version-sort`) instead of `-n`. It performs a natural sort of things that look like version numbers, and IPv4 addresses look enough like version numbers for it to work. — cas, Apr 05 '21 at 09:46
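The `-V` suggestion in the last comment can be tried on the question's data. A minimal sketch, assuming GNU sort; the variable names `ips` and `sorted` are only for illustration. Version sort compares runs of digits as numbers, so each octet is ordered numerically, and it does not depend on `LC_NUMERIC`.

```shell
#!/bin/sh
# Sample input: the unsorted addresses from the question.
ips='192.168.1.100
192.168.1.101
192.168.1.110
192.168.1.119
192.168.1.20
192.168.1.30
192.168.1.33
192.168.1.54
192.168.1.64
192.168.1.6
192.168.1.91'

# -V (GNU --version-sort) compares digit runs numerically and the dots
# between them as text, so 6 < 20 < 100 in the last octet.
sorted=$(printf '%s\n' "$ips" | sort -V)
printf '%s\n' "$sorted"
```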

score 1 · Accepted Answer · edited Feb 10 '20 at 14:17

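Without a key position, `sort -n` uses the entire line as the key and reads the number at the start of it. Where `.` is the decimal point and there is no `.` thousands separator (as in the C locale or en_US.UTF-8), that number is 192.168 on every line, so all lines compare equal and GNU sort falls back to comparing the entire lines as text. Since the first three octets are the same in all the lines, the order is decided by the characters of the last octet: `1` sorts before `2`, so the octets 100 and 101 appear before 20.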
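A minimal sketch that reproduces the second machine's ordering, assuming GNU sort and the C locale, where `.` is the decimal point and there is no thousands separator. `-n` then reads 192.168 from every line, every key ties, and the last-resort comparison of whole lines decides the order byte by byte. The variable names are illustrative only.

```shell
#!/bin/sh
# Sample input: three addresses from the question, already in numeric order.
ips='192.168.1.6
192.168.1.20
192.168.1.100'

# In the C locale, -n parses only "192.168" from each line, so all keys are
# equal and GNU sort compares the whole lines byte by byte instead.
c_sorted=$(printf '%s\n' "$ips" | LC_ALL=C sort -n)
printf '%s\n' "$c_sorted"
```

The output comes back as 192.168.1.100, 192.168.1.20, 192.168.1.6, the same pattern as on the second machine.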

Define a proper key position and use the numeric sort. In your case, set the input delimiter to `.` and let sort work its magic on the 4th field only. `-k4,4` means start at the 4th field delimited by `.` and stop at that same 4th field.

sort -n -t'.' -k4,4 file
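Sorting on `-k4,4` alone works here only because the first three octets are identical on every line. For addresses that differ in earlier octets, one option is to give each octet its own numeric key. A sketch under that assumption, using standard `sort` key syntax; the sample addresses and variable names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample: addresses that differ in more than the last octet.
ips='10.0.0.2
192.168.1.20
192.168.1.6
10.0.0.10
172.16.5.1'

# -t. splits on dots; each -kN,Nn key compares one octet numerically, and a
# later key is only consulted when all earlier keys are equal.
# LC_ALL=C keeps the result independent of the caller's locale.
full_sorted=$(printf '%s\n' "$ips" | LC_ALL=C sort -t. -k1,1n -k2,2n -k3,3n -k4,4n)
printf '%s\n' "$full_sorted"
```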

You can also override any other locale settings defined on your system and use the default C locale by setting LC_ALL=C for just this one command. See "What does LC_ALL=C do?" to understand why:

LC_ALL=C sort -n -t'.' -k4,4 file
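To see which numeric conventions a shell session actually uses, and how `LC_ALL=C` overrides them, the `locale` utility can print individual keywords. A small sketch, assuming a glibc-based system whose `locale` command accepts keyword names such as `decimal_point` and `thousands_sep`; running the first two commands on both machines should expose a differing `LC_NUMERIC`.

```shell
#!/bin/sh
# Show every locale category: LANG can be en_US.UTF-8 while LC_NUMERIC differs.
locale

# The numeric conventions sort -n relies on in the current environment.
printf 'decimal_point=[%s] thousands_sep=[%s]\n' \
    "$(locale decimal_point)" "$(locale thousands_sep)"

# Under LC_ALL=C the decimal point is "." and there is no thousands separator.
c_point=$(LC_ALL=C locale decimal_point)
c_sep=$(LC_ALL=C locale thousands_sep)
printf 'C locale: decimal_point=[%s] thousands_sep=[%s]\n' "$c_point" "$c_sep"
```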

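Thanks to Kamil Maciorowski's comment, which highlighted the actual issue.

The first machine seems to be using a locale where `locale thousands_sep` returns `.`. Probably it's not en_US.UTF-8 (at least not as LC_NUMERIC). GNU `sort -n` skips thousands separators while reading a number, so on that machine 192.168.1.100 is read as 1921681100 and 192.168.1.6 as 19216816, and a plain `sort -n` happens to give numeric order. The second machine doesn't use `.` as the thousands separator, so it falls back to comparing the lines as text.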
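The effect of a `.` thousands separator can be imitated without installing such a locale. This rough emulation (an illustration of the effect, not how sort is implemented) copies the first column with its dots removed to the front of each line, sorts numerically on that copy in the C locale, and then strips it off again. The variable names are illustrative only.

```shell
#!/bin/sh
# Sample input: three addresses from the question.
ips='192.168.1.100
192.168.1.20
192.168.1.6'

# Prefix each line with its first column minus the dots (192.168.1.6 -> 19216816),
# i.e. the number sort -n sees when "." is the thousands separator,
# sort numerically on that prefix, then remove the prefix again.
emulated=$(printf '%s\n' "$ips" |
    awk '{ key = $1; gsub(/\./, "", key); print key, $0 }' |
    LC_ALL=C sort -n -k1,1 |
    cut -d' ' -f2-)
printf '%s\n' "$emulated"
```

This yields 192.168.1.6, 192.168.1.20, 192.168.1.100, matching the first machine's order.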

Your answer does resolve my problem, but I still have no idea why a simple `sort -n` worked fine on my first machine but fails on the second one. — annahri, Feb 10 '20 at 08:46
Please see the comments under the question. I don't want to add a partial answer and leave your answer partial as well. You have my permission to absorb my observation into your answer. — Kamil Maciorowski, Feb 10 '20 at 09:26
