I am using sort in bash, but I get different orders for two files even though the leading characters of each line are the same in both.

file1:

  "(0, -11)": "(-1.24636393592-0.992799153308j)", 
  "(0, 1)": "(149.807097864-5.44350795193j)", 
  "(0, 0)": "(17.1604053672+3.88079235934j)", 
  "(0, 11)": "(-1.59903812426-0.923768768117j)", 
  "(0, -1)": "(47.1824114723-21.6682255934j)", 
  "(0, 10)": "(-7.9306816865-1.40521728962j)", 
  "(0, 12)": "(-1.01650580426-1.04187674309j)", 
  "(0, -10)": "(-0.901802059305-0.821904477534j)", 

file2:

  "(0, 0)": "(0.581223595766+0.883221459338j)", 
  "(0, -1)": "(0.0296256019162+0.632637319226j)", 
  "(0, -10)": "(0.792520325166+0.141433946136j)", 
  "(0, 10)": "(-1.20153329399-0.805695804956j)", 
  "(0, 1)": "(0.285821897179-0.508323457505j)", 
  "(0, 11)": "(0.0402120404586-1.57660120897j)", 
  "(0, -11)": "(0.476001913928+0.127280670816j)", 
  "(0, 12)": "(-0.257439911355-1.2545061217j)",

sort file1 gives:

  "(0, 0)": "(17.1604053672+3.88079235934j)", 
  "(0, -10)": "(-0.901802059305-0.821904477534j)", 
  "(0, 10)": "(-7.9306816865-1.40521728962j)", 
  "(0, -11)": "(-1.24636393592-0.992799153308j)", 
  "(0, 11)": "(-1.59903812426-0.923768768117j)", 
  "(0, 1)": "(149.807097864-5.44350795193j)", 
  "(0, 12)": "(-1.01650580426-1.04187674309j)", 
  "(0, -1)": "(47.1824114723-21.6682255934j)", 

sort file2 gives:

  "(0, 0)": "(0.581223595766+0.883221459338j)", 
  "(0, -1)": "(0.0296256019162+0.632637319226j)", 
  "(0, -10)": "(0.792520325166+0.141433946136j)", 
  "(0, 10)": "(-1.20153329399-0.805695804956j)", 
  "(0, 1)": "(0.285821897179-0.508323457505j)", 
  "(0, 11)": "(0.0402120404586-1.57660120897j)", 
  "(0, -11)": "(0.476001913928+0.127280670816j)", 
  "(0, 12)": "(-0.257439911355-1.2545061217j)", 

Similarly, sort file1 file2 gives a list that doesn't appear to be sorted alphabetically, numerically, or otherwise.

I'd expect the default to be an alphabetical sort, which compares one character at a time. The lines should be fully sortable without ever reaching the 10th or so character, where the two files start to differ, so why do I get different orders when I sort them?

EDIT 1 Using numeric flags -g or -n still gives inconsistent results.

Sorting by the first field works as expected, e.g. sort <(cat file1 file2 | cut -f1 -d':')

EDIT 2 For an answer to my question, see the accepted answer.

The solution to my problem (inspired by the answer below) seems to be:

LC_ALL=C sort file1
LC_ALL=C sort file2

This does a byte-wise sort. I don't care about the sort order, so long as two files with the same contents end up in the same order, and I think this accomplishes that.
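
As a quick check of that claim, here is a minimal sketch. It assumes two hypothetical scratch files that share the same keys but hold different placeholder values (the file names, temporary directory, and values are made up for illustration), sorts both byte-wise, and compares only the key column:

```shell
# Hypothetical scratch files: same keys, different placeholder values,
# listed in a different order in each file.
tmp=$(mktemp -d)
printf '%s\n' '"(0, 1)": "(1.5j)",' '"(0, -1)": "(4.7j)",' '"(0, 0)": "(1.7j)",' > "$tmp/file1"
printf '%s\n' '"(0, 0)": "(0.5j)",' '"(0, -1)": "(0.2j)",' '"(0, 1)": "(0.8j)",' > "$tmp/file2"

# Byte-wise sort, then keep only the key (the text before the first colon).
keys1=$(LC_ALL=C sort "$tmp/file1" | cut -d: -f1)
keys2=$(LC_ALL=C sort "$tmp/file2" | cut -d: -f1)
rm -rf "$tmp"

# With a byte-wise comparison the keys decide the order before the
# values are ever reached, so both key columns should be identical.
if [ "$keys1" = "$keys2" ]; then
    echo "key order matches"
else
    echo "key order differs"
fi
```

In the C locale every byte takes part in the comparison, so the "-" in a key is no longer skipped and the values after the colon only matter for lines whose keys are identical.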

EDIT 3

This is not a duplicate of the other question. I am asking nothing about sorting << brackets. Yes, the answer does apply. There is a difference between duplicate questions and separate questions to which the same broad answer can apply. The key here is that I (and possibly others with my question) would not have found the other question while looking for the problem I'm having.

TL;DR: They are not duplicate questions, just related questions with related answers. They should be linked, not marked as duplicates.

1 Answer

Sorting follows collation rules, which are selected by the LC_COLLATE locale setting (or by LC_ALL if it is set, falling back to LANG when neither is set). The rationale is that different languages have different rules for alphabetical ordering.
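
The precedence described above can be illustrated with a small sketch. The three sample lines are made up for illustration (they only mirror the key format from the question), and en_US.UTF-8 is just an example locale name that need not be installed, because LC_ALL overrides LC_COLLATE:

```shell
# Hypothetical three-line sample; the keys mirror the ones in the question.
sample='"(0, 1)"
"(0, -1)"
"(0, 0)"'

# LC_ALL takes precedence over LC_COLLATE, so this is a byte-wise sort
# even though LC_COLLATE names a different locale.
sorted=$(printf '%s\n' "$sample" | LC_ALL=C LC_COLLATE=en_US.UTF-8 sort)
printf '%s\n' "$sorted"
# Byte values decide the order: '-' (0x2D) < '0' (0x30) < '1' (0x31),
# so the output is:
# "(0, -1)"
# "(0, 0)"
# "(0, 1)"
```

Running the locale command with no arguments prints the values currently in effect, which is a quick way to see which of the three variables is deciding the collation in a given shell.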

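Apparently the collation of your locale skips the "-" characters, so the digits that follow, including those of the value after the colon, end up deciding the order. That would explain why two files with the same keys but different values sort differently.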

$ LC_COLLATE=en_DK sort file2
"(0, 0)": "(0.581223595766+0.883221459338j)", 
"(0, -1)": "(0.0296256019162+0.632637319226j)", 
"(0, -10)": "(0.792520325166+0.141433946136j)", 
"(0, 10)": "(-1.20153329399-0.805695804956j)", 
"(0, 1)": "(0.285821897179-0.508323457505j)", 
"(0, 11)": "(0.0402120404586-1.57660120897j)", 
"(0, -11)": "(0.476001913928+0.127280670816j)", 
"(0, 12)": "(-0.257439911355-1.2545061217j)", 

$ LC_COLLATE=C sort file2
"(0, -1)": "(0.0296256019162+0.632637319226j)", 
"(0, -10)": "(0.792520325166+0.141433946136j)", 
"(0, -11)": "(0.476001913928+0.127280670816j)", 
"(0, 0)": "(0.581223595766+0.883221459338j)", 
"(0, 1)": "(0.285821897179-0.508323457505j)", 
"(0, 10)": "(-1.20153329399-0.805695804956j)", 
"(0, 11)": "(0.0402120404586-1.57660120897j)", 
"(0, 12)": "(-0.257439911355-1.2545061217j)",