I have trouble understanding Unix sort. Consider the following tab-separated file:
aa ~ a1
aa B
b A
b ~ e
bb B
bb ~ B
When calling:
cat tmp2 | sort -t $'\t' -k1,2
I get
aa ~ a1
aa B
b A
bb B
bb ~ B
b ~ e
As far as I understand, -t $'\t' sets the field separator to a tab instead of whitespace, and -k1,2 says to sort by the first column and, if two rows have the same first column, then by the second one. But in that case, shouldn't the 'b ~ e' line appear in the fourth row?
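One check that may help here is to compare the locale-aware result with a plain byte-order sort, since the comments below bring up LC_ALL=C. The following is only a sketch: the variable names (tab, sample, c_sorted) are made up for illustration, and the sample data is rebuilt with printf as a stand-in for the file tmp2.

```shell
# Literal TAB, built portably (avoids relying on the bash-only $'\t' syntax).
tab=$(printf '\t')

# Hypothetical stand-in for the contents of tmp2 from the question.
sample=$(printf 'aa\t~\ta1\naa\tB\nb\tA\nb\t~\te\nbb\tB\nbb\t~\tB\n')

# Same command as in the question: the result depends on the current locale.
printf '%s\n' "$sample" | sort -t "$tab" -k1,2

# Byte-order comparison: the key (field 1 through field 2, separator
# included) is compared byte by byte.
c_sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort -t "$tab" -k1,2)
printf '%s\n' "$c_sorted"
# With LC_ALL=C the lines come out as:
#   aa B / aa ~ a1 / b A / b ~ e / bb B / bb ~ B
```

If the second listing puts 'b ~ e' in the fourth row while the first does not, the locale's collation rules are the likely cause rather than the -t or -k options.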
Why would the lines 'a' 'a2' 'aa' 'aa3' 'aa~' be sorted by 'cat tmp2 | sort -k1,1' as 'a' 'a~' 'a2' 'aa' 'aa~' 'aa3'? Since '~' comes after any other character in the ASCII table, shouldn't 'aa~' be sorted after 'aa3', for example?
– giulio Feb 07 '15 at 21:00
What is the difference between sort -k1,2 and sort -k1,1 -k2,2? At least from your description, I don't understand the difference between the two commands.
Both diff <(sort -t $'\t' -k1,2 <<<"$content") <(sort -t $'\t' -k1,2 -k2,2 <<<"$content") and diff <(LC_ALL=C sort -t $'\t' -k1,2 <<<"$content") <(LC_ALL=C sort -t $'\t' -k1,2 -k2,2 <<<"$content") produced no output. – Six Sep 16 '15 at 02:39
In the C locale, strcoll() (the comparison function used by sort) works the same as strcmp() (and there only). So for sort -k1,1 -k2,2 and sort -k1,2 to produce different results, you need an input that contains characters that sort before the separator. In the case of TAB, that's going to be byte values 0 to 8, which are unlikely to be found in text, but you can compare on the output of printf 'a\1\tb\na\tc\n'. – Stéphane Chazelas Sep 16 '15 at 08:53
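The comparison suggested in the last comment can be tried as follows. This is a minimal sketch; the variable names (tab, one_key, two_keys) are illustrative only, and \001 is written out as the three-digit octal form of the \1 used above.

```shell
# Literal TAB to pass to sort -t.
tab=$(printf '\t')

# One key spanning fields 1-2: the separator is part of the compared string,
# so byte 0x01 is compared against TAB (0x09) and the 'a\001' line wins.
one_key=$(printf 'a\001\tb\na\tc\n' | LC_ALL=C sort -t "$tab" -k1,2)

# Two separate keys: field 1 is compared alone, and 'a' is a prefix of
# 'a\001', so the plain 'a' line sorts first.
two_keys=$(printf 'a\001\tb\na\tc\n' | LC_ALL=C sort -t "$tab" -k1,1 -k2,2)

# cat -v makes the control byte visible as ^A.
printf '%s\n' "-k1,2:" "$one_key" "-k1,1 -k2,2:" "$two_keys" | cat -v
```

The two commands order the same two lines differently only because a byte smaller than the separator appears inside the first field; with ordinary text in the C locale they agree, which matches the empty diff output reported above.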