
I am trying to sort on multiple columns. The results are not as expected.

Here's my data (people.txt):

Simon Strange 62
Pete Brown 37
Mark Brown 46
Stefan Heinz 52
Tony Bedford 50
John Strange 51
Fred Bloggs 22
James Bedford 21
Emily Bedford 18
Ana Villamor 44
Alice Villamor 50
Francis Chepstow 56

The following works correctly:

bash-3.2$ sort -k2 -k3 <people.txt
Emily Bedford 18
James Bedford 21
Tony Bedford 50
Fred Bloggs 22
Pete Brown 37
Mark Brown 46
Francis Chepstow 56
Stefan Heinz 52
John Strange 51
Simon Strange 62
Ana Villamor 44
Alice Villamor 50

But, the following does not work as expected:

bash-3.2$ sort -k2 -k1 <people.txt
Emily Bedford 18
James Bedford 21
Tony Bedford 50
Fred Bloggs 22
Pete Brown 37
Mark Brown 46
Francis Chepstow 56
Stefan Heinz 52
John Strange 51
Simon Strange 62
Ana Villamor 44
Alice Villamor 50

I was trying to sort by surname and then by first name, but you will see the Villamors are not in the correct order. I was hoping to sort by surname, and then when surnames matched, to sort by first name.

It seems there is something about how this should work I don't understand. I could do this another way of course (using awk), but I want to understand sort.

I am using the standard Bash shell on Mac OS X.

Jeff Schaller
Harry

4 Answers


A key specification like -k2 means to take all the fields from 2 to the end of the line into account. So Villamor 44 ends up before Villamor 50. Since these two are not equal, the first comparison in sort -k2 -k1 is enough to discriminate these two lines, and the second sort key -k1 is not invoked. If the two Villamors had had the same age, -k1 would have caused them to be sorted by first name.

To sort by a single column, use -k2,2 as the key specification. This means to use the fields from #2 to #2, i.e. only the second field.
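A quick way to see the difference is to feed sort only the two Villamor lines. This is a minimal sketch; LC_ALL=C is an addition (not in the question) so that the comparison is byte-wise and does not depend on the locale.

```shell
# The two lines from people.txt that came out in the "wrong" order.
villamors='Ana Villamor 44
Alice Villamor 50'

# -k2 means "field 2 through end of line": the keys are " Villamor 44"
# and " Villamor 50", which already differ, so -k1 is never consulted.
echo '== sort -k2 -k1 =='
printf '%s\n' "$villamors" | LC_ALL=C sort -k2 -k1

# -k2,2 means "field 2 only": both keys are " Villamor", so the tie
# is broken by -k1,1 ("Ana" vs "Alice").
echo '== sort -k2,2 -k1,1 =='
printf '%s\n' "$villamors" | LC_ALL=C sort -k2,2 -k1,1
```

The first command keeps Ana before Alice (44 < 50); the second puts Alice first.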

sort -k2 -k3 <people.txt is redundant: it's equivalent to sort -k2 <people.txt. To sort by last names, then first names, then age, run the following command:

sort -k2,2 -k1,1 <people.txt

or equivalently sort -k2,2 -k1 <people.txt since there are only these three fields and the separators are the same. In fact, you will get the same effect from sort -k2,2 <people.txt, because sort uses the whole line as a last resort when all the keys in a subset of lines are identical.
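The claim that these three commands agree on this data can be checked with a sketch like the following. It recreates people.txt from the question and, as an added assumption, sets LC_ALL=C so the ordering is byte-wise.

```shell
# Recreate the sample file from the question.
cat > people.txt <<'EOF'
Simon Strange 62
Pete Brown 37
Mark Brown 46
Stefan Heinz 52
Tony Bedford 50
John Strange 51
Fred Bloggs 22
James Bedford 21
Emily Bedford 18
Ana Villamor 44
Alice Villamor 50
Francis Chepstow 56
EOF

# LC_ALL=C forces byte-wise comparison so the result does not depend on the locale.
by_both=$(LC_ALL=C sort -k2,2 -k1,1 people.txt)  # surname, then first name
by_rest=$(LC_ALL=C sort -k2,2 -k1 people.txt)    # surname, then field 1 to end of line
by_last=$(LC_ALL=C sort -k2,2 people.txt)        # surname; ties fall back to the whole line

if [ "$by_both" = "$by_rest" ] && [ "$by_rest" = "$by_last" ]; then
  echo "All three commands agree:"
  printf '%s\n' "$by_both"
else
  echo "The commands disagree" >&2
fi
```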

Also note that the default field separator is the transition between a non-blank and a blank, so the keys will include the leading blanks (in your example, for the line Emily Bedford 18, the first key will be "Emily", but the second key " Bedford"). Add the -b option to strip those blanks:

sort -b -k2,2 -k1,1

It can also be done on a per-key basis by adding the b flag at the end of the key start specification:

sort -k2b,2 -k1,1 <people.txt
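The leading blanks only make a visible difference when the spacing is uneven. Here is a sketch with hypothetical input in which one line has two spaces before the surname. LC_ALL=C is an assumption added so that blanks compare byte-wise; some locales ignore blanks when collating, which can hide the effect.

```shell
# Hypothetical input: "Tom" is followed by two spaces, "Ann" by one.
input='Tom  Bedford
Ann Adams'

# Without -b, key 2 is "  Bedford" vs " Adams"; the extra space sorts first.
echo '== sort -k2,2 =='
printf '%s\n' "$input" | LC_ALL=C sort -k2,2

# With the global -b, or the b flag on the key, the keys are "Bedford" vs "Adams".
echo '== sort -b -k2,2 =='
printf '%s\n' "$input" | LC_ALL=C sort -b -k2,2
echo '== sort -k2b,2 =='
printf '%s\n' "$input" | LC_ALL=C sort -k2b,2
```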

But something to bear in mind: as soon as you add one such flag to a key specification, the global flags (like -n, -r...) no longer apply to that key, so it's better to avoid mixing per-key flags and global flags.
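The following sketch illustrates that pitfall with hypothetical input. It relies on the POSIX rule that any modifier attached to a key stops the global options from applying to that key (GNU sort follows this rule); LC_ALL=C is added for a predictable text comparison.

```shell
# Hypothetical input: the second field holds numbers of different widths.
input='a 10
b 9'

# -k2,2 has no flags of its own, so it inherits the global -n: 9 < 10.
echo '== sort -n -k2,2 =='
printf '%s\n' "$input" | LC_ALL=C sort -n -k2,2

# -k2b,2 carries its own flag (b), so the global -n is not applied to it
# and the key is compared as text: "10" < "9" because '1' < '9'.
echo '== sort -n -k2b,2 =='
printf '%s\n' "$input" | LC_ALL=C sort -n -k2b,2

# Keeping every flag on the key itself avoids the surprise.
echo '== sort -k2bn,2 =='
printf '%s\n' "$input" | LC_ALL=C sort -k2bn,2
```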

  • You nailed it. I had assumed (a dangerous thing to do) that specifying -k1 would mean use field 1, where the field ends at the default field separator (space). But as you clearly point out, the k option expects you to specify the start and stop points of the key, which may or may not be a single field. Your solution works perfectly, and more importantly, I am clear on why it does so. Many thanks. – Harry Oct 26 '12 at 07:40
  • This is HUGE. So many other sources about KEYDEF talk about -k1 -k2 without stressing the importance of the COMMA in the format to limit which columns are considered in each sorting step. I was stuck on this for hours until I found this answer. And the man page is confusing here. It doesn't explain that the "start and stop" locations are specified with the comma notation. Thank you! – Jason Rohrer Oct 03 '19 at 04:07
  • I am sorry, but is the redirection required? sort can operate on a file directly, no? Like sort -n ... file – han solo Nov 11 '19 at 13:48
  • @hansolo Yes, that works as well. – Gilles 'SO- stop being evil' Nov 11 '19 at 23:29
  • @Gilles'SO-stopbeingevil' Yeah, I'm just seeing everyone redirecting to stdin of sort for no reason in the post. That's why :) Also great answer, I wasn't sure how to limit the column for comparison – han solo Nov 12 '19 at 05:03

With GNU sort you do it like this (not sure about macOS):

sort -k2,2 -k1 <people.txt

Update in response to a comment. Quoted from man sort:

   -k, --key=KEYDEF
          sort via a key; KEYDEF gives location and type

   KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where
   F is a field number and C a character position in the field; both are
   origin 1, and the stop position defaults to the line's end.
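To illustrate the [.C] part of KEYDEF, here is a sketch with hypothetical single-word input that sorts on the second character of field 1 only: -k1.2,1.2 starts at field 1, character 2, and stops at field 1, character 2. LC_ALL=C is an added assumption for a predictable comparison.

```shell
# Hypothetical input; the keys are the 2nd letters: "n", "l", "m".
result=$(printf '%s\n' Ana Alice Emily | LC_ALL=C sort -k1.2,1.2)
printf '%s\n' "$result"
```

Field 1 has no leading blanks here; for later fields the character count would include the blank(s) before the field unless the b flag is used.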
manatwork
  • Could you please explain this strange notation? – scai Oct 24 '12 at 12:19
  • This got me thinking along the right lines, thanks for that. But don't you need to specify the stop point for the second -k as well, that is, -k2,2 -k1,1? Otherwise isn't the stop point taken as the end of the line? – Harry Oct 26 '12 at 07:45
  • @TonyBedford, correct. But not specifying the stop position will not change the result for your current input, and it will keep the ordering consistent if you ever have multiple lines with identical fields 2 and 1. So I prefer to allow the last -k to include as much as it can. – manatwork Oct 26 '12 at 08:01
  • @manatwork That should not be necessary; if all the specified fields compare equal, sort will compare the entire line. Or with GNU sort you can use -s for stable sort. – augurar Mar 02 '15 at 19:08
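The effect of -s can be seen on the two Villamor lines. -s is not a POSIX option, so this sketch assumes GNU sort (or another sort that supports -s), and sets LC_ALL=C for a predictable comparison.

```shell
# The two Villamor lines, in the order they appear in people.txt.
villamors='Ana Villamor 44
Alice Villamor 50'

# The -k2,2 keys tie, so sort falls back to comparing whole lines: Alice first.
echo '== sort -k2,2 =='
printf '%s\n' "$villamors" | LC_ALL=C sort -k2,2

# -s turns off that last-resort comparison, so tied lines keep input order: Ana first.
echo '== sort -s -k2,2 =='
printf '%s\n' "$villamors" | LC_ALL=C sort -s -k2,2
```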

You can do this:

$ sort -k2,2 -k1,1 people.txt
Emily Bedford 18
James Bedford 21
Tony Bedford 50
Fred Bloggs 22
Mark Brown 46
Pete Brown 37
Francis Chepstow 56
Stefan Heinz 52
John Strange 51
Simon Strange 62
Alice Villamor 50
Ana Villamor 44

First, -k2,2 sorts by last name; then -k1,1 breaks ties by first name.

Logan Lee

Using Raku (formerly known as Perl 6)

Adding this answer for U&L users who might be trying to sort Unicode text. Raku has high-level support for Unicode built in, and this answer is (in part) meant to help this author understand Raku's sorting rules.


Sorting on one column (last-name) with a 'unary' comparison operator/block (commented out at top), or with a binary block containing the leg "less-than/equal-to/greater-than" string comparison operator. Ties stay in 'encounter' order (i.e. stable sort):

~$ #`{ raku -e '.put for lines.sort: { .words[1] };'  #unary block, OR binary block below: }

~$ raku -e '.put for lines.sort: { $^a.words[1] leg $^b.words[1] };' file
Tony Bedford 50
James Bedford 21
Emily Bedford 18
Fred Bloggs 22
Pete Brown 37
Mark Brown 46
Francis Chepstow 56
Stefan Heinz 52
Simon Strange 62
John Strange 51
Ana Villamor 44
Alice Villamor 50


Sorting on two columns, last-name then first-name. At top (commented out), giving sort a list of unary elements to sort on. Second example below: more explicitly using two leg string-comparison operators, with || "short-circuit-OR" in between:

~$ #`{ raku -e '.put for lines.sort: { .words.[1], .words.[0] };' #list of unary elements, OR binary blocks below: }

~$ raku -e '.put for lines.sort: {$^a.words[1] leg $^b.words[1] || $^a.words[0] leg $^b.words[0] };' file
Emily Bedford 18
James Bedford 21
Tony Bedford 50
Fred Bloggs 22
Mark Brown 46
Pete Brown 37
Francis Chepstow 56
Stefan Heinz 52
John Strange 51
Simon Strange 62
Alice Villamor 50
Ana Villamor 44

The above Raku code satisfies the title of this question: "Trying to sort on two fields, second then first". But if one column is numeric, you can use <=> instead of leg to sort numerically (<=> is commonly called the 'spaceship' operator). Example below:


The following sorts on three (3) columns: last-name, reverse age (oldest first; swapping $^a and $^b gives the reverse sort), then first-name. So in the sorted output Simon Strange 62 will appear before John Strange 51.

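Raku has an improved cmp operator which tries to detect types and make smart comparisons for you (i.e. string comparisons like leg for strings and numeric comparisons like <=> for numbers). In the second example below, three cmp comparisons give exactly the same sorted output as the first example. Note that .words returns strings, so here cmp actually compares the ages as strings; the result matches the numeric sort only because every age has two digits.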

~$ #`{ raku -e '.put for lines.sort: {$^a.words[1] leg $^b.words[1] || $^b.words[2] <=> $^a.words[2] || $^a.words[0] leg $^b.words[0] };' #OR with cmp operator below: }

~$ raku -e '.put for lines.sort: {$^a.words[1] cmp $^b.words[1] || $^b.words[2] cmp $^a.words[2] || $^a.words[0] cmp $^b.words[0] };' file
Tony Bedford 50
James Bedford 21
Emily Bedford 18
Fred Bloggs 22
Mark Brown 46
Pete Brown 37
Francis Chepstow 56
Stefan Heinz 52
Simon Strange 62
John Strange 51
Alice Villamor 50
Ana Villamor 44


Finally, the "binary comparator blocks" (above) give you precise control over your sorting mechanism. But if you prefer, sorting on three columns (above) can be simplified to the code below:

~$ raku -e '.put for lines.sort: { ~.words[1], -.words[2], ~.words[0] };'  file
Tony Bedford 50
James Bedford 21
Emily Bedford 18
Fred Bloggs 22
Mark Brown 46
Pete Brown 37
Francis Chepstow 56
Stefan Heinz 52
Simon Strange 62
John Strange 51
Alice Villamor 50
Ana Villamor 44
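For comparison, the same three-column ordering (last name, age descending, first name) can be sketched with plain sort keys, where the n and r flags are attached only to the age key. This recreates people.txt from the question and assumes LC_ALL=C for a byte-wise comparison.

```shell
# Recreate the sample file from the question.
cat > people.txt <<'EOF'
Simon Strange 62
Pete Brown 37
Mark Brown 46
Stefan Heinz 52
Tony Bedford 50
John Strange 51
Fred Bloggs 22
James Bedford 21
Emily Bedford 18
Ana Villamor 44
Alice Villamor 50
Francis Chepstow 56
EOF

# Key 1: surname as text. Key 2: age, numeric and reversed (oldest first).
# Key 3: first name as text.
sorted=$(LC_ALL=C sort -k2,2 -k3,3nr -k1,1 people.txt)
printf '%s\n' "$sorted"
```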

https://perl6advent.wordpress.com/2013/12/23/day-23-unary-sort/
https://docs.raku.org/language/101-basics#Stable_sort
https://docs.raku.org/routine/sort
https://docs.raku.org/routine/cmp
https://raku.org

jubilatious1