How to sort files by part of the filename?

Question

Given the files below:

ABC38388.SC01.StatueGrade_MKP
ABC38388.SC02.Statue_GKP
DEF38389.SC03.Statue_HKP
XYZ38390.SC00.Statue_WKP

How can I list them sorted by the SC value, like this:

XYZ38390.SC00.Statue_WKP
ABC38388.SC01.StatueGrade_MKP
ABC38388.SC02.Statue_GKP
DEF38389.SC03.Statue_HKP

score 17 · Answer 1 · edited Dec 09 '15 at 17:02

In this particular case where your file names don't contain any whitespace or other strange characters, you can use ls and pipe it through sort:

$ ls -d -- *.SC* | sort -t. -k2
XYZ38390.SC00.Statue_WKP
ABC38388.SC01.StatueGrade_MKP
ABC38388.SC02.Statue_GKP
DEF38389.SC03.Statue_HKP

The -t. sets the field delimiter to a dot, and -k2 tells sort to use a key that starts at the 2nd field and runs to the end of the line (use -k2,2 to restrict the key to the second field only).
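To make the difference between -k2 and -k2,2 visible, here is a small sketch. The three names are made up for illustration, and -s (stable sort) is an extension found in GNU and BSD sort rather than a POSIX option, so treat this as an assumption about your system:

```shell
# Hypothetical names in which two entries share the same SC value.
names='B.SC01.zzz
A.SC01.aaa
C.SC00.mmm'

# -k2: the key runs from field 2 to the end of the line,
# so the third field breaks the tie between the two SC01 entries.
to_end=$(printf '%s\n' "$names" | LC_ALL=C sort -t. -k2)

# -k2,2 with -s: the key is field 2 only, and tied lines keep their input order.
field_only=$(printf '%s\n' "$names" | LC_ALL=C sort -s -t. -k2,2)

printf '%s\n' "$to_end"
echo ---
printf '%s\n' "$field_only"
```

Without -s, sort falls back to comparing the whole line when the keys are equal, which can hide the difference between the two forms.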

For more complex cases, you could print each file name followed by the NUL character (\0), then pipe to GNU sort using its -z option to tell it to read NUL-delimited records and, finally, use tr to change the \0 back to \n:

printf '%s\0' *SC* | sort -zt. -k2 | tr '\0' '\n'
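As a rough illustration of why the NUL-delimited form matters, the sketch below counts the records each pipeline sees when one name contains a newline. The file names and the throwaway directory are invented for the demo, and it assumes mktemp and a sort that supports -z (GNU sort does):

```shell
# Work in a throwaway directory so the demo touches nothing else.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Hypothetical file names; the second one contains an embedded newline.
touch 'A.SC02.x' 'B.SC01.line
break' 'C.SC00.y'

# Line-based pipeline: the newline inside the name produces an extra record.
line_records=$(printf '%s\n' *SC* | sort -t. -k2 | wc -l)

# NUL-based pipeline: exactly one record per file.
nul_records=$(printf '%s\0' *SC* | sort -zt. -k2 | tr -cd '\0' | wc -c)

echo "line-delimited records: $line_records"
echo "NUL-delimited records:  $nul_records"

# Clean up.
cd "$OLDPWD" && rm -rf "$demo_dir"
```

Three files give four line-delimited records but three NUL-delimited ones, which is why the final tr should only be used for display and not for further processing.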

answered Dec 09 '15 at 16:26 by terdon · edited Dec 09 '15 at 17:02 by Stéphane Chazelas

Thanks all... "-k" parameter in sort is confusing for me... on what basis did it pick up the field is confusing me... like k2 for second field ? – MRKR Dec 09 '15 at 17:45
@MRKR the -t. tells it to use . as the field separator, so the -k2 refers to the second of the dot-separated fields. – terdon Dec 09 '15 at 17:48
@MRKR If one of the answers here solved your issue, please take a moment and accept it by clicking on the check mark to the left. That will mark the question as answered and is the way thanks are expressed on the Stack Exchange sites. – terdon Dec 10 '15 at 12:34

Stéphane Chazelas · Answer 2 · 2015-12-09T17:07:00.960

With zsh, you can define your own sorting order for globs with the oe or o+ glob qualifiers:

ls -lUd -- *(oe['REPLY=${REPLY#*.SC}'])

or:

bysc() REPLY=${REPLY#*.SC}
ls -lUd -- *(o+bysc)

The sorting function receives the filename in $REPLY and is meant to return a string in $REPLY that globbing will sort on. Here, we return the part of the file name to the right of the first occurrence of .SC (or the full filename if it doesn't contain .SC).
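The ${REPLY#*.SC} expansion is not specific to zsh, so the key it produces can be checked in any POSIX shell. The sketch below only mimics the REPLY-in, REPLY-out convention; sort_key is a hypothetical helper name and is not part of zsh:

```shell
# Mimic what zsh hands to, and gets back from, the sort function:
# REPLY holds the file name on entry and the sort key on exit.
sort_key() {
  REPLY=$1
  REPLY=${REPLY#*.SC}   # strip the shortest prefix ending in ".SC"
  printf '%s\n' "$REPLY"
}

sort_key 'XYZ38390.SC00.Statue_WKP'        # key: 00.Statue_WKP
sort_key 'ABC38388.SC01.StatueGrade_MKP'   # key: 01.StatueGrade_MKP
sort_key 'no_marker_here'                  # no ".SC", so the name is unchanged
```

zsh then orders the glob results by comparing these keys instead of the full names.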

score 1 · Answer 3 · edited Dec 09 '15 at 17:05

On a GNU system and with zsh or bash as your shell, use this:

find -maxdepth 1 -type f -print0 | sort -z -t. -k3 | \
while IFS="" read -r -d "" f; do
  basename "$f"
done

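- find searches for regular files (-type f) in the current directory only (-maxdepth 1) and prints them NUL-delimited (-print0).
- sort reads NUL-delimited input (-z) and sorts on the part of each record that starts at the 3rd dot-separated field (-t. -k3). It is the 3rd field rather than the 2nd because find prints each name as ./name, so the leading dot adds an empty first field.
- while reads each NUL-delimited record into $f (read -r -d ""),
- and basename prints the name without the leading ./ path.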
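If the choice of -k3 looks odd, splitting one of find's output paths on dots shows where the SC part ends up. This is only a sketch: it uses cut to stand in for sort's field splitting, and the path is one of the names from the question with the ./ prefix that find adds:

```shell
# A path as printed by find in the current directory.
path='./ABC38388.SC01.StatueGrade_MKP'

# Split on "." the same way sort -t. does.
first=$(printf '%s\n' "$path" | cut -d. -f1)    # empty: nothing precedes the leading dot
second=$(printf '%s\n' "$path" | cut -d. -f2)   # /ABC38388
third=$(printf '%s\n' "$path" | cut -d. -f3)    # SC01

printf 'field 1: [%s]\nfield 2: [%s]\nfield 3: [%s]\n' "$first" "$second" "$third"
```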

score 0 · Answer 4 · answered Dec 09 '15 at 19:33

I would - as I often do - suggest perl.

perl has a sort function that lets you specify a comparison function. This comparison function is any code that compares two values and returns a negative number, zero, or a positive number depending on their relative order.

sort calls it repeatedly, setting $a and $b to the pair of elements currently being compared.

So by default:

$a cmp $b

for stringwise comparison, or sort { $a <=> $b } for numeric.

But as a result, you can apply arbitrarily complex custom sort criteria:

#!/usr/bin/perl
use strict;
use warnings;

sub sort_by_sc {
   my ( $a_sc ) = $a =~ m/SC(\d+)/;
   my ( $b_sc ) = $b =~ m/SC(\d+)/;
   return $a_sc <=> $b_sc;
}


my @file_list = qw ( 
    ABC38388.SC01.StatueGrade_MKP
    ABC38388.SC02.Statue_GKP
    DEF38389.SC03.Statue_HKP
    XYZ38390.SC00.Statue_WKP
);

# The qw() entries carry no newlines, so add one after each name.
print "$_\n" for sort sort_by_sc @file_list;

Or reduced to a one-liner that reads STDIN or a file (linefeed-delimited, which is usually good enough):

perl -e 'print sort {@x = map {/SC(\d+)/}($a,$b); $x[0] <=> $x[1]} <>'

You could instead feed it the result of a glob pattern (-l makes print append a newline, since glob returns the names without one):

perl -le 'print for sort {@x = map {/SC(\d+)/}($a,$b); $x[0] <=> $x[1]} glob ( "*SC*")'
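The stringwise versus numeric distinction (cmp versus <=>) only matters once the SC numbers differ in width. As a rough comparison using sort(1) instead of Perl, and with two made-up names, the key -k2.3,2n starts at the 3rd character of field 2 (skipping "SC") and compares numerically:

```shell
# Hypothetical names whose SC numbers differ in width (10 vs. 2).
names='A.SC10.x
B.SC2.y'

# Stringwise on field 2: "SC10" sorts before "SC2" because '1' < '2'.
as_text=$(printf '%s\n' "$names" | LC_ALL=C sort -t. -k2,2)

# Numeric on field 2 from its 3rd character onward: 2 sorts before 10.
as_number=$(printf '%s\n' "$names" | LC_ALL=C sort -t. -k2.3,2n)

printf '%s\n' "$as_text"
echo ---
printf '%s\n' "$as_number"
```

With the zero-padded names from the question both orders agree, so this is only relevant if the padding is not guaranteed.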
