How to print differently named files sorted by one part of their name?

Question

I have a lot of files named like this:

n2+_PiU_w4_5348757.out
n2+_PiU_w2_5348755.out
n2+_PiU_w1_5348742.out
n2+_PiU_w1_5348729.out
n2+_PiU_w1_5348696.out
n2+_PiU_st3_w3_part6_5630814.out
n2+_PiU_st3_w3_part6_5630721.out
n2+_PiU_st3_w3_part5_5630720.out
n2+_PiU_st3_w3_part4_5630813.out

The point is, their names can be completely different and I need to sort them by the number before .out, i.e. by their ID.

I had a look at some similar questions (Sort based on the third column, Linux sort last column), but I wasn't able to use sed or awk for my needs.

Would you, please, provide some way to sort them? Preferably using bash.

Based on the fact that you've accepted Roman's answer it looks like you just wanted to process a text file (its content doesn't really matter). As you can see, the other answer assumes you wanted to process file names (via shell glob). Next time please be more explicit. And btw, bash is not a text editor. — don_crissti, Mar 03 '18 at 21:05
@don_crissti I wanted to process file names - I didn't know how to sort them, but I'm able to list filenames and send them via pipe. — Eenoku, Mar 03 '18 at 22:17

steeldriver · Answer 1 · 2018-03-03T19:30:06.263

With recent (4.0 or later) GNU awk, using an associative array keyed on the numeric (second-to-last) field:

printf '%s\0' * | gawk '
  BEGIN {
    RS="\000"; FS="[_.]"; 
    PROCINFO["sorted_in"]="@ind_num_asc"
  } 
  {
    a[$(NF-1)]=$0
  } 
  END {
    for (k in a) print a[k]
}'

Example:

printf '%s\0' * | gawk 'BEGIN{RS="\000"; FS="[_.]"; PROCINFO["sorted_in"]="@ind_num_asc"} {a[$(NF-1)]=$0} END {for (k in a) print a[k]}' 
n2+_PiU_w1_5348696.out
n2+_PiU_w1_5348729.out
n2+_PiU_w1_5348742.out
n2+_PiU_w2_5348755.out
n2+_PiU_w4_5348757.out
n2+_PiU_st3_w3_part5_5630720.out
n2+_PiU_st3_w3_part6_5630721.out
n2+_PiU_st3_w3_part4_5630813.out
n2+_PiU_st3_w3_part6_5630814.out

Similarly, using a perl hash:

printf '%s\0' * | perl -F'[_.]' -0ne '
  $h{$F[$#F-1]} = $_ }{ for $k (sort { $a <=> $b } keys %h) {print "$h{$k}\n"}
'

score 1 · Answer 2 · answered Mar 03 '18 at 21:16

With zsh globs:

$ printf '%s\n' *_<->.out(noe'(REPLY=${REPLY##*_})')
n2+_PiU_w1_5348696.out
n2+_PiU_w1_5348729.out
n2+_PiU_w1_5348742.out
n2+_PiU_w2_5348755.out
n2+_PiU_w4_5348757.out
n2+_PiU_st3_w3_part5_5630720.out
n2+_PiU_st3_w3_part6_5630721.out
n2+_PiU_st3_w3_part4_5630813.out
n2+_PiU_st3_w3_part6_5630814.out

<->: any sequence of digits (a <x-y> range with no bounds)
(...): glob qualifier
n: numeric order
oe'(code)': order based on the evaluation of code:
  REPLY=${REPLY##*_}: the sort key is the part after the last _
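Since the question asks for bash, the same key extraction can be approximated there, because `${f##*_}` is also available in bash. The following is only a sketch under assumptions: it runs in a scratch directory with three of the sample names, it relies on `sort -n` reading the leading digits of each key, and it would break on file names that contain newlines.

```shell
# Scratch directory so the glob only sees the sample files (illustrative setup).
dir=$(mktemp -d)
touch "$dir/n2+_PiU_w2_5348755.out" \
      "$dir/n2+_PiU_st3_w3_part5_5630720.out" \
      "$dir/n2+_PiU_w1_5348696.out"

sorted=$(
  cd "$dir" || exit 1
  for f in *_*.out; do
    # ${f##*_} strips everything up to the last "_", leaving e.g. 5348696.out
    printf '%s %s\n' "${f##*_}" "$f"
  done | sort -n | cut -d' ' -f2-   # sort on the key, then drop it again
)
printf '%s\n' "$sorted"
rm -rf "$dir"
```

Unlike the zsh glob, this version goes through a pipe, so it is the newline-separated output that gets sorted rather than the file list itself.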

RomanPerekhrest · Accepted Answer · 2018-03-03T19:18:12.570


awk + sort + cut combination:

awk -F'_' '{ $0=$NF OFS $0 }1' files_list.txt | sort | cut -d' ' -f2-

-F'_' - field separator
$NF - last field (e.g. 5348696.out)
$0=$NF OFS $0 - prepend the current record $0 with the last field $NF value for further straightforward sorting (e.g. 5348757.out n2+_PiU_w4_5348757.out)
cut -d' ' -f2- - filtering fields starting from the 2nd
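One caveat: plain `sort` compares the prepended keys as strings, which gives the right order for the sample only because every ID has seven digits. As a hedged sketch, the same pipeline with `sort -n` also handles IDs of different lengths. The temporary list file and the name `job_99.out` are made up for illustration and do not come from the question.

```shell
# Hypothetical sample list; files_list.txt in the answer holds one name per line.
list=$(mktemp)
printf '%s\n' \
  'n2+_PiU_w4_5348757.out' \
  'n2+_PiU_st3_w3_part6_5630814.out' \
  'n2+_PiU_w1_5348696.out' \
  'job_99.out' > "$list"

# Prepend the last "_"-separated field, sort on it numerically, then cut it off.
# sort -n reads the leading digits as a number, so 99 sorts before 5348696.
sorted=$(awk -F'_' '{ $0=$NF OFS $0 }1' "$list" | sort -n | cut -d' ' -f2-)
printf '%s\n' "$sorted"
rm -f "$list"
```

With a string sort, `job_99.out` would land last, because the character `9` compares greater than `5`.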

The output:

n2+_PiU_w1_5348696.out
n2+_PiU_w1_5348729.out
n2+_PiU_w1_5348742.out
n2+_PiU_w2_5348755.out
n2+_PiU_w4_5348757.out
n2+_PiU_st3_w3_part5_5630720.out
n2+_PiU_st3_w3_part6_5630721.out
n2+_PiU_st3_w3_part4_5630813.out
n2+_PiU_st3_w3_part6_5630814.out

edited Mar 03 '18 at 19:18

answered Mar 03 '18 at 19:01


Thank you very much! Would you mind explaining this script a little? – Eenoku Mar 03 '18 at 19:06

@Eenoku, welcome, see my explanation – RomanPerekhrest Mar 03 '18 at 19:18
