I'd like to sort the following list of filenames/paths.

L1_Data/level1/192027/LC08_L1TP_192027_20201126_20210316_01_T1 DONE
L1_Data/level1/192028/LC08_L1TP_192028_20201126_20210316_01_T1 DONE
L1_Data/level1/192029/LC08_L1TP_192029_20201126_20210316_01_T1 DONE
L1_Data/level1/191027/LE07_L1TP_191027_20201127_20201223_01_T1 DONE
L1_Data/level1/191029/LE07_L1TP_191029_20201127_20201223_01_T1 DONE
L1_Data/level1/192027/LC08_L1TP_192027_20201212_20210313_01_T1 QUEUED
L1_Data/level1/191028/LE07_L1TP_191028_20201213_20210108_01_T1 DONE
L1_Data/level1/191029/LE07_L1TP_191029_20201213_20210108_01_T1 DONE
L1_Data/level1/191027/LC08_L1TP_191027_20201221_20210310_01_T1 DONE
L1_Data/level1/T32TQS/S2B_MSIL1C_20200101T100319_N0208_R122_T32TQS_20200101T110654.SAFE DONE
L1_Data/level1/T32TQR/S2B_MSIL1C_20200101T100319_N0208_R122_T32TQR_20200101T110654.SAFE QUEUED
L1_Data/level1/T33TUL/S2B_MSIL1C_20200101T100319_N0208_R122_T33TUL_20200101T110654.SAFE DONE
L1_Data/level1/T33TUM/S2B_MSIL1C_20200101T100319_N0208_R122_T33TUM_20200101T110654.SAFE DONE
L1_Data/level1/T32TQS/S2A_MSIL1C_20200102T102421_N0208_R065_T32TQS_20200102T105534.SAFE DONE
L1_Data/level1/T33TUL/S2B_MSIL1C_20200104T101319_N0208_R022_T33TUL_20200104T121239.SAFE DONE
L1_Data/level1/T32TQR/S2B_MSIL1C_20200104T101319_N0208_R022_T32TQR_20200104T121239.SAFE QUEUED
L1_Data/level1/T32TQS/S2A_MSIL1C_20200106T100401_N0208_R122_T32TQS_20200106T103423.SAFE DONE

Each line contains a filename (including its path) and its work status (QUEUED/DONE). Each filename encodes information about the satellite imagery, such as satellite type, recording date, footprint and more.

Now, I'd like to reorder the list according to the following priorities:

  1. Work status: QUEUED first. On its own, this step was not a problem for me; the difficulty lies in the following steps and in combining them (my issues are described in more detail after the image below).
  2. Satellite type (S2A = Sentinel 2A; S2B = Sentinel 2B; LC08 = Landsat 8; LE07 = Landsat 7): S2A/S2B first (no matter whether A or B), followed by LC08, then LE07. In other words, I'd like to distinguish between Sentinel 2, Landsat 8 and Landsat 7, but not between Sentinel 2A and Sentinel 2B.
  3. Recording date, ascending
  4. Footprint, ascending

The locations of the corresponding substrings are shown in the following image, which is followed by a description of my problems.

enter image description here

Apart from having only a very basic knowledge of the sort command, my particular problems are:

  • a) addressing the substrings correctly, within
  • b) two different filename types (conventions),
  • c) the underscore isn't directly useful as a separator, because the substring positions differ between the two conventions (both basenames contain six underscores, but the date and the footprint sit in different fields),
  • d) the order S2A/B before LC08 before LE07 is unfortunately not alphabetical, and
  • e) addressing the S2A and S2B satellites as one unit. This could of course be solved by matching only S2, but since that is just two characters, there is a certain risk of confusion with other parts of the filename string (the actual list is much longer and gets updated from time to time, so it could contain 'false' S2s in other or future lines).
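
Points (d) and (e) can be illustrated with a minimal sketch, assuming the satellite token is always the first underscore-separated part of the basename (the text after the last slash). The function name classify_satellite and the S2_archive directory in the third sample line are hypothetical and only serve the illustration.

```shell
# classify_satellite: read list lines on stdin, print "<class> <token>".
# Assumption: the satellite token is the first "_"-separated part of the
# basename, so an "S2" elsewhere in the path is never looked at.
classify_satellite() {
  awk '{
    n = split($1, path, "/")      # $1 is the path, $2 the work status
    split(path[n], parts, "_")    # parts[1] is the satellite token
    if (parts[1] ~ /^S2[AB]$/)   class = "sentinel2"
    else if (parts[1] == "LC08") class = "landsat8"
    else if (parts[1] == "LE07") class = "landsat7"
    else                         class = "unknown"
    print class, parts[1]
  }'
}

# The third line uses a made-up directory containing "S2" to show that
# only the basename decides the class.
printf '%s\n' \
  'L1_Data/level1/T32TQS/S2A_MSIL1C_20200102T102421_N0208_R065_T32TQS_20200102T105534.SAFE DONE' \
  'L1_Data/level1/192027/LC08_L1TP_192027_20201212_20210313_01_T1 QUEUED' \
  'L1_Data/S2_archive/191027/LE07_L1TP_191027_20201127_20201223_01_T1 DONE' |
  classify_satellite
# prints:
# sentinel2 S2A
# landsat8 LC08
# landsat7 LE07
```

Matching the whole token against S2A or S2B, rather than searching the line for S2, treats both Sentinel satellites as one class without risking false hits in other parts of the path.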

In the end, the reordered list should look like this:

L1_Data/level1/T32TQR/S2B_MSIL1C_20200101T100319_N0208_R122_T32TQR_20200101T110654.SAFE QUEUED
L1_Data/level1/T32TQR/S2B_MSIL1C_20200104T101319_N0208_R022_T32TQR_20200104T121239.SAFE QUEUED
L1_Data/level1/192027/LC08_L1TP_192027_20201212_20210313_01_T1 QUEUED
L1_Data/level1/T32TQS/S2B_MSIL1C_20200101T100319_N0208_R122_T32TQS_20200101T110654.SAFE DONE
L1_Data/level1/T33TUL/S2B_MSIL1C_20200101T100319_N0208_R122_T33TUL_20200101T110654.SAFE DONE
L1_Data/level1/T33TUM/S2B_MSIL1C_20200101T100319_N0208_R122_T33TUM_20200101T110654.SAFE DONE
L1_Data/level1/T32TQS/S2A_MSIL1C_20200102T102421_N0208_R065_T32TQS_20200102T105534.SAFE DONE
L1_Data/level1/T33TUL/S2B_MSIL1C_20200104T101319_N0208_R022_T33TUL_20200104T121239.SAFE DONE
L1_Data/level1/T32TQS/S2A_MSIL1C_20200106T100401_N0208_R122_T32TQS_20200106T103423.SAFE DONE
L1_Data/level1/192027/LC08_L1TP_192027_20201126_20210316_01_T1 DONE
L1_Data/level1/192028/LC08_L1TP_192028_20201126_20210316_01_T1 DONE
L1_Data/level1/192029/LC08_L1TP_192029_20201126_20210316_01_T1 DONE
L1_Data/level1/191028/LE07_L1TP_191028_20201213_20210108_01_T1 DONE
L1_Data/level1/191029/LE07_L1TP_191029_20201213_20210108_01_T1 DONE
L1_Data/level1/191027/LE07_L1TP_191027_20201127_20201223_01_T1 DONE
L1_Data/level1/191029/LE07_L1TP_191029_20201127_20201223_01_T1 DONE

Can anybody help me, please?

jaysigg
  • 111
  • 2
    Welcome to the site. Please edit your post to indicate what you already tried and where you ran into problems. That way contributors can understand what tools you have available/are familiar with, and you can avoid receiving proposed solutions that you already know won't work. – AdminBee Jul 07 '21 at 10:50
  • Normally, I do indicate what I tried before. But in this case my intentions exceed my very basic knowledge at several points, especially when it comes to bringing the potential solutions for each single problem together. Therefore, showing my approaches would end in an incomprehensible patchwork. What I do not understand is why you miss a description of where I 'ran into problems'. My problems are listed in the middle part of the post (points a to e)!? – jaysigg Jul 07 '21 at 12:12
  • What I meant with "where you run into problems" refers to the request to show what you already tried. Usually, contributors on this site expect askers to present a concrete question where they need help in solving an error condition, or where an approach expected to work turned out not to (e.g. "I used this command/script to do task XX, but instead of the expected output I got this"). Your points (a) to (e) are challenges to overcome while developing the script, but not "problems" in the sense of a script not doing what it is supposed to do. – AdminBee Jul 07 '21 at 14:01

2 Answers

4

The problem is that the sort fields aren't in the same columns in the line.

I'm going with perl here for maximum flexibility: this is "custom_sort.pl"

#!/usr/bin/env perl
use strict;
use warnings;

my @data;
while (<>) {
    # capture the fields of an "L" (Landsat) satellite
    if (m{.*/(L...)_[^_]+_(\d+)_(\d+)_\S+\s+(\S+)}) {
        push @data, [$_, $4, $1, $3, $2];
    }
    # capture the fields of an "S" (Sentinel) satellite
    elsif (m{.*/(S..)_[^_]+_(\d{8})[^_]*_[^_]+_[^_]+_([^_]+)_\S+\s+(\S+)}) {
        push @data, [$_, $4, $1, $2, $3];
    }
}

sub mysort {
    -($a->[1] cmp $b->[1])                  # work status, descending
    || cmp_satellite($a->[2], $b->[2])      # satellite
    || $a->[3] <=> $b->[3]                  # record date
    || $a->[4] cmp $b->[4]                  # footprint
}

sub cmp_satellite {
    my ($x, $y) = @_;
    my $x_is_s = $x =~ /^S/;
    my $y_is_s = $y =~ /^S/;
    return 0  if $x_is_s && $y_is_s;        # S2A and S2B rank equally
    return -1 if $x_is_s;
    return +1 if $y_is_s;
    return $x cmp $y;                       # LC08 before LE07
}

print $_->[0] for sort mysort @data;

Run it with

perl custom_sort.pl file
glenn jackman
  • 85,964
1

Using awk, sort and cut:

awk -F'[/ ]' -v OFS='\t' '
{
  status=$NF                    # this is the last field

  split($(NF-1), parts, "_")    # split filename into array parts

  if (parts[1]=="S2A" || parts[1]=="S2B") {
    type=1
  }
  else if (parts[1]=="LC08"){
    type=2
  }
  else if (parts[1]=="LE07"){
    type=3
  }
  else {
    print "error, got unknown type " parts[1]; exit 1
  }

  date=(type==1 ? substr(parts[3], 1, 8) : parts[4])
  footprint=(type==1 ? parts[6] : parts[3])

  print status, type, date, footprint, $0
}
' file | sort -k1,1r -k2,2n -k3,3 -k4,4 | cut -f5-

The idea is to extract work status, satellite type, record date and footprint from each record and save them in four variables. The satellite type is replaced by a number to define a custom order.

Then print those four variables tab-separated, followed by the original record, sort the output as desired, and remove the first four fields afterwards with cut.
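
To make the intermediate stage visible, here is a small sketch of the sort and cut steps applied to three hand-written, already decorated records. The record texts (rec-A and so on) are placeholders rather than real filenames, the helper name sort_and_strip is hypothetical, and the explicit -t with a tab is an addition that only makes the field separator unambiguous.

```shell
# sort_and_strip: sort decorated records, then drop the four key columns.
# Input columns (tab-separated): status, type, date, footprint, original line.
sort_and_strip() {
  sort -t "$(printf '\t')" -k1,1r -k2,2n -k3,3 -k4,4 | cut -f5-
}

# printf reuses the format for every group of five arguments,
# producing three decorated records.
printf '%s\t%s\t%s\t%s\t%s\n' \
  DONE   1 20200101 T32TQS 'rec-A DONE' \
  QUEUED 2 20201212 192027 'rec-B QUEUED' \
  QUEUED 1 20200104 T32TQR 'rec-C QUEUED' |
  sort_and_strip
# prints:
# rec-C QUEUED
# rec-B QUEUED
# rec-A DONE
```

The reverse flag on the first key puts QUEUED before DONE, the numeric second key applies the custom satellite order, and the remaining keys break ties by date and footprint.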

Output:

L1_Data/level1/T32TQR/S2B_MSIL1C_20200101T100319_N0208_R122_T32TQR_20200101T110654.SAFE QUEUED
L1_Data/level1/T32TQR/S2B_MSIL1C_20200104T101319_N0208_R022_T32TQR_20200104T121239.SAFE QUEUED
L1_Data/level1/192027/LC08_L1TP_192027_20201212_20210313_01_T1 QUEUED
L1_Data/level1/T32TQS/S2B_MSIL1C_20200101T100319_N0208_R122_T32TQS_20200101T110654.SAFE DONE
L1_Data/level1/T33TUL/S2B_MSIL1C_20200101T100319_N0208_R122_T33TUL_20200101T110654.SAFE DONE
L1_Data/level1/T33TUM/S2B_MSIL1C_20200101T100319_N0208_R122_T33TUM_20200101T110654.SAFE DONE
L1_Data/level1/T32TQS/S2A_MSIL1C_20200102T102421_N0208_R065_T32TQS_20200102T105534.SAFE DONE
L1_Data/level1/T33TUL/S2B_MSIL1C_20200104T101319_N0208_R022_T33TUL_20200104T121239.SAFE DONE
L1_Data/level1/T32TQS/S2A_MSIL1C_20200106T100401_N0208_R122_T32TQS_20200106T103423.SAFE DONE
L1_Data/level1/192027/LC08_L1TP_192027_20201126_20210316_01_T1 DONE
L1_Data/level1/192028/LC08_L1TP_192028_20201126_20210316_01_T1 DONE
L1_Data/level1/192029/LC08_L1TP_192029_20201126_20210316_01_T1 DONE
L1_Data/level1/191027/LC08_L1TP_191027_20201221_20210310_01_T1 DONE
L1_Data/level1/191027/LE07_L1TP_191027_20201127_20201223_01_T1 DONE
L1_Data/level1/191029/LE07_L1TP_191029_20201127_20201223_01_T1 DONE
L1_Data/level1/191028/LE07_L1TP_191028_20201213_20210108_01_T1 DONE
L1_Data/level1/191029/LE07_L1TP_191029_20201213_20210108_01_T1 DONE
Freddy
  • 25,565
  • Thank you for your efforts. I'm sure both approaches would work fine. But since I 'feared' running into the next struggle with a new language, as Perl would be for me, I preferred to continue with Freddy's solution, at least for the moment. I hope you understand, @glenn. – jaysigg Jul 08 '21 at 07:43
  • @Freddy: I tried your solution with the complete file; sorting works perfectly. The remaining question is how to integrate creating a new file, or updating the existing file, into your code. (More or less for fun, I tried -o file in the sort section, but as expected, obviously because the cut step was then missing, this ended up producing useless output.) Would you, or anybody else, have an idea for me? – jaysigg Jul 08 '21 at 08:27
  • 1
    To save the output to a new file, use redirection: awk ... | cut -f5- >newfile – Freddy Jul 08 '21 at 20:12
  • Works perfectly, with one exception: what if I just want to overwrite the input file? I tried using the input filename/path as the output filename/path. But, in contrast to just using the sort command with -o, this led to an empty file!? Thanks again to both of you. – jaysigg Jul 09 '21 at 09:52
  • 1
    Have a look at unix.stackexchange.com/questions/5821. You could overwrite the original file afterwards with awk ... | cut -f5- >newfile && mv -f newfile file (the first cut example) or use sponge. – Freddy Jul 09 '21 at 13:30
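
The empty file comes from the shell opening (and truncating) the redirection target before awk has read anything. A minimal sketch of the temporary-file approach from the comment above follows; the helper name rewrite_in_place is hypothetical, and it assumes the sorting command reads the list on standard input and writes the result to standard output.

```shell
# rewrite_in_place FILE COMMAND [ARGS...]
# Runs COMMAND with FILE on stdin, writes the result to a temporary file
# next to FILE, and replaces FILE only if COMMAND succeeded.
# Note: the replaced file gets the temporary file's permissions.
rewrite_in_place() {
  rip_file=$1
  shift
  rip_tmp=$(mktemp "${rip_file}.XXXXXX") || return 1
  if "$@" < "$rip_file" > "$rip_tmp"; then
    mv -f "$rip_tmp" "$rip_file"
  else
    rm -f "$rip_tmp"
    return 1
  fi
}

# Demo with plain sort on a throwaway file.
demo=$(mktemp)
printf '%s\n' b a c > "$demo"
rewrite_in_place "$demo" sort
cat "$demo"
rm -f "$demo"
```

For the list in the question, the awk, sort and cut pipeline would first have to be wrapped in a function that reads standard input instead of naming the file, and that function would then be passed as COMMAND.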