
I am working with thousands of files whose names contain sequential dates ranging from 2001-01-01 to 2020-12-31.

A sample of such files can be seen below:

gpm_original_20010101.nc
gpm_cressman_20010101_cor_method-add_fac-0.5_pass-1_radius-500km.nc
gpm_cressman_20010101_cor_method-add_fac-0.5_pass-2_radius-250km.nc
gpm_cressman_20010101_cor_method-add_fac-0.5_pass-3_radius-150km.nc
gpm_cressman_20010101_cor_method-add_fac-0.5_pass-4_radius-75km.nc
gpm_cressman_20010101_cor_method-add_fac-0.5_pass-5_radius-30km.nc
.
.
.
gpm_original_20010131.nc
gpm_cressman_20010131_cor_method-add_fac-0.5_pass-1_radius-500km.nc
gpm_cressman_20010131_cor_method-add_fac-0.5_pass-2_radius-250km.nc
gpm_cressman_20010131_cor_method-add_fac-0.5_pass-3_radius-150km.nc
gpm_cressman_20010131_cor_method-add_fac-0.5_pass-4_radius-75km.nc
gpm_cressman_20010131_cor_method-add_fac-0.5_pass-5_radius-30km.nc

and so on until 2020-12-31. What I need to do is to reorganize these files into new folders based on years and months.

The directory tree should have one directory per year, each containing one sub-directory per month, like this:

2001
    01
    02
    03
    04
    05
    06
    07
    08
    09
    10
    11
    12

2002
    01
    02
    03
    04
    05
    06
    07
    08
    09
    10
    11
    12

and so on. And the files should be moved to these directories based on the equivalent date in their filenames. For example: all files containing 200101xx in their names should be moved to the 2001/01 folder.

What is the most straightforward way to achieve this using bash?

3 Answers

Here is my proposal if I understood correctly:

for i in *.nc; do
  # Skip names without an 8-digit date so a stale or empty $d is never used
  [[ "$i" =~ _([0-9]{8})[_.] ]] || continue
  d="${BASH_REMATCH[1]}"
  mkdir -p "${d:0:4}/${d:4:2}"
  mv "$i" "${d:0:4}/${d:4:2}"
done
  • Thanks! This is very simple and works as intended. – thiagoveloso Jan 19 '21 at 14:41
  • @schrodingerscatcuriosity: why are you anchoring with the [_.]? I'm not saying it's wrong, just trying to understand. An elegant answer, BTW. – spuck Jan 19 '21 at 15:55
  • Thanks! OP's sample has two possibilities: XXXXXXXX_ and XXXXXXXX., so both have to be taken into account. Is that what you're asking? – schrodingerscatcuriosity Jan 19 '21 at 16:00
  • It just seemed that _([0-9]{8}) would be sufficient. I was wondering why to also match the underscore or period. – spuck Jan 20 '21 at 17:15
  • @spuck You're right, but the idea is to be as thorough as I can with regex patterns. Imagine there's a file with _123456789 in its name: _([0-9]{8}) alone would match it, even though it has 9 digits, not 8. I know that doesn't seem to be the case in OP's file names, but you never know. You could go even further and verify that the pattern is indeed a date... but that would be more complicated stuff... and well, takes much more time :). – schrodingerscatcuriosity Jan 20 '21 at 17:30
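
The extra check mentioned in the last comment (verifying that the eight digits really form a date) could be sketched roughly as follows. This is only a sketch under assumptions: target_dir is a hypothetical helper name, and it relies on GNU date, whose -d option rejects impossible dates such as 20010231.

```shell
#!/bin/bash
# target_dir is a hypothetical helper, not part of the answer above.
# It prints YYYY/MM for a filename that embeds a valid calendar date
# and returns non-zero otherwise.
target_dir() {
  local name=$1 d
  # Same pattern as in the answer: 8 digits between "_" and "_" or "."
  [[ $name =~ _([0-9]{8})[_.] ]] || return 1
  d=${BASH_REMATCH[1]}
  # Assumes GNU date; -u avoids local-time quirks such as DST gaps at midnight
  date -u -d "${d:0:4}-${d:4:2}-${d:6:2}" +%Y/%m 2>/dev/null
}
```

Inside the loop, something like dir=$(target_dir "$i") || continue would then replace the regex test and give the directory to pass to mkdir -p and mv.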

Looping through years and months:

#!/bin/bash

for year in {2001..2020} ; do
  for month in {01..12} ; do
    mkdir -p "$year/$month"
    # gpm_*_ matches both the gpm_original_ and the gpm_cressman_ files
    mv gpm_*_"${year}${month}"* "$year/$month"
  done
done

If a single year and month matches too many files with long names (you mention "thousands"), the expanded glob can exceed the system's limit on the size of a command's argument list ("argument list too long"). Either temporarily increase the limit with ulimit or use xargs:

#!/bin/bash

for year in {2001..2020} ; do
  for month in {01..12} ; do
    mkdir -p "$year/$month"
    # -print0 / -0 pass each name intact; xargs runs one mv per file
    find . -maxdepth 1 -type f -name "gpm_*_${year}${month}*" -print0 |
      xargs -0 -I '{}' mv '{}' "$year/$month"
  done
done
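
A possible variant, sketched here under the assumption that GNU find and GNU mv are available (-maxdepth and mv -t are GNU extensions), lets find batch the names itself instead of starting one mv per file. The function name move_month is hypothetical.

```shell
#!/bin/bash
# move_month is a hypothetical helper: it moves every gpm_*_YYYYMM* file
# in the current directory into YYYY/MM.
move_month() {
  local year=$1 month=$2
  mkdir -p "$year/$month"
  # "{} +" packs as many names into each mv call as the argument limit allows,
  # so the "argument list too long" error cannot occur
  find . -maxdepth 1 -type f -name "gpm_*_${year}${month}*" \
    -exec mv -t "$year/$month" {} +
}
```

Calling move_month "$year" "$month" inside the two nested loops above would replace the mkdir and find | xargs lines.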

FelixJN

Assuming the date is always in the same position in the filename, put this in a script:

#!/bin/bash
#
while [[ $# -gt 0 ]] ; do
    file="$1"
    shift
    name="${file##*/}"    # drop any leading directory such as ./
    year="$( echo "$name" | cut -c 14-17)"
    mnth="$( echo "$name" | cut -c 18-19)"
    [[ -d $year/$mnth ]] || mkdir -p "$year/$mnth"
    echo mv "$file" "$year/$mnth"
done

And call the script with:

find . -maxdepth 1 -type f -name 'gpm_*.nc' -printf '%f\n' | \
    xargs -r ./the_script

Read the man pages for bash, find, xargs, mkdir and mv.

Remove the echo when you want to do it for real.
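
The fixed-position assumption can be checked on a couple of the sample names before moving anything. A small sketch, where year_month is a hypothetical helper name:

```shell
#!/bin/bash
# year_month is a hypothetical helper: it prints "YYYY MM" taken from
# character positions 14-17 and 18-19 of the file's base name.
year_month() {
  local name=${1##*/}    # strip any leading directory such as ./
  local year mnth
  year=$(printf '%s\n' "$name" | cut -c 14-17)
  mnth=$(printf '%s\n' "$name" | cut -c 18-19)
  printf '%s %s\n' "$year" "$mnth"
}

year_month gpm_original_20010131.nc    # prints "2001 01"
year_month ./gpm_cressman_20201231_cor_method-add_fac-0.5_pass-1_radius-500km.nc    # prints "2020 12"
```

Both prefixes, gpm_original_ and gpm_cressman_, are 13 characters long, which is why the same positions work for either kind of file.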

waltinator