
I have a series of files with names of the form "foo.date.bar", where "date" is a six-digit field such as 310715.

So, for example:

foo.310715.bar
foo.260815.bar
foo.110815.bar
foo.040815.bar

I would like to sort these into date order, based on the date in the filename not any file metadata, so that my script can delete some of them. Normally I would do this in Python or PHP where it would be easy, but I'm trying to learn how to do things in Bash. I made a first attempt with the command

for f in $( find "$dir" -type f | sort -r -t. -k 2 ); do
    echo "$f"
done

But then I realised that sorting the second column either numerically or alphabetically is no use; I have to sort it as a date. There doesn't seem to be any way to tell sort to treat the six-digit field as a date, or as three 2-digit columns. I wondered whether the next step would be to use sed or tr or suchlike to turn the six-digit field into something sort can parse?

Thanks in advance for any assistance,

MB


Thanks to everyone for your excellent answers, I've learned a lot from reading them.

    You will need to cut the date portion out of every filename, reconstruct it in YYMMDD format, and then sort numerically. That means you will need to rename the files, unless you want to create a lookup table of new-format dates versus filenames, sort on the new field, and delete the corresponding files. Unfortunately, Bash is not as capable as other, more modern scripting languages like Perl or Python. – MelBurslan Mar 31 '16 at 17:20
    You've now learnt why you should always use YYYYMMDD (or at least YYMMDD) rather than any other date format: it's the ONLY one that sorts correctly. – cas Mar 31 '16 at 23:48
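The lookup-table route from the first comment can be sketched without renaming anything: build "YYMMDD<TAB>filename" lines, sort on that key, then cut the key off again. Because the key is a fixed-width YYMMDD string, as the second comment points out, a plain lexicographic sort already puts it in date order. This is a minimal sketch, not taken from any answer; the function name sort_by_embedded_date is hypothetical, and it assumes filenames contain no tabs or newlines and exactly one .DDMMYY. field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read NAME.DDMMYY.EXT filenames on stdin (one per
# line) and print them in date order without renaming any files.
sort_by_embedded_date() {
  local name
  while IFS= read -r name; do
    # Capture DD, MM and YY from the ".DDMMYY." part of the name
    if [[ $name =~ \.([0-9]{2})([0-9]{2})([0-9]{2})\. ]]; then
      # Emit the lookup-table line: YYMMDD <TAB> original filename
      printf '%s%s%s\t%s\n' "${BASH_REMATCH[3]}" "${BASH_REMATCH[2]}" \
        "${BASH_REMATCH[1]}" "$name"
    fi
  done | sort -k1,1 | cut -f2-   # sort on the key, then drop it
}
```

Usage: `printf '%s\n' foo.*.bar | sort_by_embedded_date`. Names without a .DDMMYY. field are silently skipped.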

3 Answers


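Here's an abuse of bash arrays: it splits the timestamp apart and uses the YYMMDD form as each file's array index, then prints the array back out. Because bash indexed arrays are sparse and expand in ascending index order, the files come out in date order.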

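declare -a array
for file in foo.*.bar
do
  if [[ $file =~ ^foo\.([[:digit:]]{2})([[:digit:]]{2})([[:digit:]]{2})\.bar$ ]]
  then
    # Rebuild DDMMYY as YYMMDD and use it as the array index.
    # 10# forces base 10, so a year such as 08 isn't read as octal.
    # Files sharing a date share an index: only the last one is kept.
    index="${BASH_REMATCH[3]}${BASH_REMATCH[2]}${BASH_REMATCH[1]}"
    array[10#$index]="$file"
  fi
done

# "${array[@]}" expands in ascending index order, i.e. date order
for file in "${array[@]}"
do
  echo "$file"
done

# or
printf "%s\n" "${array[@]}"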
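The ordering this answer relies on can be isolated in a small sketch; the function name demo_sparse_order and the sample 2008 filename are purely illustrative. Bash indexed arrays are sparse, and "${arr[@]}" expands the assigned elements in ascending index order regardless of the order they were assigned in. Because indices are evaluated arithmetically, a YYMMDD key with a leading zero needs a 10# prefix to be read as decimal rather than octal.

```shell
#!/usr/bin/env bash
# Hypothetical demo of the two array behaviours the answer depends on.
demo_sparse_order() {
  # Assigned out of order; expansion is still by ascending index
  local -a arr=()
  arr[150826]=foo.260815.bar
  arr[150731]=foo.310715.bar
  arr[150811]=foo.110815.bar
  printf '%s\n' "${arr[@]}"

  # Subscripts are arithmetic: a bare 080101 would be parsed as octal
  # and rejected (8 is not an octal digit); 10# forces decimal.
  local -a old=()
  old[10#080101]=bar.010108.bar
  printf '%s\n' "${!old[@]}"   # prints the stored index
}
```

Calling demo_sparse_order prints the three filenames in date order (310715, 110815, 260815), followed by the decimal index 80101.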
Jeff Schaller
  • This is very helpful @Jeff Schaller. Why do you say it is an abuse of bash arrays? – Monkeybrain Apr 05 '16 at 17:07
    I suppose it's a valid use; it's just a little unusual to index an array starting in the hundred-thousands, but you made it a point to ask about doing it in bash, and I'm trying to get better at using various shell features, so I couldn't resist. No actual arrays were harmed in the making of this answer :) – Jeff Schaller Apr 05 '16 at 17:25

If you have GNU or FreeBSD sort, you can use the -V or --version-sort option, after first using sed to swap the date format (and then sed again to change the date format back):

ls -1 | 
    sed -E -e 's/^(.*\.)(..)(..)(..)(.*)$/\1\4\3\2\5/' | 
    sort -V | 
    sed -E -e 's/^(.*\.)(..)(..)(..)(.*)$/\1\4\3\2\5/'

Ideally, you should just rename the files so that they have a useful date format; for example, using the perl rename utility prename:

$ prename -v 's/^(.*\.)(..)(..)(..)(.*)$/$1$4$3$2$5/' *
foo.040815.bar renamed as foo.150804.bar
foo.110815.bar renamed as foo.150811.bar
foo.260815.bar renamed as foo.150826.bar
foo.310715.bar renamed as foo.150731.bar
$ ls -1 | sort -V
foo.150731.bar
foo.150804.bar
foo.150811.bar
foo.150826.bar

(BTW, unlike most prename operations, this one happens to be reversible: it only swaps the first and third pairs of digits, so if you need to, you can just run it again to rename the files back to what they were.)
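On systems without prename, a rough pure-bash equivalent of that rename can be sketched as follows. The function name swap_date_pairs is hypothetical; the sketch assumes names of the form NAME.DDMMYY.EXT and uses mv -n (no-clobber), which both GNU and BSD/macOS mv support, so an existing file is never overwritten.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the prename command above:
# NAME.DDMMYY.EXT -> NAME.YYMMDD.EXT. It swaps the 1st and 3rd digit
# pairs, so running it a second time undoes the rename.
swap_date_pairs() {
  local f new
  for f in "$@"; do
    [[ $f =~ ^(.*\.)([0-9]{2})([0-9]{2})([0-9]{2})(\..*)$ ]] || continue
    new="${BASH_REMATCH[1]}${BASH_REMATCH[4]}${BASH_REMATCH[3]}${BASH_REMATCH[2]}${BASH_REMATCH[5]}"
    [[ $new == "$f" ]] && continue   # e.g. 151515: the swap changes nothing
    echo "$f -> $new"
    mv -n -- "$f" "$new"             # -n: never overwrite an existing file
  done
}
```

Usage: `swap_date_pairs foo.*.bar`, after which a plain `ls -1 | sort` lists the files in date order.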

cas
  • Useful if you need to support older shells; the array trick above doesn't work in the default bash installed on macOS. – Ajax Jul 20 '21 at 04:57

The following piped sequence feeds the file names to sed, which changes names in the format *.DDMMYY.* to the format *|DD|MM|YY|*. The reformatted output is piped to sort, which uses '|' as the field separator and sorts first by YY (-k4,4n), then by MM (-k3,3n), and finally by DD (-k2,2n); the ",4", ",3" and ",2" limit each key to a single field. The sorted output is then piped back into sed, which transforms each filename back to the original format *.DDMMYY.*.

printf '%s\n' *.bar | \
sed 's/\.\([[:digit:]]\{2\}\)\([[:digit:]]\{2\}\)\([[:digit:]]\{2\}\)\./|\1|\2|\3|/' | \
sort -t'|' -k4,4n -k3,3n -k2,2n | \
sed 's/|\([[:digit:]]\{2\}\)|\([[:digit:]]\{2\}\)|\([[:digit:]]\{2\}\)|/.\1\2\3./'

Using the following sample of files:

$ ls -1 *bar
abc.291015.bar
abc.291115.bar
abc.291215.bar
abc.301215.bar
foo.040815.bar
foo.150115.bar
foo.150914.bar
foo.260815.bar
foo.301216.bar
foo.310715.bar
xyz.010113.bar

The sequence will produce the following:

xyz.010113.bar
foo.150914.bar
foo.150115.bar
foo.310715.bar
foo.040815.bar
foo.260815.bar
abc.291015.bar
abc.291115.bar
abc.291215.bar
abc.301215.bar
foo.301216.bar
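Since the question's goal is to delete some of the files, here is a minimal sketch that reverses this pipeline's sort keys (newest first) and prints an rm command for every file except the newest N. The function names sort_ddmmyy_newest_first and prune_all_but_newest are hypothetical, and rm is prefixed with echo so the sketch only shows what it would delete.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read *.DDMMYY.* names on stdin, print newest first.
# Same sed | sort | sed idea as above, with each key reversed (r).
sort_ddmmyy_newest_first() {
  sed 's/\.\([[:digit:]]\{2\}\)\([[:digit:]]\{2\}\)\([[:digit:]]\{2\}\)\./|\1|\2|\3|/' |
    sort -t'|' -k4,4nr -k3,3nr -k2,2nr |
    sed 's/|\([[:digit:]]\{2\}\)|\([[:digit:]]\{2\}\)|\([[:digit:]]\{2\}\)|/.\1\2\3./'
}

# Hypothetical: dry run that prints rm for all but the newest $1 files
prune_all_but_newest() {
  local keep=$1 f
  printf '%s\n' *.[0-9][0-9][0-9][0-9][0-9][0-9].* |
    sort_ddmmyy_newest_first |
    tail -n +"$((keep + 1))" |       # skip the newest $keep lines
    while IFS= read -r f; do
      echo rm -- "$f"                # remove the echo to really delete
    done
}
```

Usage: `prune_all_but_newest 2` in the directory holding the files. With the four example files from the question, it prints rm commands for foo.040815.bar and foo.310715.bar.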
zhdason