BASH: Group by name and keep the last

Question

I hope all is good for you.

I have some files like this:

my_file_210804_2056_856.csv_20210804170806
my_file_210804_2056_856.csv_20211119181137
my_file_210805_2056_857.csv_20210805200847
my_file_210805_2056_857.csv_20211119181137
      ...

I want to retrieve the latest version of a file with a Unix command.

For example, for the file 210804, I want to retrieve only my_file_210804_2056_856.csv_20211119181137 because it is the most recent.

Thanks for your help

Assuming no newline or other special characters in the file name, perhaps ls -1 *210804* | tail -1? — doneal24, Jan 26 '22 at 02:29

cas · Answer 1 · 2022-01-26T02:42:35.500

Using GNU versions of find, sort, and head (to make use of the NUL character to separate the filenames - NUL is the only character which is not valid in a path/filename, so it is the only character which can safely be used as a filename separator):

find . -maxdepth 1 -type f -name 'my_file_210804*' -print0 | sort -z -r | head -z -n 1

This will work with any filenames, no matter what characters they contain (including spaces, newlines, etc).
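A NUL-terminated result cannot be captured with plain command substitution, since bash drops NUL bytes there. One way to get it into a variable is read -d '' with process substitution. The following is a minimal sketch, assuming bash and the GNU tools above; the temporary directory and the variable name latest are made up for the demonstration, and the file names are copied from the question.

```shell
#!/usr/bin/env bash
# Demo in a throwaway directory; file names mirror the question's examples.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch 'my_file_210804_2056_856.csv_20210804170806' \
      'my_file_210804_2056_856.csv_20211119181137' \
      'my_file_210805_2056_857.csv_20210805200847'

# read -d '' reads up to the first NUL, so the name arrives intact
# even if it contains spaces or newlines.
IFS= read -r -d '' latest < <(
  find . -maxdepth 1 -type f -name 'my_file_210804*' -print0 |
    sort -z -r |
    head -z -n 1
)

printf 'latest: %s\n' "$latest"
```

The variable holds the pathname as find printed it, including the leading ./ of the starting point.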

If you are absolutely certain that the filenames don't and won't ever contain newline characters, you can use newlines as the separator - drop the -print0 from the find command, and the -z option from sort and head.

find . -maxdepth 1 -type f -name 'my_file_210804*' | sort -r | head -n 1

This variant is also useful if the filenames are in a plain text file, with one filename per line:

sort -r filename-list.txt | head -n 1
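To keep the latest entry for every group at once, rather than for one prefix at a time, the same reverse sort can be combined with awk. This is only a sketch under two assumptions: names never contain newlines, and every name has the form prefix.csv_timestamp with a fixed-width timestamp, as in the question. The list file is a temporary stand-in created for the demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical list file, built from the names shown in the question.
list=$(mktemp)
cat > "$list" <<'EOF'
my_file_210805_2056_857.csv_20210805200847
my_file_210804_2056_856.csv_20210804170806
my_file_210805_2056_857.csv_20211119181137
my_file_210804_2056_856.csv_20211119181137
EOF

# Reverse sort puts the newest timestamp first within each group.
# awk splits each line at ".csv_" and prints a line only the first
# time its prefix ($1) is seen, i.e. the newest entry per group.
latest_per_group=$(
  LC_ALL=C sort -r "$list" |
    awk -F'[.]csv_' '!seen[$1]++'
)

printf '%s\n' "$latest_per_group"
rm -f "$list"
```

Because the input is already sorted in descending order, the groups also come out in descending order of their prefix.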

If you want to sort the filenames by the timestamps in the filesystem (rather than by dates & times embedded in the filenames), it's a little more complicated. You need to use -printf with a format string that includes the modification timestamp in seconds since the epoch (%T@), a tab (\t), the filename (%p) and a NUL (\0), rather than just -print0:

find . -maxdepth 1 -type f -name 'my_file_210804*' -printf '%T@\t%p\0' |
  sort -z -r -n -k 1,1 |
  cut -z -f2- |
  head -z -n 1

Here, sort ... -n -k 1,1 sorts the output of find numerically by the first field (the timestamp); cut then removes the timestamp field and the tab character that separates it from the filename.
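The two orderings can disagree, for example when an older file has been touched or copied more recently. The sketch below sets that situation up on purpose with touch -d and runs both pipelines; it assumes bash and GNU find, sort, cut, head and touch, and the directory, dates and variable names are invented for the demonstration.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Give the file with the OLDER name suffix the NEWER modification time.
touch -d '2022-01-02 00:00:00' 'my_file_210804_2056_856.csv_20210804170806'
touch -d '2022-01-01 00:00:00' 'my_file_210804_2056_856.csv_20211119181137'

# Latest according to the date embedded in the name.
IFS= read -r -d '' by_name < <(
  find . -maxdepth 1 -type f -name 'my_file_210804*' -print0 |
    sort -z -r |
    head -z -n 1
)

# Latest according to the filesystem modification time.
IFS= read -r -d '' by_mtime < <(
  find . -maxdepth 1 -type f -name 'my_file_210804*' -printf '%T@\t%p\0' |
    sort -z -r -n -k 1,1 |
    cut -z -f2- |
    head -z -n 1
)

printf 'by name:  %s\nby mtime: %s\n' "$by_name" "$by_mtime"
```

Which of the two is "the last version" depends on whether the suffix in the name or the filesystem timestamp is the authoritative one for your files.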

BTW, you may be tempted to parse the output of ls. Don't do that: it is not reliable, because filenames can contain newlines and other characters that make ls output ambiguous.

NOTE: find will output filenames with the path (relative to the "starting point" directory). You can remove the paths with sed, or bash's built-in parameter expansion features (e.g. ${parameter#word} or ${parameter/pattern/string}), or with basename — cas, Jan 26 '22 at 02:40
@waltinator that's why I said "path/filename". find returns pathnames. — cas, Jan 26 '22 at 02:41
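The path-stripping options mentioned in the note above can be sketched as follows, assuming the result is already in a bash variable. The variable names are made up for illustration and the value is one of the pathnames find would print for the question's files.

```shell
#!/usr/bin/env bash
# Example pathname as printed by find with "." as the starting point.
latest='./my_file_210804_2056_856.csv_20211119181137'

# Remove only the leading "./" added by find.
no_dot=${latest#./}

# Remove everything up to and including the last slash.
name_only=${latest##*/}

# External alternative; "--" guards against names starting with a dash.
via_basename=$(basename -- "$latest")

printf '%s\n' "$no_dot" "$name_only" "$via_basename"
```

The parameter expansions run inside the shell and leave the name untouched, whereas the $(basename ...) form loses any trailing newlines in the name because command substitution strips them.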
