Using the perl rename utility:
Note: perl rename is also known as file-rename, perl-rename, or prename. Not to be confused with the rename utility from util-linux, which has completely different and incompatible capabilities and command-line options. perl rename is the default rename on Debian. IIRC, it's in the prename package on CentOS, and the command should be executed as prename rather than rename.
$ rename -n 'if (m/(^\d{4}_\d\d_\d\d)_(\d\d)/) {
    my ($date,$hour) = ($1,$2);
    my $dir = "./$date/$hour/";
    mkdir $date;
    mkdir $dir;
    s=^=$dir=
  }' *
rename(2021_10_15_23_35_SIP_CDR_pid3894_ins2_thread_1_4718.csv.gz, ./2021_10_15/23/2021_10_15_23_35_SIP_CDR_pid3894_ins2_thread_1_4718.csv.gz)
rename(2021_11_24_21_15_Gi_pid25961_ins2_thread_1_6438.csv.gz, ./2021_11_24/21/2021_11_24_21_15_Gi_pid25961_ins2_thread_1_6438.csv.gz)
rename(2021_11_24_21_15_Gi_pid27095_ins2_thread_1_6485.csv.gz, ./2021_11_24/21/2021_11_24_21_15_Gi_pid27095_ins2_thread_1_6485.csv.gz)
rename(2021_11_24_21_15_Gi_pid27095_ins3_thread_2_6485.csv.gz, ./2021_11_24/21/2021_11_24_21_15_Gi_pid27095_ins3_thread_2_6485.csv.gz)
rename(2021_11_24_21_15_Gi_pid27095_ins4_thread_3_6485.csv.gz, ./2021_11_24/21/2021_11_24_21_15_Gi_pid27095_ins4_thread_3_6485.csv.gz)
rename(2021_11_24_21_15_Gi_pid681_ins5_thread_4_6457.csv.gz, ./2021_11_24/21/2021_11_24_21_15_Gi_pid681_ins5_thread_4_6457.csv.gz)
rename(2021_11_25_20_55_Gi_pid29741_ins5_thread_4_7540.csv.gz, ./2021_11_25/20/2021_11_25_20_55_Gi_pid29741_ins5_thread_4_7540.csv.gz)
rename(2021_11_25_20_55_Gi_pid30842_ins3_thread_2_7489.csv.gz, ./2021_11_25/20/2021_11_25_20_55_Gi_pid30842_ins3_thread_2_7489.csv.gz)
rename(2021_11_25_20_55_Gi_pid30842_ins4_thread_3_7488.csv.gz, ./2021_11_25/20/2021_11_25_20_55_Gi_pid30842_ins4_thread_3_7488.csv.gz)
rename(2021_11_25_20_55_Gi_pid30842_ins5_thread_4_7489.csv.gz, ./2021_11_25/20/2021_11_25_20_55_Gi_pid30842_ins5_thread_4_7489.csv.gz)
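The -n is a dry-run option: it only shows what rename would do without actually renaming anything. Note that -n only suppresses the rename itself; the mkdir calls in the script still run, so the directories are created even during a dry run. Remove -n (or replace it with -v for verbose output) when you're sure the rename script is going to do what you want.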
The script works by first extracting the date and hour portions of each filename (skipping any filenames that don't match). Then it creates the directories for the date and date/hour, then renames the file into those directories.
This assumes that the filenames are in the current directory. If they aren't, you'll have to adjust the m// matching regex in the first line AND the s=== substitution regex in the second-last line.
Alternate version using the File::Path perl core module (which is included with perl) instead of calling mkdir twice (the make_path function works like the mkdir -p shell command):
$ rename -v 'BEGIN {use File::Path qw(make_path)};
  if (m/(^\d{4}_\d\d_\d\d)_(\d\d)/) {
    my $dir = "./$1/$2/";
    make_path $dir;
    s=^=$dir=
  }' *
2021_10_15_23_35_SIP_CDR_pid3894_ins2_thread_1_4718.csv.gz renamed as ./2021_10_15/23/2021_10_15_23_35_SIP_CDR_pid3894_ins2_thread_1_4718.csv.gz
2021_11_24_21_15_Gi_pid25961_ins2_thread_1_6438.csv.gz renamed as ./2021_11_24/21/2021_11_24_21_15_Gi_pid25961_ins2_thread_1_6438.csv.gz
2021_11_24_21_15_Gi_pid27095_ins2_thread_1_6485.csv.gz renamed as ./2021_11_24/21/2021_11_24_21_15_Gi_pid27095_ins2_thread_1_6485.csv.gz
2021_11_24_21_15_Gi_pid27095_ins3_thread_2_6485.csv.gz renamed as ./2021_11_24/21/2021_11_24_21_15_Gi_pid27095_ins3_thread_2_6485.csv.gz
2021_11_24_21_15_Gi_pid27095_ins4_thread_3_6485.csv.gz renamed as ./2021_11_24/21/2021_11_24_21_15_Gi_pid27095_ins4_thread_3_6485.csv.gz
2021_11_24_21_15_Gi_pid681_ins5_thread_4_6457.csv.gz renamed as ./2021_11_24/21/2021_11_24_21_15_Gi_pid681_ins5_thread_4_6457.csv.gz
2021_11_25_20_55_Gi_pid29741_ins5_thread_4_7540.csv.gz renamed as ./2021_11_25/20/2021_11_25_20_55_Gi_pid29741_ins5_thread_4_7540.csv.gz
2021_11_25_20_55_Gi_pid30842_ins3_thread_2_7489.csv.gz renamed as ./2021_11_25/20/2021_11_25_20_55_Gi_pid30842_ins3_thread_2_7489.csv.gz
2021_11_25_20_55_Gi_pid30842_ins4_thread_3_7488.csv.gz renamed as ./2021_11_25/20/2021_11_25_20_55_Gi_pid30842_ins4_thread_3_7488.csv.gz
2021_11_25_20_55_Gi_pid30842_ins5_thread_4_7489.csv.gz renamed as ./2021_11_25/20/2021_11_25_20_55_Gi_pid30842_ins5_thread_4_7489.csv.gz
This isn't really any better than the first version, but it does demonstrate that you can use any perl code, any perl module to rename and/or move files.
Third version: this one uses File::Basename to split the input pathname into $path and $file portions. It can cope with filenames in the current directory, or in any other directory. File::Basename is a core perl module, so it is included with perl. It provides three useful functions: basename() and dirname() (which work similarly to the shell tools of the same names), and fileparse(), which is what I'm using in this script to extract both the basename and the directory into separate variables.
rename -n 'BEGIN {use File::Path qw(make_path); use File::Basename};
  my ($file, $path) = fileparse($_);
  if ($file =~ m/(\d{4}_\d\d_\d\d)_(\d\d)/) {
    my $dir = "$path/$1/$2";
    make_path $dir;
    $_ = "$dir/$file"
  }' /home/cas/rename-test/*
rename(/home/cas/rename-test/2021_10_15_23_35_SIP_CDR_pid3894_ins2_thread_1_4718.csv.gz, /home/cas/rename-test/2021_10_15/23/2021_10_15_23_35_SIP_CDR_pid3894_ins2_thread_1_4718.csv.gz)
rename(/home/cas/rename-test/2021_11_24_21_15_Gi_pid25961_ins2_thread_1_6438.csv.gz, /home/cas/rename-test/2021_11_24/21/2021_11_24_21_15_Gi_pid25961_ins2_thread_1_6438.csv.gz)
rename(/home/cas/rename-test/2021_11_24_21_15_Gi_pid27095_ins2_thread_1_6485.csv.gz, /home/cas/rename-test/2021_11_24/21/2021_11_24_21_15_Gi_pid27095_ins2_thread_1_6485.csv.gz)
rename(/home/cas/rename-test/2021_11_24_21_15_Gi_pid27095_ins3_thread_2_6485.csv.gz, /home/cas/rename-test/2021_11_24/21/2021_11_24_21_15_Gi_pid27095_ins3_thread_2_6485.csv.gz)
rename(/home/cas/rename-test/2021_11_24_21_15_Gi_pid27095_ins4_thread_3_6485.csv.gz, /home/cas/rename-test/2021_11_24/21/2021_11_24_21_15_Gi_pid27095_ins4_thread_3_6485.csv.gz)
rename(/home/cas/rename-test/2021_11_24_21_15_Gi_pid681_ins5_thread_4_6457.csv.gz, /home/cas/rename-test/2021_11_24/21/2021_11_24_21_15_Gi_pid681_ins5_thread_4_6457.csv.gz)
rename(/home/cas/rename-test/2021_11_25_20_55_Gi_pid29741_ins5_thread_4_7540.csv.gz, /home/cas/rename-test/2021_11_25/20/2021_11_25_20_55_Gi_pid29741_ins5_thread_4_7540.csv.gz)
rename(/home/cas/rename-test/2021_11_25_20_55_Gi_pid30842_ins3_thread_2_7489.csv.gz, /home/cas/rename-test/2021_11_25/20/2021_11_25_20_55_Gi_pid30842_ins3_thread_2_7489.csv.gz)
rename(/home/cas/rename-test/2021_11_25_20_55_Gi_pid30842_ins4_thread_3_7488.csv.gz, /home/cas/rename-test/2021_11_25/20/2021_11_25_20_55_Gi_pid30842_ins4_thread_3_7488.csv.gz)
rename(/home/cas/rename-test/2021_11_25_20_55_Gi_pid30842_ins5_thread_4_7489.csv.gz, /home/cas/rename-test/2021_11_25/20/2021_11_25_20_55_Gi_pid30842_ins5_thread_4_7489.csv.gz)
BTW, it would be trivial to modify this so that it moved the files to a completely different path - just make it do something like my $dir = "/my/new/path/$1/$2"; instead of my $dir = "$path/$1/$2";
The key thing to understand about how the perl rename utility works is that iff the rename script modifies the $_ variable then rename will attempt to rename the file to the new value of $_. If $_ is unchanged, it will not try to rename it. This is why you can use any perl code to rename files - all it has to do is change $_. Most often you'll probably use very simple sed-like rename scripts (e.g. rename 's/ +/_/g' * to replace each run of spaces in a filename with an underscore) but the rename algorithm can be as complex as you need it to be.
$_ is a very important variable in perl - it's used as the default variable to hold input from file handles and iterators for loops if the programmer doesn't specify one. It's also used as the default operand for several operators (like m//, s///, tr///) and as the default argument for many (but not all) functions. See man perlvar and search for $_ (you'll need to escape that in less as \$_).
BTW, one thing I didn't mention about rename earlier is that it can take filenames either as arguments on the command line or from stdin. When reading from stdin, it defaults to newline-separated input (so it won't work with filenames that contain newlines - an annoying but completely valid possibility). You can use the -0 argument to make it use NUL-separated input instead, so it can work with any filenames, taking input from anything that can generate a list of NUL-separated filenames (e.g. find ... -print0, but it's probably better to just use find's -exec ... {} + option).
rename will also refuse to rename a file over an existing file unless you use its -f or --force option.
[...] ^) of the filename, which causes rename to rename the file. Obviously, this won't work if the start of the "filename" is actually a path. To change it to cope with full pathnames as input, you'd have to either add the new subdirectory in between the existing path and the file's basename, or (easier) replace the entire path with a newly constructed path string. perl's File::Basename core module would help with this, it can easily split a pathname into dir and basename portions. – cas Apr 11 '22 at 09:23

[...] bash: /bin/find: Argument list too long. I'm running the following command for the third version: find /home/cas/rename-test/ -type f rename -n 'BEGIN {use File::Path qw(make_path); use File::Basename};my ($file, $path) = fileparse($_);if ($file =~ m/(\d{4}_\d\d_\d\d)_(\d\d)/) {my $dir = "$path/$1/$2";make_path $dir;$_ = "$dir/$file"}' {} \; Hope it works – nidooooz Apr 13 '22 at 03:39

[...] find, you can either pipe the filenames into rename (use -print0 with the find command, and -0 with the rename command for NUL-separated filenames), or you can use find's -exec option (-exec rename ..... {} +). If you use + with -exec, find will try to fit as many filenames as will fit into a max length command line, and will run rename as many times as necessary to process all filenames. If you use -exec ... {} \; instead of -exec ... {} +, it will run rename once per filename. In none of these cases will you ever get an arg list too long error. – cas Apr 13 '22 at 05:14

[...] -exec from your find command....I'm assuming that's a copy-paste error. – cas Apr 13 '22 at 05:15

find /home/cas/rename-test/ -type f -exec rename -n 'BEGIN {use File::Path qw(make_path); use File::Basename};my ($file, $path) = fileparse($_);if ($file =~ m/(\d{4}_\d\d_\d\d)_(\d\d)/) {my $dir = "$path/$1/$2";make_path $dir;$_ = "$dir/$file"}' {} \; I missed exec while typing :)..Thank you so much – nidooooz Apr 13 '22 at 07:16

[...] + rather than \; to terminate the -exec. Running rename once per several thousand files is much faster than running it once per file (the exact number of files depends on how long each pathname is - Linux currently has a command line length limit, ARG_MAX, of approx 2 million characters. There's a good summary of how it works in the answers to CP: max source files number arguments for copy utility). Running rename has startup overhead each time it's run, which adds up if you're doing it for lots of files. – cas Apr 14 '22 at 01:53

[...] cd into the folder and run find . -type f -exec prename -n 'if (m/(^\d{4}_\d\d_\d\d)_(\d\d)/) {my ($date,$hour) = ($1,$2);my $dir = "./$date/$hour/";mkdir $date;mkdir $dir;s=^=$dir=}' {} + is not giving any result. When I run find ./* -type f -exec prename -n 'if (m/(^\d{4}_\d\d_\d\d)_(\d\d)/) {my ($date,$hour) = ($1,$2);my $dir = "./$date/$hour/";mkdir $date;mkdir $dir;s=^=$dir=}' {} + it gives bash: /bin/find: Argument list too long. Can you please help? – nidooooz Apr 14 '22 at 02:53

[...] find ./* instead of find . with the second one. your shell will expand ./* to all files and dirs in the current dir. – cas Apr 14 '22 at 02:57

[...] find . the command just runs for a while without giving any result. find . -type f -exec prename -n 'if (m/(^\d{4}_\d\d_\d\d)_(\d\d)/) {my ($date,$hour) = ($1,$2);my $dir = "./$date/$hour/";mkdir $date;mkdir $dir;s=^=$dir=}' {} + – nidooooz Apr 14 '22 at 03:06

[...] ./. The regexp uses ^ so it only matches files starting with the pattern, but the names from find start with ./. This can never match. Solution: remove ^ from the regex. 3. use the third version, it will work with files in any dir, including current dir.

[...] ^ doing find . -type f -exec prename -n 'if (m/(\d{4}_\d\d_\d\d)_(\d\d)/) {my ($date,$hour) = ($1,$2);my $dir = "./$date/$hour/";mkdir $date;mkdir $dir;s=^=$dir=}' {} + gives ./2021_12_30_04_56_Diameter_CDR_pid5906_ins3_thread_2_19104.csv.gz -> ./2021_12_30/04/./2021_12_30_04_56_Diameter_CDR_pid5906_ins3_thread_2_19104.csv.gz ...the third version does work, I'm asking this so that it would help me better understand what's going on in the command :) Thanks again for your help – nidooooz Apr 14 '22 at 03:37