I have a web application that accesses remote storage running Linux to get some files. The problem is that the remote storage currently holds 3 million files, so accessing them the normal way is a bit tricky.
So I needed to write a script to make it a little easier to use. The script reorganizes the files into multiple folders depending on their creation date and especially their names. I wrote the script and it worked just fine, doing what it was meant to do, but it was too slow: 12 hours to complete the work (12:13:48 to be precise).
I think the slowness comes from the multiple cut and rev calls I make.
Example: I get the file names with an ls command that I loop over with for. For each file I get the parent directory and, depending on the parent directory, I can get the correct year:
case "$parent" in
    ( "Type1" )
        year=$(echo "$fichier" | rev | cut -d '_' -f 2 | rev );;
    ( "Type2" )
        year=$(echo "$fichier" | rev | cut -d '_' -f 2 | rev);;
    ( "Type3" )
        year=$(echo "$fichier" | rev | cut -d '_' -f 1 | rev | cut -c 1-4);;
    ( "Type4" )
        year=$(echo "$fichier" | rev | cut -d '_' -f 1 | rev | cut -c 1-4);;
    ( "Type5" )
        year=$(echo "$fichier" | rev | cut -d '_' -f 1 | rev | cut -c 1-4);;
esac
For Type1 files:
the file ==> MY_AMAZING_FILE_THAT_IMADEIN_YEAR_TY.pdf
I need to get the year, so I perform a reverse cut:
year=$(echo "$file" | rev | cut -d '_' -f 2 | rev )
For Type2 files:
the file ==> MY_AMAZING_FILE_THAT_IMADE_IN_YEAR_WITH_TY.pdf
etc...
Then I can mv the file freely: mv $file /some/path/destination/$year/$parent
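As a side note on the mv step, here is a minimal sketch of a move helper that quotes its arguments and creates the destination folder first. The function name move_to_year_dir and its argument order are hypothetical, not part of the original script, and the layout simply mirrors the /some/path/destination/$year/$parent pattern above.

```shell
#!/bin/sh
# Hypothetical helper: move one file under <dest_root>/<year>/<parent>.
move_to_year_dir() {
    # $1 = file, $2 = year, $3 = parent, $4 = destination root
    _target=$4/$2/$3
    mkdir -p -- "$_target" || return 1   # create the folder if it is missing
    mv -- "$1" "$_target"/               # quoting keeps odd file names intact
}

# Example call (commented out so nothing is moved by accident):
# move_to_year_dir "$file" "$year" "$parent" /some/path/destination
```

Note that mkdir -p is one more process per file; creating the year and parent folders once up front would probably be cheaper on 3 million files.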
And this is the simplest example; some files are much more complex. So to get one piece of information I need four operations: 1 echo, 2 rev and 1 cut.
While the script is running I get speeds of 50 to 100 files/sec. I got this figure by running wc -l output.txt on the script's output.
Is there anything I can do to make it faster, or is there another way to cut the file names? I know that I can use sed or awk or string operations, but I did not really understand how.
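To illustrate the "string operations" route, here is a minimal sketch that uses only shell parameter expansion, so no echo, rev or cut process is started per file. The function names are hypothetical, and the sample file names in the note below are made up by replacing YEAR with digits. Each function sets the global variable year rather than printing it, which also avoids the $(...) subshell.

```shell
#!/bin/sh
# Hypothetical helpers; each one sets the global variable `year`.

# Second-to-last '_'-separated field,
# like: rev | cut -d '_' -f 2 | rev   (Type1 and Type2)
year_second_last() {
    _rest=${1%_*}          # drop the last field: ..._2017_TY.pdf -> ..._2017
    year=${_rest##*_}      # keep what follows the last remaining '_'
}

# First four characters of the last field,
# like: rev | cut -d '_' -f 1 | rev | cut -c 1-4   (Type3 to Type5)
year_from_last() {
    _last=${1##*_}                 # last field, e.g. 20170922.pdf
    year=${_last%"${_last#????}"}  # strip everything after the first 4 chars
}
```

For example, year_second_last "MY_AMAZING_FILE_THAT_IMADEIN_2017_TY.pdf" sets year to 2017, and year_from_last "SOME_FILE_20170922.pdf" does the same. These assume the name has enough underscores and a last field of at least four characters; shorter names would need extra checks.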
[...] $file get its value? What does your current code look like? Please [edit] your question. – Kusalananda Oct 11 '17 at 13:41
[...] ls is not a good idea. You can probably do everything with a single find and Perl rename. But again, you don't give enough information for a full answer. Good luck. – Satō Katsura Oct 11 '17 at 14:13
[...] find, no problem. Are the years the only four-digit number in the filenames? – Kusalananda Oct 11 '17 at 14:19
[...] find. I'm currently updating my answer. Are the years the only four-digit number in the filenames? – Kusalananda Oct 11 '17 at 14:21
Type 1: FA_ERDXSER_CALSE_RASM_2017047361_YEAR_20170922.pdf – Kingofkech Oct 11 '17 at 14:25
Type 2: FILE_SENT_PAID_1998027890_YEARMMdd.pdf
[...] YEARMMdd? Is that the actual filename? – Kusalananda Oct 11 '17 at 14:37
[...] ls which means it will break on weird file names and then because this seems way too complicated. We can't really help you though since you don't clearly explain what you are trying to do with these files. – terdon Oct 11 '17 at 14:44
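The comments suggest find rather than looping over ls output. A minimal sketch of that idea follows, assuming each file sits directly inside its Type directory. The function name list_with_parent is hypothetical, and it only prints the parent and file name instead of moving anything.

```shell
#!/bin/sh
# Sketch: walk the tree with find rather than parsing ls output.
# list_with_parent (hypothetical name) prints "parent<TAB>name" per file.
list_with_parent() {
    # $1 = root directory to scan
    find "$1" -type f -exec sh -c '
        for path do
            dir=${path%/*}        # directory part of the path
            parent=${dir##*/}     # its last component, e.g. Type1
            name=${path##*/}      # file name without directories
            printf "%s\t%s\n" "$parent" "$name"
        done
    ' sh {} +
}
```

Because -exec ... {} + hands many paths to one sh invocation, only a handful of processes are started for the whole tree, and file names with spaces arrive intact as separate arguments. File names containing newlines would still garble the printed output.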