2

I have several directories with files which mostly have the same name but might be different and I need to have all these directories merged into a new one. I need to be able to compare the same named files and if the same ignore/overwrite or if different move/rename the older one by appending the modification date/time to the filename.

To add more details and to address @Serhat Cevikel and @Wildcard: Data is stored on two drives with similar structure and it is somehow complicated since there are sub-folders which need to be taken into consideration. Here is a tree of the testing environment I have created and some comments:

/bmrlbackup/drive1/
`-- user001
    `-- directory1
        `-- project001
            |-- file000           #identical
            |-- file001           #older same name
            |-- file0011          #unique
            |-- phase1
            |   |-- file000       #identical
            |   |-- file110       #unique
            |   |-- file999       #newer same name
            |   `-- phase11
            |       `-- file111   #unique
            `-- phase2
                `-- file120       #unique
/bmrlbackup/drive2/
`-- user002
    `-- directory2
        `-- project001
            |-- file000           #identical
            |-- file001           #newer same name
            |-- file0012          #unique
            |-- phase1
            |   |-- file000       #identical
            |   |-- file210       #unique
            |   `-- file999       #older same name
            `-- phase2
                |-- file220       #unique
                `-- phase21
                    `-- file221   #unique

The output for the first rsync:

#rsync -a --ignore-existing --remove-source-files $sd1/ $dd1/
project001/
project001/file0011
project001/phase1/
project001/phase1/file110
project001/phase1/phase11/
project001/phase1/phase11/file111
project001/phase2/
project001/phase2/file120

Changed the remm (remaining "same" files) to list the sub-directories as well:

#remm=`ls -1 $(find $sd1/ -type f)`
/bmrlbackup/drive1/user001/directory1/project001/file000
/bmrlbackup/drive1/user001/directory1/project001/file001
/bmrlbackup/drive1/user001/directory1/project001/phase1/file000
/bmrlbackup/drive1/user001/directory1/project001/phase1/file999

Here, the two files:

/bmrlbackup/drive1/user001/directory1/project001/file000 
/bmrlbackup/drive1/user001/directory1/project001/phase1/file000

are the same in both locations and need not be copied or can be moved and overwrite the destination.

The same name different content files:

/bmrlbackup/drive1/user001/directory1/project001/file001
/bmrlbackup/drive1/user001/directory1/project001/phase1/file999

The "same name different content" files need to be compared and the older one needs to be renamed: appended with the modification date&time, so if source is newer then append the name of destination file and move the source, and if the source is older, then append the name of the source and move the name appended source.

The resultant of this process will eventually move all the files from drive1 to drive2.

Then everything errors for oldest=`find {$sd1,$dd1}....

Advice?

There are no more than 10000 files on each drive with sizes from 4k to 800M.

fptstl
  • 5
fpt
  • 21
  • There are different possible strategies; it potentially matters a lot roughly how many files and what sizes. Could you please edit your question to add more details? – Wildcard Dec 14 '16 at 21:24
  • Please don't post answers to clarify your question. [Edit] (or rewrite) your question instead. – terdon Jan 06 '17 at 18:46

1 Answers1

-1

If I'm not wrong, you mean, when filenames are different mv the file, when the filenames are the same, rename the older one appending the mod date/time and then mv. Here goes the script: First argument is the source path, second argument is the destination path. You shouldn't add a slash at the end of paths:

(Update: "ls" is replaced by "find", for two purposes: not to parse ls, and to sort files from multiple paths by date. Variable substition is made more concise as per se Wildcard and both whitespace and ":" are replaced by whitespaces)

#!/bin/bash

sd1=$1
dd1=$2


rsync -a --ignore-existing --remove-source-files $sd1/ $dd1/
remm=`ls $sd1`

for i in $remm
do
    oldest=`find {$sd1,$dd1} -type f -name $i -printf "%T@ %p\n" | sort -n | head -1 | cut -d " " -f2`
    appendd=`stat $oldest --printf=%y\n | sed 's/ +.*//g' | sed 's/[ :]/_/g'`
    newname="${oldest}_$appendd"
    mv $oldest $newname
done

rsync -a --ignore-existing --remove-source-files $sd1/ $dd1/
S.C
  • 487
  • Also, I can't imagine why you would use printf "%s_%s" $oldest $appendd in command substitution for a variable assignment in place of just "${oldest}_$appendd". – Wildcard Dec 14 '16 at 21:22
  • I made some changes, but the method of doing the renaming is still extremely fragile and liable to breakage. Probably should use find, but I don't have time just now to write it up. – Wildcard Dec 14 '16 at 21:31
  • Good comments. Hope this one works better. Tried on two test directories and worked well (finding the older files, renaming, etc) – S.C Dec 14 '16 at 22:07
  • It's laudable that you've tested it on some inputs, but it doesn't replace knowing what all the code is doing. This will still fail on whitespace or special characters in file names. Also see Why is looping over find's output bad practice? – Wildcard Dec 14 '16 at 22:13
  • An option may be to use printf: printf '%q\n' $sd1/* | sed 's/$sd1//g' and then use sed to replace backslashes introduced by printf: sed 's/\ / /g' | sed 's/\\n/\n/g' . However I could not use the result as an input to the name argument of find. Still working on that. You're input is invaluable. – S.C Dec 14 '16 at 23:00
  • Did you read through the four links I've provided? You're still making the errors described in all four of them.... I provided those links because they are extremely helpful. I hope you take the time to read them. – Wildcard Dec 14 '16 at 23:11
  • Working on inodes approach, reached a good point, but I think I can finish that tomorrow. I'll be waiting for your comment. And I'll go through the links. – S.C Dec 14 '16 at 23:44