I have a directory full of PDF files of journal articles, most of which are named by their BibTeX key. Some time ago I made a backup on an external hard drive, but I haven't kept it up to date and there are tons of duplicates with different names. I want to get the two directories back into sync and delete the extra files.

Using fdupes I have identified a bunch of these, and now I have a nice paired list of them. However, most of the duplicates on the external drive have meaningless names. I'd like to rename them to be the same as the duplicate in the first directory, rather than deleting them and copying them over again, because there are so many of them. So I don't want to just use rsync.

For example, if the fdupes output is:

/home/articles/bibtex.pdf
/external/articles/morearticles44.pdf

Is there a faster way than writing

mv /external/articles/morearticles44.pdf /external/articles/bibtex.pdf

for each pair of duplicates?
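If the paired list is already in a file, the mv commands can be generated in one loop instead of typed by hand. Below is a minimal sketch under loud assumptions: the list format (one pair per line, the /home/articles path first, then a tab, then the /external/articles path) and the function name rename_from_pairs are hypothetical, and no path may contain a single quote. It only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: turn a paired list into mv commands (printed, not run).
# ASSUMED input format, one pair per line:
#   <path in /home/articles><TAB><path in /external/articles>
# ASSUMPTION: no path contains a single quote (the output quotes with '...').
rename_from_pairs() {
  tab=$(printf '\t')
  while IFS=$tab read -r keep dup; do
    # Skip blank or incomplete lines
    if [ -z "$keep" ] || [ -z "$dup" ]; then
      continue
    fi
    # Rename the duplicate in its own directory, giving it the BibTeX-key name
    printf "mv -i '%s' '%s'\n" "$dup" "${dup%/*}/${keep##*/}"
  done < "$1"
}
```

For the pair above, `rename_from_pairs pairs.txt > renames.sh` would write `mv -i '/external/articles/morearticles44.pdf' '/external/articles/bibtex.pdf'`. After reviewing the file, `sh renames.sh` runs it; because sh then reads the commands from the file, mv -i can still take its overwrite answer from the keyboard.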

ManderW

In my experience fdupes can be inconsistent in the order in which it outputs files (I have had my own problems using the --delete option). This should be fairly robust, as it doesn't require the files to be in a specific order (as long as there are always two dupes in different folders):

# note no trailing slash
source_dir=/home/articles
target_dir=/external/articles

fdupes "$target_dir" "$source_dir" |
  while IFS= read -r file; do
    case "$file" in
      "$source_dir/"*)
         source=${file##*/}
         ;;
      "$target_dir/"*)
         target=$file
         ;;
      '')
         if [ "$source" ] && [ "$target" ]; then
           # </dev/tty lets mv -i read its reply from the keyboard once echo is removed
           echo mv -i "$target" "$target_dir/$source" </dev/tty
         fi
         unset source target
         ;;
    esac
  done

This just prints the mv commands; remove the echo once you are sure they are what you want. The -i option to mv also makes it prompt before overwriting an existing file.
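One catch when running the renames for real: inside a `while read` loop fed by fdupes, the loop's standard input is the fdupes output, so an mv -i prompt would take its answer from that output rather than the keyboard unless stdin is redirected. A sketch of another way around it, assuming the fdupes output has first been saved to a file: the loop reads the list on file descriptor 3 and leaves stdin alone. The function name rename_dupes and its argument order are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rename duplicates listed in saved fdupes output.
# Usage: rename_dupes LIST_FILE SOURCE_DIR TARGET_DIR (dirs without trailing slash)
# The list is read on fd 3, so mv -i still prompts on the terminal.
rename_dupes() {
  list=$1 source_dir=$2 target_dir=$3
  source= target=
  while IFS= read -r file <&3; do
    case "$file" in
      "$source_dir/"*) source=${file##*/} ;;   # the name to keep
      "$target_dir/"*) target=$file ;;         # full path of the duplicate
      '')                                      # blank line ends a group
        if [ -n "$source" ] && [ -n "$target" ]; then
          mv -i "$target" "$target_dir/$source"
        fi
        source= target=
        ;;
    esac
  done 3< "$list"
}
```

For example: `fdupes /external/articles /home/articles > dupes.txt`, look over dupes.txt, then `rename_dupes dupes.txt /home/articles /external/articles`. Saving the list first also gives a record of what was renamed.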

Graeme
  • Excellent! Now, the only other problem -- how do I remove the lines where the filename is the same in both directories? – ManderW Apr 09 '14 at 19:45
  • @user3035900, I wouldn't bother doing anything about that. If you try to mv a file onto itself, mv will just print a message saying that they are the same file and then not do anything. There is no need to make any special provision. – Graeme Apr 09 '14 at 21:13
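If you would still rather keep the same-name pairs out of the dry-run listing, a name comparison is enough. A minimal sketch; needs_rename is a hypothetical helper, not part of fdupes or mv.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when the duplicate's file name differs
# from the name it would be given, i.e. when a rename would change anything.
needs_rename() {
  target=$1   # full path of the duplicate, e.g. /external/articles/x.pdf
  source=$2   # bare file name to give it, e.g. bibtex.pdf
  [ "${target##*/}" != "$source" ]
}

# Inside the loop, the echo line could then become:
#   needs_rename "$target" "$source" && echo mv -i "$target" "$target_dir/$source"
```

As the comment says, this is cosmetic: mv already refuses to move a file onto itself.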

I'll propose a different workflow (suggested by hasenj): instead of using fdupes to identify duplicate files and perform some post-processing to remove them, you can use Unison to identify and deal with duplicates.

You need to run Unison with one of the roots remote; otherwise it doesn't detect identical files. So run

unison /home/articles ssh://localhost//external/articles

Unison will churn for a while and propose to synchronize the two trees. Choose to synchronize in the > direction to move /external/articles/morearticles44.pdf to /external/articles/bibtex.pdf.