
I've moved a lot of files and folders from my old NAS to my new SMB share. The problem is that someone made entire copies of nearly every folder on that share, and I need the space.

I'm currently running fdupes in a screen session and writing its output to a file. But this will give me a huge list of duplicate files, which I'll then have to filter down into duplicate folders myself.

Are there any existing utilities which can find matching folders? Or can someone suggest a shell script that might do the job?

I'm running Ubuntu 14.04.
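For reference, this is the rough kind of script I had in mind but couldn't get right (untested sketch): it fingerprints every directory under the share by hashing the sorted checksums of the files inside it, so two directories with identical file contents end up with the same fingerprint. The /home/user/sambadata path is my share root and md5sum is just an example hash; both are illustrative, not requirements.

    #!/bin/bash
    # Untested sketch: give every directory a fingerprint based on the
    # checksums of the files it contains, then report fingerprints that
    # occur more than once. It matches on file contents only, so renamed
    # files inside otherwise identical folders still count as a match.
    root=/home/user/sambadata

    find "$root" -type d | while read -r dir; do
        # Checksum all files under the directory, sort so file order
        # doesn't matter, then hash the combined list into one value.
        sum=$(find "$dir" -type f -exec md5sum {} + 2>/dev/null \
              | awk '{print $1}' | sort | md5sum | awk '{print $1}')
        printf '%s  %s\n' "$sum" "$dir"
    done | sort | uniq -w32 --all-repeated=separate
    # Caveats: breaks on filenames containing newlines, and all empty
    # directories will match each other. Fine for a first pass.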

  • What is a main folder? Is that the direct parent folder of the duplicates, or some folder higher up in the hierarchy above the duplicates, like /? And what does 'there' (in "there you have to specify") refer to? – Anthon May 28 '16 at 12:55
  • the folder is /home/user/sambadata – blipman17 May 28 '16 at 13:17
  • fdupes will give you a list of duplicate files, separated into paragraphs (one paragraph per set of dupes). Only YOU can decide which of those are "original" files that need to be kept, and which are duplicates to be deleted. BTW, this is probably obvious, but if the person who made the copies put them all under the one directory (and that directory doesn't have any important non-dupe files), just delete that directory. – cas May 29 '16 at 08:20
  • Yeah, but the problem is that that person did that about every month for a few years. It's pretty ugly. – blipman17 May 29 '16 at 09:28
  • FYI: http://askubuntu.com/questions/3865/how-to-find-and-delete-duplicate-files and this may be of interest http://unix.stackexchange.com/questions/3037/is-there-an-easy-way-to-replace-duplicate-files-with-hardlinks (but take note of the discussions about some of the drawbacks of replacing with hard links) – cas May 29 '16 at 12:08
  • That's file-based. I have run fdupes -r -o output.txt just now, and the result is a file containing the locations of 1.06 million duplicate files. That's far too many for me to work through by hand to find the folders themselves and remove one of each pair. (A rough one-liner for ranking the worst directories is sketched after these comments.) – blipman17 May 29 '16 at 17:26
  • rmlint has an option to find duplicate dirs. You will need to follow the installation instructions to compile from source, since it's not (yet) in the Ubuntu/Debian repositories. Then run rmlint --types=dupedirs --progress <path>.... It will generate a shell script, rmlint.sh, which you can then inspect and/or run to delete the duplicate dirs (see the workflow sketch after these comments). – thomas_d_j Jun 01 '16 at 23:57
  • Thomas, you're a hero! If this works, it'll be a miracle. – blipman17 Jun 03 '16 at 21:43
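Update from the comments above: to make the 1.06-million-line output.txt more manageable, something along these lines (untested; assumes fdupes' usual output of one path per line, with blank lines between sets of dupes) ranks directories by how many duplicate files they contain, so the biggest offenders can be inspected first:

    # Strip the blank set-separators, cut each path back to its parent
    # directory, then count and sort: the top entries are the directories
    # holding the most duplicate files.
    grep -v '^$' output.txt \
        | sed 's|/[^/]*$||' \
        | sort | uniq -c | sort -rn | head -20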
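And the full rmlint workflow from thomas_d_j's comment, as far as I can piece it together (untested; the clone URL and scons steps follow rmlint's README as I understand it, and may differ by version):

    # Build from source, since rmlint isn't in the Ubuntu 14.04 repos.
    git clone https://github.com/sahib/rmlint.git
    cd rmlint
    scons                 # build
    sudo scons install

    # Scan for duplicate directories; this writes a script for review.
    rmlint --types=dupedirs --progress /home/user/sambadata
    less rmlint.sh        # inspect what would be removed
    sh rmlint.sh          # run it to delete the duplicate dirs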

0 Answers