
I have 20,000 photos, and half of them are duplicates. How can I delete the pictures that contain duplicate content?

Notme
    are you talking about exact-byte-by-byte duplicates? – A.B Aug 08 '20 at 10:24
  • Yes, I want to delete the duplicates not by file size, but byte by byte – Notme Aug 08 '20 at 10:44
  • Here's the idea: the key point is to take a hash (e.g. sha256sum) of each file's contents and then sort the hashes to find duplicates, which is far faster than comparing the files' contents directly. Some tools already do part of this (e.g. duff) – A.B Aug 08 '20 at 11:09
  • Thank you, my friend. I have deleted the duplicate files using duff -re . | xargs rm -r Ref link – Notme Aug 08 '20 at 11:39
  • Nice, duff appears to be made for this. I saw the command but never used it. You should post this as an answer to your own question. But don't use -r on rm, it's not needed and dangerous: use -- instead. You'd probably have to rework the xargs command for extra safety unless you are sure there are no spaces in your file names. – A.B Aug 08 '20 at 11:42
  • That sounds like a good reminder. Thank you – Notme Aug 08 '20 at 12:12
  • There are literally hundreds of similar questions on this site. Look for fdupes for instance. – Stéphane Chazelas Aug 08 '20 at 13:20
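The hash-then-sort approach A.B describes in the comments can be sketched without duff, assuming GNU coreutils and file names without embedded newlines:

```shell
# Hash every regular file, sort so identical digests become adjacent,
# then print every file after the first occurrence of each digest:
# those are the byte-identical duplicates.
find . -type f -print0 |
  xargs -0 sha256sum |
  sort |
  awk 'seen[$1]++ { print substr($0, 67) }'  # skip the 64-char digest + 2 spaces
```

Review the printed list before piping it to rm; only files after the first occurrence of each digest are listed, so one copy of every photo is kept.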

1 Answer

duff -re . | xargs -d '\n' rm

The command duff -re . prints only the duplicate files.

xargs reads the file names from stdin (splitting on newlines because of -d '\n') and passes them as arguments to rm, which it then executes.
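A slightly safer variant of the same pipeline, assuming GNU xargs (-d is a GNU extension) and file names without embedded newlines:

```shell
# Dry run first: list what would be removed.
duff -re .

# Delete the duplicates. -d '\n' makes xargs split on newlines only,
# so spaces in file names are safe; -- stops rm from treating a
# file name that starts with a dash as an option.
duff -re . | xargs -d '\n' rm --
```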

ref link

Thanks, A.B

Notme
  • I really meant --, not -r (which would be taken as a non-existent file anyway). You don't want rm to be recursive; duff is already recursive – A.B Aug 08 '20 at 12:17
  • I have modified it. It seems I read it incorrectly above – Notme Aug 08 '20 at 12:28
  • It can probably work because duff was called with .. Otherwise the -- that you still haven't added would be needed for safety. – A.B Aug 08 '20 at 12:32