Say I have one file called `foo.pdf`. How can I find out whether my machine contains another file, say `bar.pdf`, that has the exact same content but just a different name?

4 Answers
Tell me if this works (it won't be fast):
find /home/user -type f -name "*.pdf" -exec md5sum {} + 2> /dev/null | uniq -f2 -D

- guillermo chamorro You beat me to the answer. I was going to suggest the same thing. :D – Mark Stewart Feb 21 '20 at 20:30
- Shouldn't it be `… | sort | uniq -w32 -D` to print the lines with the same hash? – Freddy Feb 21 '20 at 21:49
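Following the comment's suggestion (sorting first so identical hashes land on adjacent lines for `uniq`), here is a self-contained sketch in a throwaway directory; the file names and contents are only illustrative:

```shell
# Create a throwaway directory with two identical files and one different one.
dir=$(mktemp -d)
printf 'same content\n'  > "$dir/foo.pdf"
printf 'same content\n'  > "$dir/bar.pdf"
printf 'other content\n' > "$dir/baz.pdf"

# Hash every PDF, sort so equal hashes are adjacent, then print only lines
# whose first 32 characters (the MD5 digest) occur more than once.
find "$dir" -type f -name "*.pdf" -exec md5sum {} + 2>/dev/null \
    | sort | uniq -w32 -D

rm -rf "$dir"
```

Only `foo.pdf` and `bar.pdf` should be printed. Note that `uniq -w` is a GNU extension.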
`fdupes` sounds quite smart, but it matches all the files against each other. If you already have a single file you want to match, you can apply a couple of the same techniques more efficiently.
You could start by getting the file size of `foo.pdf`, and constructing a `find` command that matches that exact size only. That should be a cheap shortlist.
Then you could cut the first few bytes (a few hundred) from each of those files, and compare those bytes with `cmp -s`. That should eliminate some more.
For files that are still possible duplicates, you can `cksum` or `md5sum` them.
You probably want to check the inode numbers are different from your original, in case you find a hard-linked copy.
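A sketch of that size-then-bytes-then-hash funnel, assuming GNU find/coreutils; the search root `/home/user` and the 400-byte limit are placeholder choices, not part of the answer:

```shell
target=/home/user/foo.pdf

# 1) Cheap shortlist: files with exactly the same size in bytes, excluding
#    hard links to the original (! -samefile compares device + inode).
size=$(stat -c %s "$target")
find /home/user -type f -size "${size}c" ! -samefile "$target" |
while IFS= read -r candidate; do
    # 2) Compare only the first few hundred bytes with cmp.
    if cmp -s -n 400 "$target" "$candidate"; then
        # 3) Full checksum only for the survivors of the cheap checks.
        if [ "$(md5sum < "$target")" = "$(md5sum < "$candidate")" ]; then
            printf 'possible duplicate: %s\n' "$candidate"
        fi
    fi
done
```

`-size "${size}c"` (bytes), `! -samefile`, and `cmp -n` are GNU extensions; on other systems you would compare inode numbers and truncate by hand.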

- Instead of the first few bytes (which may be the same in many files because of boilerplate headers), start comparing from the middle of the file ;-) – Feb 22 '20 at 03:39
- My bad: the inode numbers are only unique within a given file system (partition, which looks like a device). Although, I'm suggesting using the inode to detect hard links, and IIRC hard links are not supported cross-device anyway. – Paul_Pedant Feb 22 '20 at 11:49
You could use `fdupes` to search for duplicate files in different directories. The default is to list duplicate files as blocks separated by a blank line.
If both files are in one directory `dir1`:
fdupes dir1
For a recursive search, add the `-r` / `--recurse` option:
fdupes -r dir1
You can search multiple directories and set the recurse option only for specific ones (directories listed after `--recurse:` are searched recursively, the ones before it are not):
fdupes dir1 dir2 --recurse: dir3

rmlint -r
rmlint is an extremely fast tool for finding duplicates and, optionally, removing them.
Features
Finds…
- …Duplicate Files and duplicate directories.
- …Nonstripped binaries (i.e. binaries with debug symbols)
- …Broken symbolic links.
- …Empty files and directories.
- …Files with broken user and/or group IDs.
Differences to other duplicate finders:
- Extremely fast (no exaggeration, we promise!)
- Paranoia mode for those who do not trust hashsums.
- Many output formats.
- No interactivity.
- Search for files only newer than a certain mtime.
- Many ways to handle duplicates.
- Caching and replaying.
- btrfs support.
The rmlint tutorial will help and guide you gently ;)

- `md5sum` each file, the output should be the same. – schrodingerscatcuriosity Feb 21 '20 at 18:34
- … `bar.pdf` exists. – Paul Razvan Berg Feb 21 '20 at 18:50