I have a folder of images that contains quite a few duplicates. I'd like to remove all duplicates, keeping only one copy of each.
Upon Googling I found this clever script from this post that succinctly does almost what I want it to do:
#!/bin/sh -eu
find "${1:-.}" -type f ! -empty -print0 | xargs -0 md5 -r | \
awk '$1 in a{sub("^.{33}","");printf "%s\0",$0}a[$1]+=1{}' | \
xargs -0 rm -v --
Unfortunately, I am still fairly green when it comes to UNIX shell scripting, so I'm not sure what the commands and flags in each piece are doing, and I am unable to modify the script for my specific needs.
From my understanding:
find "${1:-.}" -type f ! -empty -print0
- searches the current directory for non-empty files and prints the file names. (not sure what the piece "${1:-.}" means though)
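As a minimal sketch of what "${1:-.}" does: it is standard shell parameter expansion that yields the first argument if one was given and non-empty, and "." (the current directory) otherwise. The function name target_dir and the path /tmp/photos are hypothetical, used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints the directory the script would search.
target_dir() {
    # "${1:-.}" expands to $1 when it is set and non-empty, otherwise to "."
    printf '%s\n' "${1:-.}"
}

target_dir              # no argument given: prints .
target_dir /tmp/photos  # argument given: prints /tmp/photos
```

So running the script with no argument searches the current directory, and running it with a path searches that path instead.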
| xargs -0 md5 -r
- Pipes the results above (via the xargs -0 command?) into the md5 command to get the md5 hash signature of each file (-r reverses the output to make it a single line?)
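A small sketch of why -print0 and xargs -0 are used together: find ends each file name with a NUL byte, and xargs splits its input on NUL, so a name containing spaces arrives as a single argument. The temporary directory and file names below are made up for the demonstration, and a tiny sh -c command that counts its arguments stands in for md5.

```shell
#!/bin/sh
# Throwaway directory so no real files are involved.
demo_dir=$(mktemp -d)
printf 'x' > "$demo_dir/my photo.jpg"   # name with a space
printf 'y' > "$demo_dir/other.jpg"

# xargs appends the file names after the trailing "sh" (which becomes $0),
# so "$#" is the number of file names the command received.
count=$(find "$demo_dir" -type f ! -empty -print0 |
    xargs -0 sh -c 'echo "$#"' sh)
echo "$count"   # 2: "my photo.jpg" was not split at the space

rm -r -- "$demo_dir"
```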
awk '$1 in a{sub("^.{33}","");printf "%s\0",$0}a[$1]+=1{}'
- This is where I get lost..
$1 in a{sub("^.{33}","")
- takes the input up until the first whitespace character and replaces the first 33 characters from the start of the string with nothing (sub("^.{33}",""))
printf "%s\0"
- format prints the entire string
a{...,$0}
- I'm not sure what this does
a[$1]+=1{}
- Not sure either
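For reference, the awk program is two pattern-action rules: "$1 in a" is a pattern that tests whether the hash in field 1 is already a key of the array a, and "a[$1]+=1" followed by an empty action just records the hash. The sketch below expresses the same idea under some assumptions that differ from the original script: the array is renamed to seen, substr replaces the sub("^.{33}","") regex, and names are printed with a newline instead of a NUL byte so the result is readable. The function name list_duplicates and the sample lines are hypothetical; the input shape (32-character hash, a space, then the file name) is what md5 -r is expected to print.

```shell
#!/bin/sh
# Hypothetical helper: reads "hash filename" lines and prints every
# file name whose hash has already appeared on an earlier line.
list_duplicates() {
    awk '
        # Rule 1: the hash is already a key in "seen", so this line is a
        # duplicate. Skip the 32-character hash plus one space (33
        # characters) and print the remainder, which is the file name.
        $1 in seen { print substr($0, 34) }
        # Rule 2: runs on every line and records the hash. It comes
        # after rule 1, so the first file with a given hash is kept.
        { seen[$1]++ }
    '
}

printf '%s\n' \
    'd41d8cd98f00b204e9800998ecf8427e ./a.jpg' \
    '0cc175b9c0f1b6a831c399e269772661 ./b.jpg' \
    'd41d8cd98f00b204e9800998ecf8427e ./copy of a.jpg' |
    list_duplicates
# prints: ./copy of a.jpg
```

Only the second file with a repeated hash is listed, which is why the pipeline removes every duplicate but leaves one copy.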
xargs -0 rm -v --
- Pipes the results to the rm command, printing each file name via -v, but I'm not sure what the syntax -- is for.
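A short sketch of the usual meaning of --: by convention it marks the end of options, so every argument after it is treated as a file name even if it starts with a dash. The example runs in a temporary directory and uses a deliberately awkward file name, -v, chosen only for illustration.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Create a file whose name looks like an option.
touch -- -v

# Without "--", rm would read "-v" as its verbose flag, not a file name.
# With "--", everything that follows is taken as a file to remove.
rm -- -v

leftover=$(ls -A)
echo "files left: ${leftover:-none}"   # files left: none

# Clean up the temporary directory.
cd / && rmdir "$demo_dir"
```

In the pipeline this protects rm from any file name that happens to begin with a dash.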
When I run this, the output looks like this:
./test3.jpg./test2.jpg./test.jpg: No such file or directory
so there must be a formatting issue.
My questions are:
- Can this be modified to remove all duplicate files except one?
- Can someone help fill in the gaps in my understanding of the commands/syntax, as outlined above?
I'm sure this is probably easy for someone who knows UNIX well but unfortunately that person is not me. Thank you in advance!
For context: I'm running this in zsh on macOS Big Sur 11.
man pages for man xargs, man md5, and man find (for -print0). ${1:-.} is explained here (bash and zsh use it the same way), -- here. Then try to make the question a bit more narrow. – FelixJN Dec 12 '21 at 22:00