Delete all files that DON'T have duplicate names?

Question

Given a large list of files, containing the following:

FILE1.doc
FILE1.pdf
FILE2.doc
FILE3.doc
FILE3.pdf
FILE4.doc

Is there a terminal command that would remove all files that do not have a duplicate name in the list? In this case, that would be FILE2.doc and FILE4.doc.

How do you know that the first filename of that list is not FILE.doc\nFILE1.pdf? Any of the newlines can be part of a file name. — Anthon, Jul 16 '15 at 06:10
At what point are you halting the uniqueness comparison? At the first dot? At the last dot? What if there is no dot in the filename? If you had file1.doc and FILE1.DOC is this a duplicate? — Chris Davies, Jul 16 '15 at 07:18
If you have two files, file.doc, file.pdf and only one of them appears in your list, should both be deleted or just the one that is listed? — Chris Davies, Jul 16 '15 at 07:21

John1024 · Accepted Answer · 2015-07-16T06:31:44.033

Using bash, this will remove all files that don't have another file with the same name but different extension:

for f in *; do same=("${f%.*}".*); [ "${#same[@]}" -eq 1 ] && rm "$f"; done

This approach handles file names that contain whitespace. A name that begins with a dash would be read by rm as an option, which rm -- "$f" avoids.
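For anything beyond a one-off, the same loop can be spelled out over several lines. The sketch below is one way to do that under a few assumptions: the function name remove_unpaired and its optional dry-run argument are hypothetical and not part of the answer, and the [ -f ] test and the -- before the file name are additions that skip directories and protect against names starting with a dash.

```shell
#!/usr/bin/env bash
# remove_unpaired DIR [dry-run]
# Hypothetical helper: deletes regular files in DIR whose name, minus the
# last extension, is not shared by any other entry in DIR.
remove_unpaired() {
  local dir=$1 mode=${2:-} f same
  (
    cd "$dir" || exit 1
    for f in *; do
      [ -f "$f" ] || continue        # skip directories and an unmatched "*"
      same=("${f%.*}".*)             # every entry sharing this stem
      if [ "${#same[@]}" -eq 1 ]; then
        if [ "$mode" = dry-run ]; then
          printf 'would remove: %s\n' "$f"
        else
          rm -- "$f"
        fi
      fi
    done
  )
}
```

Calling remove_unpaired . dry-run first prints the candidates without deleting anything, which is a cheap check before running it for real.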

How it works

for f in *; do

This starts a loop over all files in the current directory.

same=("${f%.*}".*)

This creates a bash array with the names of all files with the same basename.

$f is the name of our file. ${f%.*} is the name of the file without its extension. If, for example, the file is FILE1.doc, then ${f%.*} is FILE1. "${f%.*}".* is all the files with the same basename but any extension. ("${f%.*}".*) is a bash array of those names. same=("${f%.*}".*) assigns the array to the variable same.

[ "${#same[@]}" -eq 1 ] && rm "$f"

If there is only one file with this basename, we delete it.

"${#same[@]}" is the number of files in the array same. [ "${#same[@]}" -eq 1 ] is true if there is only one such file.

&& is logical-and. It causes the statement which follows, rm "$f", to be executed only if the statement which precedes it returns logical true.

done

This marks the end of the for loop.
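The pieces described above can be checked in a scratch directory before trusting them with real files. This is only an illustration: the temporary directory and the sample names (including the hypothetical report.tar.gz) are assumptions, and ${f%%.*}, which cuts at the first dot rather than the last, is shown for comparison and is not part of the answer.

```shell
#!/usr/bin/env bash
# Illustration only: everything happens in a throwaway directory.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch FILE1.doc FILE1.pdf FILE2.doc report.tar.gz

f=report.tar.gz
stem_last=${f%.*}      # shortest suffix matching ".*" removed: report.tar
stem_first=${f%%.*}    # longest suffix matching ".*" removed: report

same=("FILE1".*)       # the glob expands to FILE1.doc and FILE1.pdf
pair_count=${#same[@]}

same=("FILE2".*)       # only FILE2.doc matches
single_count=${#same[@]}

printf '%s %s %s %s\n' "$stem_last" "$stem_first" "$pair_count" "$single_count"
# prints: report.tar report 2 1

cd / && rm -rf -- "$demo_dir"
```

Whether a name like report.tar.gz should be grouped by report.tar or by report is the question raised in the comments about the first versus the last dot; the answer's loop uses the last dot.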

Thank you for a very descriptive answer. This is awesome. — antonpug, Jul 16 '15 at 13:25

score 0 · Answer 2 · edited Apr 13 '17 at 12:36

Suppose your list of files is in some file /tmp/files.list, e.g. after ls * > /tmp/files.list ;

Then sort -u /tmp/files.list gives you a sorted file list without duplicates (not needed if you created the list with ls * > /tmp/files.list as above). You could process that with some awk script inspired by this, e.g.

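sort -u /tmp/files.list | awk '
# Return the name without its last extension, e.g. FILE1.doc -> FILE1.
function stem(file) {
  sub(/\.[^.\/]*$/, "", file)
  return file
}
# Count how many listed names share each stem and remember the latest one.
{ s = stem($0); count[s]++; last[s] = $0 }
END {
  # A stem seen exactly once has no duplicate, so remove that file.
  for (s in count)
    if (count[s] == 1) system("rm -- \"" last[s] "\"")
}'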

Beware, I have not tested this.
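If the names really do come from a list rather than from the current directory, the counting can also be done in bash itself. The following is a sketch under stated assumptions: bash 4 or later (for associative arrays), one file name per line on standard input, and a hypothetical function name list_unpaired. It only prints the unpaired names, so the result can be reviewed before anything is deleted.

```shell
#!/usr/bin/env bash
# list_unpaired: reads file names on stdin, one per line, and prints those
# whose stem (name minus the last extension) occurs only once.
# Hypothetical helper; assumes bash >= 4 and no newlines inside names.
list_unpaired() {
  local -A count=()
  local -a names=()
  local name key n
  while IFS= read -r name; do
    [ -n "$name" ] || continue
    names+=("$name")
    key="s:${name%.*}"               # prefix keeps the key non-empty
    n=${count["$key"]:-0}
    count["$key"]=$((n + 1))
  done
  for name in "${names[@]}"; do
    key="s:${name%.*}"
    if [ "${count["$key"]}" -eq 1 ]; then
      printf '%s\n' "$name"
    fi
  done
}
```

With the list file from this answer, it could be used as sort -u /tmp/files.list | list_unpaired.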

score 0 · Answer 3 · edited Dec 26 '18 at 22:54

Another way, which looks simple but is fairly involved:

for x in `for i in *; do echo $i ; done | cut -d'.' -f1 | uniq -u `; do rm $x.*; done

answered Jul 16 '15 at 06:10 by neuron; edited Dec 26 '18 at 22:54 by Rui F Ribeiro

Your code breaks with file names containing spaces and other special characters. See http://unix.stackexchange.com/questions/131766/why-does-my-shell-script-choke-on-whitespace-or-other-special-characters for some tips to fix that. — Gilles 'SO- stop being evil', Jul 16 '15 at 21:14
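The breakage mentioned in this comment comes from word splitting on unquoted expansions. A small sketch shows the difference quoting makes; the stem value and the count_args helper are hypothetical, the default IFS is assumed, and nothing is deleted.

```shell
#!/usr/bin/env bash
# count_args prints how many separate arguments it received.
count_args() { printf '%s\n' "$#"; }

x='my report'                      # hypothetical stem containing a space

unquoted=$(count_args $x.doc)      # split into "my" and "report.doc"
quoted=$(count_args "$x".doc)      # kept as the single word "my report.doc"

printf 'unquoted: %s, quoted: %s\n' "$unquoted" "$quoted"
# prints: unquoted: 2, quoted: 1
```

With rm in place of count_args, the unquoted form would try to remove two files that were never meant, which is why the expansions in the accepted answer are all quoted.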
