
Given a large list of files such as the following:

FILE1.doc
FILE1.pdf
FILE2.doc
FILE3.doc
FILE3.pdf
FILE4.doc

Is there a terminal command that would allow me to remove all files that do not have a duplicate name in the list? In this case, that would be FILE2.doc and FILE4.doc.

Anthon
antonpug
  • How do you know that the first filename of that list is not FILE.doc\nFILE1.pdf? Any of the newlines can be part of a file name. – Anthon Jul 16 '15 at 06:10
  • At what point are you halting the uniqueness comparison? At the first dot? At the last dot? What if there is no dot in the filename? If you had file1.doc and FILE1.DOC is this a duplicate? – Chris Davies Jul 16 '15 at 07:18
  • If you have two files, file.doc, file.pdf and only one of them appears in your list, should both be deleted or just the one that is listed? – Chris Davies Jul 16 '15 at 07:21

3 Answers


Using bash, this will remove all files that don't have another file with the same name but different extension:

for f in *; do same=("${f%.*}".*); [ "${#same[@]}" -eq 1 ] && rm "$f"; done

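This approach is safe for file names containing white space or glob characters. The one exception is a name beginning with -, which rm would read as an option; writing rm -- "$f" instead avoids that.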

How it works

  • for f in *; do

    This starts a loop over all files in the current directory.

  • same=("${f%.*}".*)

    This creates a bash array with the names of all files with the same basename.

    $f is the name of our file. ${f%.*} removes the shortest suffix matching .*, that is, the last dot and everything after it. If, for example, the file is FILE1.doc, then ${f%.*} is FILE1 (for FILE1.tar.gz it would be FILE1.tar). "${f%.*}".* matches all the files with the same basename but any extension. ("${f%.*}".*) is a bash array of those names. same=("${f%.*}".*) assigns the array to the variable same.

  • [ "${#same[@]}" -eq 1 ] && rm "$f"

    If there is only one file with this basename, we delete it.

    "${#same[@]}" is the number of files in the array same. [ "${#same[@]}" -eq 1 ] is true if there is only one such file.

    && is logical-and. It causes the command that follows, rm "$f", to run only if the command that precedes it succeeds (returns exit status 0, which the shell treats as true).

  • done

    This marks the end of the for loop.
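Because rm cannot be undone, a dry run is worth doing first. The sketch below wraps the same array-count test in a function, list_unique (a hypothetical name, not part of the answer), that prints the names instead of deleting them and also skips anything that is not a regular file, such as directories.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print, rather than delete, every regular file
# in the current directory whose name minus its last extension (${f%.*})
# is shared by no other file. Same test as the one-liner above.
list_unique() {
  local f
  local -a same
  for f in *; do
    [ -f "$f" ] || continue              # skip directories and other non-files
    same=("${f%.*}".*)                   # all names with this stem, any extension
    [ "${#same[@]}" -eq 1 ] && printf '%s\n' "$f"
  done
  return 0
}
```

Once the printed list looks right, replacing printf '%s\n' "$f" with rm -- "$f" performs the actual deletion.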

John1024

Suppose your list of files is in a file /tmp/files.list, e.g. created with ls * > /tmp/files.list.

Then sort -u /tmp/files.list gives you a sorted file list without duplicates (not needed if you created the list with ls * > /tmp/files.list as above). You could process that with an awk script inspired by this, e.g.

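sort -u /tmp/files.list | awk '
# functions must be defined at the top level, not inside a { } action
function stem(file) {
  sub(/\.[^.]*$/, "", file)  # drop the last extension: FILE1.doc -> FILE1
  return file
}
# remember every name and count how many names share each stem
{ name[NR] = $0; count[stem($0)]++ }
# after reading all names, print those whose stem occurs only once
END {
  for (i = 1; i <= NR; i++)
    if (count[stem(name[i])] == 1) print name[i]
}' | while IFS= read -r f; do rm -- "$f"; done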

Beware, I have not tested this.


Another way, simple-looking but rather convoluted:

for x in `for i in *; do echo $i ; done | cut -d'.' -f1 | uniq -u `; do rm $x.*; done
Rui F Ribeiro
neuron
  • Your code breaks with file names containing spaces and other special characters. See http://unix.stackexchange.com/questions/131766/why-does-my-shell-script-choke-on-whitespace-or-other-special-characters for some tips to fix that. – Gilles 'SO- stop being evil' Jul 16 '15 at 21:14
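Following the advice in that link, one way to keep this pipeline's structure while tolerating spaces is to quote every expansion and read the stems line by line instead of word-splitting backtick output. A sketch under these assumptions: the function name remove_unique_stems is hypothetical, the stem is taken up to the first dot (as cut -d'.' -f1 does), and names containing newlines are still not supported.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete each regular file in the current directory whose
# stem (the part before the FIRST dot, as cut -d'.' -f1 computes it) belongs
# to no other file. Tolerates spaces; names containing newlines are not handled.
remove_unique_stems() {
  local f stem
  for f in *; do
    [ -f "$f" ] || continue          # only regular files
    printf '%s\n' "${f%%.*}"         # %% strips from the first dot onward
  done | sort | uniq -u |            # uniq -u only compares adjacent lines
  while IFS= read -r stem; do
    # The stem is quoted; only the trailing .* is left to globbing.
    # -f makes rm ignore arguments that name no existing file
    # (e.g. "$stem" itself when the file had an extension).
    rm -f -- "$stem" "$stem".*
  done
}
```

Unlike the original, the stems go through sort before uniq -u: the glob sorts full names, and in some locales two names with the same stem need not end up adjacent once their extensions are cut off.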