20

Is there a way to find all files in a directory with duplicate filenames, regardless of the casing (upper-case and/or lower-case)?

Jeff Schaller
lamcro

8 Answers

16

If you have GNU utilities (or at least a set that can deal with zero-terminated lines) available, another answer has a great method:

find . -maxdepth 1 -print0 | sort -fz | uniq -diz

Note: the output will have zero-terminated strings; the tool you use to further process it should be able to handle that.
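
For example, a minimal sketch of consuming that NUL-separated output (assuming bash or zsh, whose read builtin accepts -d ''):

find . -maxdepth 1 -print0 | sort -fz | uniq -diz |
while IFS= read -r -d '' name; do
  # uniq -d emits one representative spelling per duplicate group
  printf 'duplicate name: %s\n' "$name"
done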

In the absence of tools that deal with zero-terminated lines, or if you want to make sure your code works in environments where such tools are not available, you need a small script:

#!/bin/sh
# For every name in the current directory, count how many entries match it
# case-insensitively (echoing one blank line per match keeps wc -l honest even
# for names containing newlines), and print the name when there is more than one.
for f in *; do
  find . -maxdepth 1 -iname "$f" -exec echo \; | wc -l | while read count; do
    [ "$count" -gt 1 ] && printf '%s\n' "$f"
  done
done

What is this madness? See this answer for an explanation of the techniques that make this safe for crazy filenames.

Shawn J. Goff
15

There are many complicated answers above; this seems simpler and quicker than all of them:

find . -maxdepth 1 | sort -f | uniq -di

If you want to find duplicate file names in subdirectories then you need to compare just the file name, not the whole path:

find . -maxdepth 2 -printf "%f\n" | sort -f | uniq -di
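
Since %f drops the directory part, the output above is just the bare names. As a rough follow-up sketch (assuming names without newlines or glob metacharacters, since -iname treats its argument as a pattern), you can then look up where each duplicated name actually lives:

find . -maxdepth 2 -printf "%f\n" | sort -f | uniq -di |
while IFS= read -r name; do
  find . -maxdepth 2 -iname "$name"   # print every path carrying that name, in any case
done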

Edit: Shawn J. Goff has pointed out that this will fail if you have filenames with newline characters. If you're using GNU utilities, you can make these work too:

find . -maxdepth 1 -print0 | sort -fz | uniq -diz

The -print0 option (for find) and -z option (for sort and uniq) cause them to work on NUL-terminated strings instead of newline-terminated strings. Since file names cannot contain NUL, this works for all file names.
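
If you just want to eyeball the result in a terminal, one option is to translate the NULs back into newlines for display only (names containing newlines become ambiguous again, so keep the NUL separators for any further processing):

find . -maxdepth 1 -print0 | sort -fz | uniq -diz | tr '\0' '\n'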

derobert
Jamie Kitson
  • But see my comment on Shawn J. Goff's answer: you can add the -print0 option to find, and the -z option to uniq and sort. Also, you want -f on sort as well. Then it works. (I'm going to edit this into your answer, feel free to revert if you don't approve) – derobert Oct 26 '12 at 17:41
  • The last command is giving me output without carriage returns (result is all in one line). I'm using Red Hat Linux to run the command. The first command line works best for me. – Sun Aug 26 '15 at 16:42
3

Sort the list of file names in a case-insensitive way and print duplicates. sort has an option for case-insensitive sorting. So does GNU uniq, but not other implementations, and with uniq -d all you get is one element (the first encountered) from each set of duplicates. With GNU tools, assuming that no file name contains a newline, there's an easy way to print one name from each set of duplicates:

for x in *; do printf "%s\n" "$x"; done |
sort -f |
uniq -id

Portably, to print all elements in each set of duplicates, assuming that no file name contains a newline:

for x in *; do printf "%s\n" "$x"; done |
sort -f |
awk '
    tolower($0) == tolower(prev) {
        # current line matches the previous one case-insensitively:
        # print the whole run of matching lines
        print prev
        print
        while ((getline) > 0 && tolower($0) == tolower(prev)) print
    }
    { prev = $0 }'

If you need to accommodate file names containing newlines, go for Perl or Python. Note that you may need to tweak the output, or better do your further processing in the same language, as the sample code below uses newlines to separate names in its own output.

perl -e '
    foreach (glob("*")) {push @{$f{lc($_)}}, $_}
    foreach (keys %f) {@names = @{$f{$_}}; if (@names > 1) {print "$_\n" foreach @names}}
'
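
If the output itself needs to be newline-safe, a possible tweak (a sketch, not part of the original one-liner) is to separate the printed names with NUL bytes instead and hand the result to NUL-aware tools:

perl -e '
    # same grouping as above, but emit NUL-separated names
    foreach (glob("*")) {push @{$f{lc($_)}}, $_}
    foreach (keys %f) {@names = @{$f{$_}}; if (@names > 1) {print "$_\0" foreach @names}}
'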

Here's a pure zsh solution. It's a bit verbose, as there's no built-in way to keep the duplicate elements in an array or glob result.

a=(*(N)); a=("${(@io)a}")
[[ $#a -le 1 ]] ||
for i in {2..$#a}; do
  if [[ ${(L)a[$i]} == "${(L)a[$((i-1))]}" ]]; then
    [[ ${(L)a[$i-2]} == "${(L)a[$((i-1))]}" ]] || print -r -- $a[$((i-1))]
    print -r -- $a[$i]
  fi
done
2

I finally managed it this way:

find . | tr '[:upper:]' '[:lower:]' | sort | uniq -d

I used find instead of ls because I needed the full path (a lot of subdirectories) included. I did not find a way to do this with ls.
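
Note that this prints the lowercased spelling, not the names as they exist on disk. A rough way to map each reported duplicate back to its original-case paths (assuming a find that supports -ipath, as GNU and BSD find do, and paths free of glob metacharacters):

find . | tr '[:upper:]' '[:lower:]' | sort | uniq -d |
while IFS= read -r dup; do
  find . -ipath "$dup"   # list the real paths that match case-insensitively
done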

lamcro
1

Without GNU find:

LANG=en_US ls | tr '[A-Z]' '[a-z]' | uniq -c | awk '$1 >= 2 {print $2}'

  • tr is very likely to wreak havoc on any character set which uses more than a single byte per character. Only the first 256 characters of UTF-8 are safe when using tr. From Wikipedia, tr (Unix): most versions of tr, including GNU tr and classic Unix tr, operate on single bytes and are not Unicode compliant. – Peter.O Oct 19 '11 at 15:24
  • Update to my previous comment: only the first 128 characters of UTF-8 are safe. All UTF-8 characters above the ordinal range 0..127 are multi-byte and can have individual byte values that also appear in other characters. Only the bytes in the range 0..127 have a one-to-one association with a unique character. – Peter.O Aug 28 '12 at 00:08
  • Plus uniq has a case-insensitive flag, -i. – Jamie Kitson Oct 26 '12 at 12:06
0

The Question:

Is there a way to find all files in a directory with duplicate filenames, regardless of the casing (upper-case and/or lower-case)?

An Answer:

I found this works for me on Ubuntu 20.04. I tested this in my home directory with a contrived duplication of filenames; i.e.:

$ touch filename.txt
$ touch FiLeNaMe.TxT

And then:

$ find . -maxdepth 1 -type f | sort -f | uniq -Di 
./filename.txt
./FiLeNaMe.TxT
  • find . : search begins in pwd - ~/ in this case
  • -maxdepth 1 : find defaults to full recursion; this limits that to pwd only
  • -type f : "regular" files only - no directories, links, etc
  • sort -f : sorting is required because uniq only detects adjacent duplicates; -f ignores case
  • uniq -Di : -D prints all dupes; -i ignores case
Seamus
0

If I'm understanding the question correctly, lamcro wanted to be certain of finding every one of a suspected small number of such duplicates. I have found that using FileZilla to transfer an entire directory to a Windows-based file system causes each and every instance to be caught by a dialog that asks what action to take. Each dialog contains enough information to be certain where the duplicates are, though admittedly it is clumsy compared with a true search, which would generate a list.

rich
-1

For anyone else who then wants to rename (or otherwise deal with) one of the files:

find . -maxdepth 1 | sort -f | uniq -di | while IFS= read -r f; do echo mv "$f" "${f/.txt/_.txt}"; done
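
A sketch of the same idea that tolerates arbitrary file names (assuming bash and GNU tools; ${f/.txt/_.txt} is a bash substitution, and the echo is left in as a dry run):

find . -maxdepth 1 -print0 | sort -fz | uniq -diz |
while IFS= read -r -d '' f; do
  echo mv -- "$f" "${f/.txt/_.txt}"   # drop the echo once the proposed renames look right
done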
JohnFlux