
How can I recursively list the names of files and directories that are the same except for capitalization/case? For example:

INPUT (the directory tree; the ls command is just for display):

[user@localhost ~/a] ls -R
.:
b

./b:
ize  Ize

./b/ize:

./b/Ize:
[user@localhost ~/a] 

OUTPUT:

/b/ize
gasko peter

3 Answers


If you have GNU uniq, you can sort case-insensitively (sort -f), then use uniq's -d to print only duplicate lines and -i to compare them ignoring case:

find . | sort -f | uniq -di

As @StephaneChazelas mentioned in his answer, this might not do what you expect if you can have duplicate paths that only differ in case (like a/b/foo and A/b/foo).
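For instance, the directory tree from the question can be checked like this (a sketch; the /tmp/case-demo path is just for illustration, and it assumes a case-sensitive filesystem):

```shell
# Recreate the example tree from the question
mkdir -p /tmp/case-demo/b/ize /tmp/case-demo/b/Ize
cd /tmp/case-demo

# Fold case while sorting so case-variants become adjacent,
# then have uniq print one line per case-insensitive duplicate
find . | sort -f | uniq -di
```

This prints a single line for the ize/Ize pair; which spelling is shown depends on how your sort breaks the tie.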

terdon
  • You probably want sort -f here. Also note that GNU uniq has the same limitation as GNU tr as in it doesn't work for matching case of multi-byte characters. – Stéphane Chazelas Aug 03 '13 at 22:55
  • @StephaneChazelas why do I want sort -f? If uniq can deal with the case, why would I also need to make sort case insensitive? And what do you mean by multi-byte characters? Things like \n,\r etc? How can they have different cases? – terdon Aug 04 '13 at 00:19
  • 1
    Try export LC_ALL=C; printf '%s\n' a A b B | sort | uniq -di. Some locales sort case-insensitively, some others (like C) don't. uniq needs a sorted input, its duplicate lines must be adjacent. – Stéphane Chazelas Aug 04 '13 at 07:25

Assuming file names don't contain newline characters, you could do something like:

find . | tr '[:upper:]' '[:lower:]' | sort | uniq -d

Note that some tr implementations like GNU tr don't change the case of multi-byte characters.

Also note that the paths it reports may not be the paths of any actual file. For instance, if there's a ./a/b/fOo and a ./A/b/fOo file, it will report ./a/b/foo. If that's not what you want, you may want to refine your requirements.
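To see that caveat concretely, here is a sketch (the /tmp/tr-demo path and the fOo file names are made up for the illustration, and it assumes a case-sensitive filesystem):

```shell
# Two files whose full paths differ only in case
mkdir -p /tmp/tr-demo/a/b /tmp/tr-demo/A/b
touch /tmp/tr-demo/a/b/fOo /tmp/tr-demo/A/b/fOo
cd /tmp/tr-demo

# Lowercase every path before comparing; note that one of the
# duplicates it reports, ./a/b/foo, names no actual file on disk
find . | tr '[:upper:]' '[:lower:]' | sort | uniq -d
```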


All of these ideas are bad. Use checksums and be sure the files are the same. Then the task becomes easy.

find . -type f -exec md5sum {} + |
sort |
perl -a -nE'push(@{$db{$F[0]}},$F[1]);END{for(keys%db){say"Dupe detected @{$db{$_}}"if scalar@{$db{$_}}>1}}'

This will md5sum every file in the directory and all subdirectories and output all dupes of that file, if there are any. I made the pipeline multi-line for readability.
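A minimal run of the pipeline (hypothetical files under /tmp/md5-demo) shows what it actually reports: files with identical content, regardless of their names:

```shell
# Fresh scratch directory with two identical files and one different one
rm -rf /tmp/md5-demo
mkdir -p /tmp/md5-demo && cd /tmp/md5-demo
printf 'same\n'  > file1
printf 'same\n'  > file2
printf 'other\n' > file3

# Group paths by their md5 hash; print any group with more than one member
find . -type f -exec md5sum {} + |
sort |
perl -a -nE'push(@{$db{$F[0]}},$F[1]);END{for(keys%db){say"Dupe detected @{$db{$_}}"if scalar@{$db{$_}}>1}}'
```

Only file1 and file2 are reported as dupes; file3 has different content and is skipped even though all three names differ.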

Evan Carroll
    The OP is not looking for identical files, he's looking for files with the same name, the contents may differ. Sorry, but it is this idea that is bad :). – terdon Aug 04 '13 at 13:58
  • His first example said different font size, suffice it to assume he doesn't have an idea of what he wants. – Evan Carroll Aug 04 '13 at 17:25
  • 2
    Suffice it to say that English is not his native language, hardly the OP's fault that. However, the example clearly shows that he is not comparing the files, just looking for files of the same name in a case-insensitive manner. All I'm saying is that you might want to read a question more closely before deciding which ideas are "bad". – terdon Aug 04 '13 at 17:47
  • 1
    Agreed. This doesn't address the OP's concern. I also find it strange that you labeled an answer accepted by the OP as a bad idea because it's not what the OP wants! – Joseph R. Aug 04 '13 at 17:53