
How can I recursively list the names of files and directories that are the same except for capitalization/case? For example:

INPUT (the directory tree; the ls command is just for display):

[user@localhost ~/a] ls -R
.:
b

./b:
ize  Ize

./b/ize:

./b/Ize:
[user@localhost ~/a] 

OUTPUT:

/b/ize
gasko peter

3 Answers


If you have GNU uniq, you can sort case-insensitively (sort -f), then use uniq's -d to print only duplicate lines and -i to compare them ignoring case:

find . | sort -f | uniq -di

As @StephaneChazelas mentioned in his answer, this might not do what you expect if you can have duplicate paths that only differ in case (like a/b/foo and A/b/foo).
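For instance, the directory tree from the question can be checked like this (a sketch; the /tmp/case-demo path is just for illustration, and it assumes a case-sensitive filesystem):

```shell
# Recreate the example tree from the question
mkdir -p /tmp/case-demo/b/ize /tmp/case-demo/b/Ize
cd /tmp/case-demo

# Fold case while sorting so case-variants become adjacent,
# then have uniq print one line per case-insensitive duplicate
find . | sort -f | uniq -di
```

This prints a single line for the ize/Ize pair; which spelling is shown depends on how your sort breaks the tie.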

terdon
  • You probably want sort -f here. Also note that GNU uniq has the same limitation as GNU tr as in it doesn't work for matching case of multi-byte characters. – Stéphane Chazelas Aug 03 '13 at 22:55
  • @StephaneChazelas why do I want sort -f? If uniq can deal with the case, why would I also need to make sort case insensitive? And what do you mean by multi-byte characters? Things like \n,\r etc? How can they have different cases? – terdon Aug 04 '13 at 00:19
  • 1
    Try export LC_ALL=C; printf '%s\n' a A b B | sort | uniq -di. Some locales sort case-insensitively, some others (like C) don't. uniq needs a sorted input, its duplicate lines must be adjacent. – Stéphane Chazelas Aug 04 '13 at 07:25

Assuming file names don't contain newline characters, you could do something like:

find . | tr '[:upper:]' '[:lower:]' | sort | uniq -d

Note that some tr implementations like GNU tr don't change the case of multi-byte characters.

Also note that the paths it reports may not be the paths of any actual file. For instance, if there's a ./a/b/fOo and a ./A/b/fOo file, it will report ./a/b/foo. If that's not what you want, you may want to refine your requirements.
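To see that caveat concretely, here is a sketch (the /tmp/tr-demo path and the fOo file names are made up for the illustration, and it assumes a case-sensitive filesystem):

```shell
# Two files whose full paths differ only in case
mkdir -p /tmp/tr-demo/a/b /tmp/tr-demo/A/b
touch /tmp/tr-demo/a/b/fOo /tmp/tr-demo/A/b/fOo
cd /tmp/tr-demo

# Lowercase every path before comparing; note that one of the
# duplicates it reports, ./a/b/foo, names no actual file on disk
find . | tr '[:upper:]' '[:lower:]' | sort | uniq -d
```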


All of these ideas are bad. Use checksums and be sure the files are the same. Then the task becomes easy.

find . -type f -exec md5sum {} + |
sort |
perl -a -nE'push(@{$db{$F[0]}},$F[1]);END{for(keys%db){say"Dupe detected @{$db{$_}}"if scalar@{$db{$_}}>1}}'

This will md5sum every file in the directory and all subdirectories and output all dupes of that file, if there are any. I made the pipeline multi-line for readability.
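A minimal run of the pipeline (hypothetical files under /tmp/md5-demo) shows what it actually reports: files with identical content, regardless of their names:

```shell
# Fresh scratch directory with two identical files and one different one
rm -rf /tmp/md5-demo
mkdir -p /tmp/md5-demo && cd /tmp/md5-demo
printf 'same\n'  > file1
printf 'same\n'  > file2
printf 'other\n' > file3

# Group paths by their md5 hash; print any group with more than one member
find . -type f -exec md5sum {} + |
sort |
perl -a -nE'push(@{$db{$F[0]}},$F[1]);END{for(keys%db){say"Dupe detected @{$db{$_}}"if scalar@{$db{$_}}>1}}'
```

Only file1 and file2 are reported as dupes; file3 has different content and is skipped even though all three names differ.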

Evan Carroll
    The OP is not looking for identical files, he's looking for files with the same name, the contents may differ. Sorry, but it is this idea that is bad :). – terdon Aug 04 '13 at 13:58
  • His first example said different font size, suffice it to assume he doesn't have an idea of what he wants. – Evan Carroll Aug 04 '13 at 17:25
  • 2
    Suffice it to say that English is not his native language, hardly the OP's fault that. However, the example clearly shows that he is not comparing the files, just looking for files of the same name in a case-insensitive manner. All I'm saying is that you might want to read a question more closely before deciding which ideas are "bad". – terdon Aug 04 '13 at 17:47
  • 1
    Agreed. This doesn't address the OP's concern. I also find it strange that you labeled an answer accepted by the OP as a bad idea because it's not what the OP wants! – Joseph R. Aug 04 '13 at 17:53