5

I'd like to find all of the files in my home folder on Linux (Ubuntu, in this case) that contain a match for a particular regular expression. Is there a simple Unix command that I can use to do this?

For example, I'd like to find all of the files in my home folder with names that contain a match of the following regex (here, using Javascript-style notation): ((R|r)eading(T|t)est(D|d)ata)

4 Answers

3

Find's -name option supports file globbing. Its square-bracket expressions behave much like regex character classes, but that is as far as the resemblance goes; for actual regex matches, use -regex.

If you're looking for a match in the contents of a file, use grep -r as Craig suggested.

If you want to match the filename, then use find with its -regex option:

find . -type f -regex '.*[Rr]eading[Tt]est[Dd]ata.*' -print

Note the change to the regex: find doesn't portably support parenthesized alternations such as (R|r) in its regex, so each one is rewritten as a character class. If you happen to be on a Linux system, GNU find supports a -regextype option that gives you more control:

find . -regextype posix-extended -regex '.*((R|r)eading(T|t)est(D|d)ata).*' -print

Note that if all you're looking for is case matching, -iregex or even -iname may be sufficient. If you're using bash as your shell, Gilles' globstar solution should work too.
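
As a rough illustration of the points above, here is a small sketch you can run in a scratch directory. The file names are invented for the demo, and the -regextype and -iregex lines assume GNU find (as shipped on Ubuntu); treat it as an experiment rather than a reference.

```shell
# Throwaway tree; the file names are made up for illustration.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
touch "$demo/ReadingTestData.csv" "$demo/sub/old_readingtestdata.txt" "$demo/notes.txt"

# -regex is tested against the whole path (./sub/old_readingtestdata.txt),
# so the pattern needs the leading and trailing '.*'.
by_class=$(cd "$demo" && find . -type f -regex '.*[Rr]eading[Tt]est[Dd]ata.*' | LC_ALL=C sort)

# Without the leading '.*' nothing matches: every path starts with './'.
unanchored=$(cd "$demo" && find . -type f -regex '[Rr]eading[Tt]est[Dd]ata.*')

# GNU find only: the question's ERE, selected with -regextype.
by_ere=$(cd "$demo" && find . -regextype posix-extended -type f \
    -regex '.*((R|r)eading(T|t)est(D|d)ata).*' | LC_ALL=C sort)

# GNU find only: case-insensitive variant.
by_iregex=$(cd "$demo" && find . -type f -iregex '.*readingtestdata.*' | LC_ALL=C sort)

printf '%s\n' "$by_class"
rm -rf "$demo"
```

On this tree the character-class, ERE and -iregex variants should all list the same two files, while the pattern without the leading '.*' lists none.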

ghoti
  • 6,602
2

grep has a recursive -r option, which will search every file in every subdirectory for the pattern.

The -l option just lists the files containing the pattern. If you want a count of matches in each file, use -c instead, and if you want to see the matches, don't use either -l or -c.

  1. (R|r) is just a verbose way of writing [Rr]. It's also slower than a character class (but not enough to matter unless it's in a loop that runs millions of times):

    grep -lr '[Rr]eading[Tt]est[Dd]ata' ~/

  2. Completely case-insensitive:

    grep -lir 'readingtestdata' ~/

  3. If you just want to search files in ~ but not in subdirectories, then you can use find:

    find ~/ -maxdepth 1 -type f -print0 | xargs -0r grep -l '[Rr]eading[Tt]est[Dd]ata'
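
To see how -l, -c, -i and the find/xargs variant differ, here is a minimal sketch on a scratch directory. The file names and contents are invented for the demo, and xargs -r is assumed to be available (it is in GNU findutils).

```shell
# Throwaway tree with made-up contents, for illustration only.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
printf 'ReadingTestData\nreadingTestData\n' > "$demo/a.txt"
printf 'nothing relevant\n' > "$demo/b.txt"
printf 'READINGTESTDATA\n' > "$demo/sub/c.txt"

# -l: names of files with at least one match (sub/c.txt is all caps, so no hit).
listed=$(grep -lr '[Rr]eading[Tt]est[Dd]ata' "$demo" | LC_ALL=C sort)

# -i ignores case entirely, so sub/c.txt is now included.
listed_nocase=$(grep -lir 'readingtestdata' "$demo" | LC_ALL=C sort)

# -c: number of matching lines in one file.
count=$(grep -c '[Rr]eading[Tt]est[Dd]ata' "$demo/a.txt")

# Top level only: find hands NUL-separated names to xargs, so sub/ is skipped.
top_only=$(find "$demo" -maxdepth 1 -type f -print0 |
    xargs -0r grep -li 'readingtestdata')

printf '%s\n' "$listed_nocase"
rm -rf "$demo"
```

Keep in mind that all of these search file contents, not file names.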

cas
  • 78,579
  • 2
    Instead of xargs you can use find ~ -maxdepth 1 -type f -exec grep -l '[Rr]eading[Tt]est[Dd]ata' {} \+. – rush Aug 26 '12 at 05:45
  • 1
    yes, you can...but I like xargs. find's -exec always seems to suffer from annoying shell-quoting/escaping problems. This is tedious and tends to make things harder to read. – cas Aug 26 '12 at 06:02
  • 2
    @CraigSanders It's the other way round! There's never any quoting problem with find -exec, but using xargs with find only works with “tame” file names or with -print0/-0. – Gilles 'SO- stop being evil' Aug 26 '12 at 21:07
  • If you want to search files in ~ but not in subdirectories, don't use find! It's just grep -l '[Rr]eading[Tt]est[Dd]ata' *. – Gilles 'SO- stop being evil' Aug 26 '12 at 21:08
  • yes, if you don't mind ignoring dot-files (most wouldn't care). also, there's a very small chance that '*' may expand to exceed the command-line length limit. Finally, xargs can be run with -P to run multiple greps in parallel. – cas Aug 26 '12 at 21:31
  • re: quoting issues. your experience must differ from mine. find -exec always gives me problems with shell quoting and escaping, xargs never does. i'm talking about the command itself, not the data in the pipe. – cas Aug 26 '12 at 21:33
  • 1
    Note that the question has been clarified, Anderson is in fact looking for files whose name matches the regex. – Gilles 'SO- stop being evil' Aug 26 '12 at 22:09
2

Shells have wildcard characters that differ from the usual regexp syntaxes: ? to match any single character, * to match any number of characters, and [abc] to match any single character among a, b or c. The following command shows all files whose name matches the extended regular expression¹ ((R|r)eading(T|t)est(D|d)ata) in the current directory:

echo *[Rr]eading[Tt]est[Dd]ata*

If you want to find files in subdirectories as well, then first run shopt -s globstar (you can put this command in your ~/.bashrc). This turns on the ** pattern to match any level of subdirectories:

echo **/*[Rr]eading[Tt]est[Dd]ata*
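
If you want to try this without touching your ~/.bashrc, here is a small sketch. It assumes bash 4.0 or later is installed as bash, and it uses bash's -O flag to turn globstar on for a single invocation instead of running shopt; the file names are made up.

```shell
# Throwaway tree; the file names are made up for illustration.
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
touch "$demo/ReadingTestData.csv" "$demo/a/b/readingtestdata.log" "$demo/a/other.txt"

# Plain glob: only the current directory is searched.
top=$(cd "$demo" && bash -c 'printf "%s\n" *[Rr]eading[Tt]est[Dd]ata*')

# With globstar, **/ also descends into every level of subdirectory.
# The sort is only there to make the order independent of the locale.
deep=$(cd "$demo" && bash -O globstar -c 'printf "%s\n" **/*[Rr]eading[Tt]est[Dd]ata*' |
    LC_ALL=C sort)

printf '%s\n' "$deep"
rm -rf "$demo"
```

Note that when nothing matches, bash leaves the pattern as it is, so echo would print the pattern itself.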

Shell wildcard characters are not as powerful as regular expressions. For example, there is no or (|) operator. You can get the power of regular expressions, but with a different syntax for historical reasons. Add shopt -s extglob to your .bashrc, then you can use @(foo|bar) to match foo or bar (like foo|bar in an ERE), *(pattern) to match any number of occurrences of pattern (like (pattern)* in an ERE), +(pattern) to match one or more occurrences, ?(pattern) to match zero or one occurrence, and !(pattern) to match anything except pattern (no ERE equivalent).
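
Here is a minimal sketch of the extended patterns applied to the regex from the question. It assumes bash is installed as bash and uses bash's -O flag to enable the option for one invocation rather than editing .bashrc; the file names are invented.

```shell
# Throwaway directory; the file names are made up for illustration.
demo=$(mktemp -d)
touch "$demo/ReadingTestData.csv" "$demo/readingtestdata.log" "$demo/notes.txt"

# @(R|r) plays the role of (R|r) in the ERE; the outer * stand in for '.*'.
alt=$(cd "$demo" && bash -O extglob -c 'printf "%s\n" *@(R|r)eading@(T|t)est@(D|d)ata*' |
    LC_ALL=C sort)

# !(pattern): every name that does NOT match the pattern.
rest=$(cd "$demo" && bash -O extglob -c 'printf "%s\n" !(*[Rr]eading*)')

printf '%s\n' "$alt"
rm -rf "$demo"
```

One practical detail: in a script, the option has to be enabled on an earlier line than the first pattern that uses it, because bash parses a line before running it.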

¹ “Extended regular expression” (ERE for short) is the Unix name for the regex syntax closest to the one JavaScript uses; for a simple pattern like this one the two are written the same way.

0

You can just pass your pattern to find:

$ find . -type f -name "*[Rr]eading[Tt]est[Dd]ata*"

For the specific pattern in the question, you can simply use case insensitive find:

$ find . -type f -iname "*readingtestdata*"
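
A small sketch of why the surrounding * matters with -name and -iname: the glob is compared with the whole base name, so a bare word only finds a file called exactly that. The file names below are invented, and -iname is assumed to be available (it is in GNU find).

```shell
# Throwaway tree; the file names are made up for illustration.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
touch "$demo/readingtestdata" "$demo/sub/My_ReadingTestData.csv" "$demo/notes.txt"

# A bare word must equal the whole base name (ignoring case with -iname).
exact=$(cd "$demo" && find . -type f -iname 'readingtestdata')

# Wrapping the word in '*' matches it anywhere in the base name.
anywhere=$(cd "$demo" && find . -type f -iname '*readingtestdata*' | LC_ALL=C sort)

printf '%s\n' "$anywhere"
rm -rf "$demo"
```

Quoting the pattern keeps the shell from expanding it before find sees it.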
terdon
  • 242,166
  • -1 The -name option supports a limited number of extensions to file globbing. Its square bracket expressions work in a way that is similar to regexes, but -name does not support full regexes. – ghoti Aug 27 '12 at 11:39
  • @ghoti, I never said it did. The code above works. Its input is a regular expression. It is not an ERE, but even "*a" is essentially a regular expression. According to the Single Unix Specification Version 2, even a single character is a Basic Regular Expression matching itself. – terdon Aug 27 '12 at 12:37
  • 1
    You said "The find command can take a regular expression as input", then you provided an example where find is interpreting a file glob. The fact that globbing uses some notation similar to regular expressions is immaterial. The above string is not a regular expression unless it is interpreted using find's -regex option. – ghoti Aug 27 '12 at 13:25
  • 1
    @ghoti, I made this into a question, come convince me there :). – terdon Aug 27 '12 at 13:48
  • 1
    @ghoti, I also removed any mention of regex to avoid confusion. – terdon Aug 27 '12 at 14:19