1

I cannot figure out what's wrong with my regex: it works with grep but not with find. I'm trying to find all the files whose names follow the common TV-show episode convention, e.g. S02E21.

find -E . -name '.*[sS]{1}[0-9]{1,2}[\.]?[eE]{1}[0-9]{1,2}.*\.mkv'

I get no results with find, however if I use the same regex in combination with ls|grep -E '....', the files are found as expected.

user14492
  • 853

2 Answers

2

-name takes wildcard patterns, not regexps, and matches against the file name only, not its full path. Use -regex (or -iregex for case-insensitive matching) for regexp matching, but beware that it matches against the full path. Here, you could do:

LC_ALL=C find -E . -iregex '.*s[0-9]{1,2}\.?e[0-9]{1,2}[^/]*\.mkv'

Here, we're replacing the second .* with [^/]*, that is a sequence of non-/ characters to make sure the pattern before it matches on the file name and not any of the directory components.
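To see why [^/]* matters without depending on a particular find, here is a small sketch that feeds two made-up paths to grep. grep -x requires the whole line to match, which stands in for the way -regex must match the whole path; the paths and variable names are only for illustration.

```shell
# Two hypothetical paths, written the way find would print them.
paths='./Show/Show.S02E21.mkv
./Show.S01E01.extras/trailer.mkv'

# Trailing .* may cross a "/", so the episode tag can sit in a directory name.
loose=$(printf '%s\n' "$paths" |
    LC_ALL=C grep -ixE '.*s[0-9]{1,2}\.?e[0-9]{1,2}.*\.mkv')

# Trailing [^/]* cannot cross a "/", so the tag has to be in the file name itself.
strict=$(printf '%s\n' "$paths" |
    LC_ALL=C grep -ixE '.*s[0-9]{1,2}\.?e[0-9]{1,2}[^/]*\.mkv')

printf 'with .*:\n%s\n\nwith [^/]*:\n%s\n' "$loose" "$strict"
```

The first pattern reports both paths, the second only the one whose file name carries the episode tag.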

By fixing the locale to C with LC_ALL=C, we're making sure . matches any byte and [^/] matches any byte except the one for /. Otherwise you could run into problems with file or directory names encoded in a different character set than your locale's. Fixing the locale to C also guarantees that, with -iregex, e only matches e and E (and s only s and S).

Note that [\.] matches on backslash or dot. To match a dot, it's either \. or [.]. Also x{1} is the same as x, so I've removed those {1} for simplification.
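A quick way to check the bracket behaviour yourself is with grep -E, which uses the same extended regexp syntax. This is only a sketch with made-up sample strings:

```shell
# Three sample strings: a dot, a backslash, and an "x" between S02 and E21.
samples=$(printf '%s\n' 'S02.E21' 'S02\E21' 'S02xE21')

# Inside a bracket expression the backslash is an ordinary character,
# so [\.] accepts either a backslash or a dot.
bracket=$(printf '%s\n' "$samples" | grep -E 'S02[\.]E21')

# Outside brackets, \. matches a literal dot only.
escaped=$(printf '%s\n' "$samples" | grep -E 'S02\.E21')

printf '[\\.] matched:\n%s\n\n\\. matched:\n%s\n' "$bracket" "$escaped"
```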

Check your man page for details. Note that none of -E, -regex or -iregex are standard.
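Because the option syntax differs between implementations, one possible approach is a small wrapper that probes for -E and otherwise falls back to the GNU spelling, -regextype posix-extended. The find_ere name is hypothetical, and the sketch assumes your find is either BSD-like or GNU; other implementations may support neither.

```shell
# Hypothetical helper: case-insensitive extended-regexp search
# with whichever syntax the local find accepts.
find_ere() {
    dir=$1
    regex=$2
    if find -E "$dir" -prune >/dev/null 2>&1; then
        # BSD/macOS style: -E switches -regex/-iregex to extended regexps.
        LC_ALL=C find -E "$dir" -iregex "$regex"
    else
        # GNU style: pick the regexp flavour with -regextype.
        LC_ALL=C find "$dir" -regextype posix-extended -iregex "$regex"
    fi
}

# Usage, from the directory you want to search:
#   find_ere . '.*s[0-9]{1,2}\.?e[0-9]{1,2}[^/]*\.mkv'
```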

It can be simplified further to:

LC_ALL=C find -E . -iregex '.*s[0-9]{1,2}\.?e[0-9][^/]*\.mkv'

This works because a second episode digit, if present, is matched by [^/]* anyway.

The standard equivalent using wildcard patterns would look like:

LC_ALL=C find . -name '*[sS][0-9][0-9].[eE][0-9]*.mkv' \
             -o -name '*[sS][0-9].[eE][0-9]*.mkv' \
             -o -name '*[sS][0-9][0-9][eE][0-9]*.mkv' \
             -o -name '*[sS][0-9][eE][0-9]*.mkv'

Wildcard patterns, unlike extended regular expressions, have neither an alternation operator nor an equivalent of the regexp ? or {n,p} operators, so we need four patterns to cover all possibilities.
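The same four patterns can be tried out on plain strings with a shell case statement, which uses the same wildcard syntax. A minimal sketch, where is_episode and the sample names are made up for illustration:

```shell
# Hypothetical helper: succeeds if a file name matches any of the four patterns.
is_episode() {
    case $1 in
        *[sS][0-9][0-9].[eE][0-9]*.mkv | \
        *[sS][0-9].[eE][0-9]*.mkv | \
        *[sS][0-9][0-9][eE][0-9]*.mkv | \
        *[sS][0-9][eE][0-9]*.mkv) return 0 ;;
        *) return 1 ;;
    esac
}

for name in Show.S02E21.mkv show.s2.e5.720p.mkv Show.Season2.mkv; do
    if is_episode "$name"; then
        echo "match:    $name"
    else
        echo "no match: $name"
    fi
done
```

The first two names match; Show.Season2.mkv does not, since no s or S in it is followed by a digit.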

You could also use a shell with recursive globbing and advanced wildcard patterns like zsh:

setopt extendedglob
ls -lrtd -- **/(#i)*s<->e<->*.mkv
  • **/ recursive search
  • (#i) case insensitive matching
  • <-> any decimal number

Here the matches are passed to ls -lrtd to print a detailed listing sorted by last modification time, though of course you can use any command.

0

find dir -name just supports shell file name glob characters as documented by man fnmatch.
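A small experiment makes the difference visible. The sketch below builds a scratch directory (the file names are made up, and mktemp -d is assumed to be available) and runs -name once with a regexp-style pattern and once with a wildcard pattern:

```shell
# Scratch directory with a hypothetical episode file.
demo=$(mktemp -d)
mkdir -p "$demo/Show/Season 2"
touch "$demo/Show/Season 2/Show.S02E21.720p.mkv" "$demo/Show/notes.txt"

# Read as a wildcard, the leading '.' is a literal dot, so this pattern
# only matches names that start with a dot.
as_regex=$(cd "$demo" && find . -name '.*[sS][0-9][0-9][eE][0-9][0-9].*\.mkv')

# The wildcard spelling: '*' stands for any sequence of characters.
as_glob=$(cd "$demo" && find . -name '*[sS][0-9][0-9][eE][0-9][0-9]*.mkv')

printf 'regexp-style pattern: %s\n' "${as_regex:-<no match>}"
printf 'wildcard pattern:     %s\n' "${as_glob:-<no match>}"
rm -rf "$demo"
```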

Some find implementations support non-standard extensions for regular expressions. Check your find man page.

schily
  • 19,173