
I was trying to find files in a certain directory whose names don't follow the naming guidelines for UNIX-like systems.

With find, the command find <dir> -regex '.*[^-_./0-9a-zA-Z].*' returns the files of interest.
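To make the command concrete, here is a minimal sketch that runs it against a throwaway directory. It assumes GNU find, whose default -regextype is the Emacs-style syntax, and sets LC_ALL=C so that a-z and A-Z are plain ASCII ranges. The helper list_nonportable and the demo file names are hypothetical.

```shell
# Assumes GNU find; its default -regextype is the Emacs-style syntax.
export LC_ALL=C  # keep a-z and A-Z as plain ASCII ranges (also affects sort)

# list_nonportable DIR (hypothetical helper): print, sorted, the paths below
# DIR that contain at least one character outside -_./0-9a-zA-Z.
list_nonportable() {
    (
        cd "$1" || exit 1
        # -regex must match the WHOLE path, hence .* on both sides.
        find . -regex '.*[^-_./0-9a-zA-Z].*' | sort
    )
}

# Throwaway tree with made-up names.
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/good_name.txt" "$demo/also-good.2" \
      "$demo/bad name.txt" "$demo/bad,comma" "$demo/sub/x\$y"
list_nonportable "$demo"
rm -rf "$demo"
```

On that tree it should print ./bad name.txt, ./bad,comma and ./sub/x$y, the three names containing a space, a comma and a dollar sign.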

My questions about the above command line are:

  1. Why do we need the any-single-character metacharacter . before the zero-or-more metacharacter * at both the start and the end of the regex for this to work as intended? When I initially tried find <dir> -regex '*[^-_./0-9a-zA-Z]*', it returned nothing.
  2. Furthermore, if I replace the character ranges in the regex with the corresponding POSIX character classes and leave everything else intact, find <dir> -regex '.*[^-_./[:digit:][:lower:][:upper:]].*' returns nothing. Why is that?
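The two observations above can be reproduced in a scratch directory. This sketch assumes GNU find with its default (Emacs-style) regex syntax; run_regex and the file names are hypothetical. Both variants are expected to print nothing, while the working pattern, kept as a reference, finds the two demo files.

```shell
# Assumes GNU find with its default (Emacs-style) regex syntax.
export LC_ALL=C  # plain ASCII meaning for the a-z / A-Z ranges

# run_regex DIR REGEX (hypothetical helper): print, sorted, the paths below
# DIR whose whole path matches REGEX.
run_regex() {
    ( cd "$1" && find . -regex "$2" | sort )
}

demo=$(mktemp -d)
touch "$demo/bad name.txt" "$demo/bad,comma"

# Variant 1: no '.' before either '*'. The regex must cover every character
# of a path such as "./bad name.txt", but nothing in it can match the leading
# '.' (a leading '*' has no atom to repeat), so no path matches.
v1=$(run_regex "$demo" '*[^-_./0-9a-zA-Z]*')

# Variant 2: POSIX classes. Per the findutils manual the default syntax does
# not support them, so the brackets do not mean what was intended.
v2=$(run_regex "$demo" '.*[^-_./[:digit:][:lower:][:upper:]].*')

# Reference: the working pattern from the question.
v3=$(run_regex "$demo" '.*[^-_./0-9a-zA-Z].*')

echo "variant 1: ${v1:-nothing}"
echo "variant 2: ${v2:-nothing}"
echo "reference:"
echo "$v3"
rm -rf "$demo"
```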

TIA!

ilkkachu
  • Have you read the description of -regex? It matches over the whole path. – muru Aug 08 '20 at 09:57
  • And what about that? – computronium Aug 08 '20 at 10:58
  • If you exclude /, the path separator, in what system do you think your regex will match a whole path? – muru Aug 08 '20 at 12:01
  • @muru Where am I excluding the path separator? The regex doesn't exclude the path separator. It is supposed to find files that have characters other than the recommended ones: - or _ or . or / or digits or lowercase letters or uppercase letters. So, to answer your question, I think this will work on all UNIX-like systems. Besides, that wasn't my question?! – computronium Aug 08 '20 at 13:40
  • If you don't include ., which matches anything, and explicitly exclude /, which is the path separator, of course you're excluding the path separator. I have no idea what your random bolding is meant to imply. – muru Aug 08 '20 at 13:52
  • @muru If I don't include . and don't explicitly exclude the /, the resulting regex *[^-_.0-9a-zA-Z]* won't work either, will it? The . has to be included not just because of the path separator. My misconception here was cleared up pretty nicely by @steeldriver. The regex won't work without the leading and trailing . because then the quantifier * has nothing before it telling it what to match zero or more of. It was a misconception because I was mixing up the shell wildcard * with the regex quantifier *... – computronium Aug 08 '20 at 15:14
  • ...which @steeldriver intuited pretty well. You, on the other hand, just kept nitpicking about one thing that was not relevant to my question, IMO. So, apologies if you found my random bolding off-putting. – computronium Aug 08 '20 at 15:14
  • @muru, the command looks for filenames that contain anything other than the listed characters, like, say, a comma, or a dollar sign, or ... If the slash weren't listed, it would match pretty much everything (since -regex matches against the full path). – ilkkachu Aug 08 '20 at 16:44
  • I suppose the alternative would be find . ! -regex '[-_./0-9a-zA-Z]*', if that's any clearer – ilkkachu Aug 08 '20 at 16:58
  • @muru Yes, it does answer the first question. steeldriver has answered both. – computronium Aug 10 '20 at 07:24
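ilkkachu's negated form and the original pattern should report the same paths, since "contains at least one disallowed character" is the complement of "consists only of allowed characters". A minimal sketch to compare them, assuming GNU find with its default regex syntax; the helper names with_dotstar and negated and the demo files are hypothetical:

```shell
# Assumes GNU find with its default (Emacs-style) regex syntax.
export LC_ALL=C

# Hypothetical helpers; each prints the offending paths below $1, sorted.
# Positive form: a disallowed character somewhere in the whole path.
with_dotstar() { ( cd "$1" && find . -regex '.*[^-_./0-9a-zA-Z].*' | sort ); }
# Negated form: paths that are NOT made up only of allowed characters.
negated() { ( cd "$1" && find . ! -regex '[-_./0-9a-zA-Z]*' | sort ); }

demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/ok.txt" "$demo/bad name" "$demo/sub/a\$b"
if [ "$(with_dotstar "$demo")" = "$(negated "$demo")" ]; then
    echo "both forms agree:"
    negated "$demo"
fi
rm -rf "$demo"
```

The negated form needs no .* at all: it describes the whole path directly, and ! flips the result.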


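  1. * in regular expression syntax is a quantifier applying to the previous regex atom (in this case, .). It is not itself a "zero or more metacharacter" as it would be in shell pattern matching syntax (aka "globbing"). Since -regex has to match the whole path, the .* at each end is what lets the disallowed character appear anywhere in it.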

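  2. This is likely an idiosyncrasy of the default Emacs-style regextype: according to the findutils manual, that syntax does not support character classes, so you would need [0-9] rather than [[:digit:]]. Try -regextype posix-basic or -regextype egrep, for example, if you want more familiar behavior.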

steeldriver
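Following the answer's suggestion, here is a sketch that runs the character-class version of the pattern under -regextype posix-basic and -regextype egrep, both of which support [:digit:]-style classes. It assumes GNU find (where -regextype affects only the -regex tests that come after it) and LC_ALL=C so that [:lower:] and [:upper:] cover only ASCII letters; classes_regex and the file names are hypothetical.

```shell
# Assumes GNU find; -regextype must come before the -regex it affects.
export LC_ALL=C  # [:lower:] / [:upper:] then mean a-z / A-Z

# classes_regex DIR TYPE (hypothetical helper): the POSIX-class version of
# the pattern, evaluated under the given -regextype, sorted.
classes_regex() {
    ( cd "$1" && find . -regextype "$2" \
        -regex '.*[^-_./[:digit:][:lower:][:upper:]].*' | sort )
}

demo=$(mktemp -d)
touch "$demo/ok_1.txt" "$demo/bad name" "$demo/bad,comma"
for type in posix-basic egrep; do
    echo "== $type"
    classes_regex "$demo" "$type"
done
rm -rf "$demo"
```

Each type should list ./bad name and ./bad,comma, i.e. the same files the range-based pattern finds under the default syntax.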