
While we use * in grep to mean "zero or more of the previous character", with the ls command we use it as in ls *.c to find all C files. Could someone tell me how the use of * differs in these two cases?

user
user3539



Shell file name globbing and regular expressions use some of the same characters, and they have similar purposes, but you're right, they aren't compatible. File name globbing is a much less powerful system.

In file name globbing:

  • * means "zero or more characters"

  • ? means "any single character"

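But in regexes, * is not a stand-alone wildcard: it means "zero or more of the preceding element". So you have to use .* to mean "zero or more characters", and . means "any single character." A ? means something quite different in regexes: zero or one instance of the preceding RE element (in extended regexes such as grep -E; GNU grep's basic syntax spells it \?).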
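
To make the correspondence concrete, here is a small Python sketch (an illustration, not part of the original answer). It uses fnmatch.fnmatchcase as a stand-in for shell globbing; that is only an approximation, since fnmatch does not treat a leading dot or "/" specially the way a shell does. re.fullmatch is used so that the regex, like a glob, has to match the whole name.

```python
import fnmatch
import re

# Sample names, made up for the demo.
names = ["a.c", "main.c", "notes.txt", "ab.c", "c"]

# Glob "*" vs. regex ".*": any run of characters.
glob_star = [n for n in names if fnmatch.fnmatchcase(n, "*.c")]
regex_star = [n for n in names if re.fullmatch(r".*\.c", n)]

# Glob "?" vs. regex ".": exactly one character.
glob_one = [n for n in names if fnmatch.fnmatchcase(n, "?.c")]
regex_one = [n for n in names if re.fullmatch(r".\.c", n)]

# In a regex, "?" instead makes the preceding element optional:
# "ab?c" matches "ac" and "abc", but not "axc".
optional = [s for s in ["ac", "abc", "axc"] if re.fullmatch(r"ab?c", s)]

print(glob_star, regex_star)
print(glob_one, regex_one)
print(optional)
```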

Square brackets ([]) appear to work the same in both systems on the system I'm typing this on, for simple cases at least. This includes things like POSIX character classes (e.g. [[:alpha:]]). That said, if you need your commands to work on many different system types, I recommend against using anything beyond elementary things like lists of characters (e.g. [abeq]) and maybe character ranges (e.g. [a-c]).
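
A similar sketch for bracket expressions, again using Python's fnmatch only as an approximation of shell globbing. Lists and ranges behave the same in both systems; negation is written [!...] in POSIX globs but [^...] in regexes (bash happens to accept both in globs). Neither Python's fnmatch nor its re module understands POSIX classes such as [[:alpha:]], which is one more reason to stick to the elementary forms if portability matters.

```python
import fnmatch
import re

# Sample names, made up for the demo.
names = ["a1", "b1", "d1", "x1"]

# Simple ranges mean the same thing in both systems.
glob_range = [n for n in names if fnmatch.fnmatchcase(n, "[a-c]1")]
regex_range = [n for n in names if re.fullmatch(r"[a-c]1", n)]

# Negation is spelled differently: "[!...]" in a glob, "[^...]" in a regex.
glob_neg = [n for n in names if fnmatch.fnmatchcase(n, "[!a-c]1")]
regex_neg = [n for n in names if re.fullmatch(r"[^a-c]1", n)]

print(glob_range, regex_range)
print(glob_neg, regex_neg)
```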

These differences mean the two systems are only directly interchangeable for simple cases. If you need regex matching of file names, you need to do it another way. find -regex is one option. (Notice that there is also find -name, by the way, which uses glob syntax.)
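
Another way to get regex matching of file names is to list the directory and filter the names yourself. The sketch below does that in Python; regex_ls is a hypothetical helper name, and the files are created in a temporary directory just for the demo. Note that GNU find -regex matches against the whole path (e.g. ./test1.c), not only the final name component, so a find pattern has to account for the leading directory part.

```python
import os
import re
import tempfile


def regex_ls(directory, pattern):
    """Hypothetical helper: like `ls`, but filters entry names with a regex.

    Returns the sorted names in `directory` that match `pattern` in full.
    """
    rx = re.compile(pattern)
    return sorted(name for name in os.listdir(directory) if rx.fullmatch(name))


with tempfile.TemporaryDirectory() as d:
    for name in ["main.c", "util.c", "util.h", "notes.txt", "test1.c", "test22.c"]:
        open(os.path.join(d, name), "w").close()  # create empty files

    # "test", then one or more digits, then ".c".
    numbered_tests = regex_ls(d, r"test[0-9]+\.c")
    # The regex spelling of the glob *.c.
    c_files = regex_ls(d, r".*\.c")

print(numbered_tests)
print(c_files)
```

Plain POSIX globs have no way to say "one or more digits": test[0-9]*.c would also match a name like test1x.c.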

Warren Young
  • I didn't know it was called globbing :) – user3539 Dec 08 '12 at 13:07
  • In addition, there are various flavours of regex. Not all regexes are created the same! And you have many other pattern matching systems, such as SQL LIKE, where '%' means '*'. – Mr Lister Dec 08 '12 at 13:48
  • Two major flavors of regexp are POSIX and PCRE (Perl Compatible R.E.). The latter is less long-winded and has some more features. Unix tools and shells generally use POSIX, while most programming languages with built-in regexps (except shell) use PCRE or a similar Perl-style flavor. Just beware the difference when you are reading material online. – goldilocks Dec 08 '12 at 15:05

Answering the question expressed in the original title:

Why do regular expressions differ from that used to filter files?

File name expansion predates regular expressions, already existed in most operating systems (as wildcard or "joker" characters), and is much simpler and more intuitive than the latter.

While *.txt is easily understood by casual users, the analogous .*\.txt is aimed more at experienced users and programmers, not to mention ^.*\.txt$ ...
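
To see how much more machinery the regex form needs, Python's fnmatch.translate converts a glob into an equivalent regex string. The exact text it produces differs between Python versions, so this sketch only prints it and relies on its behaviour: re.match anchors the start, and the translated pattern anchors the end, so notes.txt.bak does not match.

```python
import fnmatch
import re

# Convert the glob into a regex string (exact text varies by Python version).
pattern = fnmatch.translate("*.txt")
print(pattern)

rx = re.compile(pattern)
# re.match anchors at the start; the translated pattern anchors the end.
results = {name: bool(rx.match(name)) for name in ["notes.txt", "notes.txt.bak", "txt"]}
print(results)
```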

jlliagre
  • Another reason for the "why" part: speed. Regular expressions are slower: http://pastebin.com/3iNCgkE3 – manatwork Jan 01 '13 at 11:21
  • *.txt doesn't equal .*\.txt; it (mostly) equals .*\.txt$, because there can be nothing after the .txt (at least assuming reasonable file name globbing). Perhaps even ^.*\.txt$, somewhat depending on usage. Proves your point? – user Jan 09 '13 at 08:17
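
The anchoring point can be checked with a short Python sketch (file names made up for the demo). It also suggests why the equivalence holds only "mostly": by default a shell's *.txt skips names that begin with a dot, which a regex has to exclude explicitly. \Z is used instead of $ because Python's $ also matches just before a trailing newline.

```python
import re

names = ["notes.txt", "notes.txt.bak", ".hidden.txt"]

# Unanchored: re.search finds ".txt" anywhere, so "notes.txt.bak" matches too.
unanchored = [n for n in names if re.search(r".*\.txt", n)]

# Anchored at both ends: only names that end in ".txt".
anchored = [n for n in names if re.search(r"^.*\.txt\Z", n)]

# Closer to the shell's default *.txt, which skips names starting with ".".
shell_like = [n for n in names if re.search(r"^[^.].*\.txt\Z", n)]

print(unanchored)
print(anchored)
print(shell_like)
```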