How do regular expressions differ from wildcards used to filter files

Question

While we use * to denote zero or more occurrences of the preceding character in grep, we use *.c to find all C files with the ls command, as in ls *.c. Could someone explain how the use of * differs in these two cases?

Warren Young · Accepted Answer · 2016-03-30T12:57:36.177

Shell file name globbing and regular expressions use some of the same characters, and they have similar purposes, but you're right, they aren't compatible. File name globbing is a much less powerful system.

In file name globbing:

* means "zero or more characters"
? means "any single character"

But in regexes, you have to use .* to mean "zero or more characters", and . means "any single character." A ? means something quite different in regexes: zero or one instance of the preceding RE element.
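A minimal sketch of the same contrast, using Python's fnmatch and re modules as stand-ins for shell globbing and grep-style regexes. This is an approximation: fnmatch does not treat leading dots or slashes specially the way a shell does, and the helper names glob_match and regex_match are made up for this illustration.

```python
import fnmatch
import re

def glob_match(name, pattern):
    """True if the whole name matches the glob pattern (case-sensitive)."""
    return fnmatch.fnmatchcase(name, pattern)

def regex_match(name, pattern):
    """True if the whole name matches the regular expression."""
    return re.fullmatch(pattern, name) is not None

# Glob: * is "zero or more characters", ? is "any single character".
print(glob_match("main.c", "*.c"))         # True
print(glob_match("ab.c", "?.c"))           # False: ? stands for exactly one character

# Regex: . is "any single character", * repeats the preceding element.
print(regex_match("main.c", r".*\.c"))     # True
print(regex_match("a.c", r".\.c"))         # True

# Regex ? means "zero or one of the preceding element".
print(regex_match("color", r"colou?r"))    # True
print(regex_match("colour", r"colou?r"))   # True

# The same text "ab*c" read both ways gives opposite answers.
print(glob_match("abXc", "ab*c"))          # True:  "ab", anything, "c"
print(glob_match("ac", "ab*c"))            # False: the literal "ab" is missing
print(regex_match("abXc", r"ab*c"))        # False: X is not a "b"
print(regex_match("ac", r"ab*c"))          # True:  zero "b" characters is allowed
```

The last four lines are the heart of the question: in a glob the * stands on its own, while in a regex it only modifies whatever comes right before it.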

Square brackets ([]) appear to work the same in both systems on the system I'm typing this on, for simple cases at least. This includes things like POSIX character classes (e.g. [:alpha:]). That said, if you need your commands to work on many different system types, I recommend against using anything beyond elementary things like lists of characters (e.g. [abeq]) and maybe character ranges (e.g. [a-c]).
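As a rough illustration of the bracket overlap, here is a sketch in Python, again using fnmatch for glob syntax and re for regex syntax. It sticks to character lists and ranges, in line with the portability advice above; as far as I know neither Python module understands POSIX classes such as [:alpha:], so those are left out. The file names are invented for the example.

```python
import fnmatch
import re

names = ["a.txt", "b.txt", "e.txt", "q.txt", "z.txt"]

# A list of characters is written the same way in both syntaxes.
glob_list = [n for n in names if fnmatch.fnmatchcase(n, "[abeq].txt")]
regex_list = [n for n in names if re.fullmatch(r"[abeq]\.txt", n)]
print(glob_list)    # ['a.txt', 'b.txt', 'e.txt', 'q.txt']
print(regex_list)   # ['a.txt', 'b.txt', 'e.txt', 'q.txt']

# So is a simple range.
glob_range = [n for n in names if fnmatch.fnmatchcase(n, "[a-c].txt")]
regex_range = [n for n in names if re.fullmatch(r"[a-c]\.txt", n)]
print(glob_range)   # ['a.txt', 'b.txt']
print(regex_range)  # ['a.txt', 'b.txt']

# One place the two diverge: negation is [!...] in a glob, [^...] in a regex.
glob_negated = [n for n in names if fnmatch.fnmatchcase(n, "[!abeq].txt")]
regex_negated = [n for n in names if re.fullmatch(r"[^abeq]\.txt", n)]
print(glob_negated)   # ['z.txt']
print(regex_negated)  # ['z.txt']
```

Note that even here the dot still has to be escaped on the regex side, so only the bracket expression itself carries over unchanged.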

These differences mean the two systems are only directly interchangeable for simple cases. If you need regex matching of file names, you need to do it another way. find -regex is one option. (Notice that there is also find -name, by the way, which uses glob syntax.)
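To make the find -name versus find -regex distinction concrete, here is a hedged Python sketch of the two styles. The functions find_by_glob and find_by_regex are hypothetical helpers, not part of any library, and they only approximate find: they look at regular files only, and the regex dialect is Python's rather than the one find uses. The key behaviour they imitate is that -name tests the base name against a glob, while -regex tests the whole path against a regex.

```python
import fnmatch
import os
import re
import tempfile

def find_by_glob(root, pattern):
    """Rough analogue of `find root -name pattern`: glob against the base name."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if fnmatch.fnmatchcase(name, pattern):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)

def find_by_regex(root, pattern):
    """Rough analogue of `find root -regex pattern`: regex against the whole path."""
    rx = re.compile(pattern)
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if rx.fullmatch(path):
                hits.append(path)
    return sorted(hits)

# Build a throwaway tree so the example is self-contained.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "src"))
    for rel in ["main.c", "util.c", "notes.txt",
                os.path.join("src", "test1.c"), os.path.join("src", "test22.c")]:
        open(os.path.join(root, rel), "w").close()

    # Glob: every C file, wherever it lives.
    c_files = [os.path.basename(p) for p in find_by_glob(root, "*.c")]

    # Regex: "test" followed by one or more digits, which a plain glob cannot express.
    numbered = [os.path.basename(p) for p in find_by_regex(root, r".*test[0-9]+\.c")]

print(sorted(c_files))    # ['main.c', 'test1.c', 'test22.c', 'util.c']
print(sorted(numbered))   # ['test1.c', 'test22.c']
```

Because the regex is applied to the full path, the pattern needs the leading .* to absorb the directory part; the glob version never sees it.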

In addition, there are various flavours of regex. Not all regexes are created the same! And you have many other pattern matching systems, such as SQL LIKE, where '%' plays the role of '*'. – Mr Lister, Dec 08 '12 at 13:48
Two major flavors of regexp are POSIX and PCRE (Perl Compatible Regular Expressions). The latter is less long-winded and has some more features. Unix tools and shells generally use POSIX; most programming languages with built-in regexps (except shell) use PCRE. Just beware of the difference when you are reading material online. – goldilocks, Dec 08 '12 at 15:05

jlliagre · Answer 2 · 2016-09-21T09:15:59.567

Answering the question as expressed in the original title:

Why do regular expressions differ from that used to filter files?

File name expansion predates regular expressions, already existed in most operating systems (as wildcard or joker characters), and is much simpler and more intuitive than regular expressions.

While *.txt is easily understandable by casual users, the analogous .*\.txt is something more targeted to experienced users/programmers, not to mention ^.*\.txt$ ...
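The role of the ^ and $ anchors can be checked with a small sketch. It uses Python's fnmatch and re as approximations of a shell glob and a grep-style search, and the file names are invented for the example.

```python
import fnmatch
import re

names = ["notes.txt", "notes.txt.bak", "readme.md"]

# A glob always has to match the whole name.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "*.txt")]

# An unanchored regex search (what grep does by default) matches anywhere in the name.
search_hits = [n for n in names if re.search(r".*\.txt", n)]

# With both anchors the regex behaves like the glob again.
anchored_hits = [n for n in names if re.search(r"^.*\.txt$", n)]

print(glob_hits)      # ['notes.txt']
print(search_hits)    # ['notes.txt', 'notes.txt.bak']
print(anchored_hits)  # ['notes.txt']

# fnmatch itself works by translating the glob into an anchored regex.
# The exact text of the result varies between Python versions.
print(fnmatch.translate("*.txt"))
```

So the closest regex counterpart of *.txt is the anchored form, which is also the least readable of the three.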

edited Sep 21 '16 at 09:15

answered Jan 01 '13 at 10:44

Another reason for the “why” part: speed. Regular expressions are slower: http://pastebin.com/3iNCgkE3 – manatwork Jan 01 '13 at 11:21

*.txt doesn't equal .*\.txt, it (mostly) equals .*\.txt$ because there can be nothing after the .txt (at least assuming reasonable file name globbing). Perhaps even ^.*\.txt$ somewhat depending on usage. Proves your point? – user Jan 09 '13 at 08:17
