In grep, * denotes zero or more occurrences of the preceding character, yet with ls we write ls *.c to list all C files. Could someone explain how the use of * differs in these two cases?
2 Answers
Shell file name globbing and regular expressions use some of the same characters, and they have similar purposes, but you're right, they aren't compatible. File name globbing is a much less powerful system.
In file name globbing:
- * means "zero or more characters"
- ? means "any single character"
But in regexes, you have to use .* to mean "zero or more characters", and . means "any single character." A ? means something quite different in regexes: zero or one instance of the preceding RE element.
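To make the contrast concrete, here is a minimal sketch. It assumes a POSIX-style shell with mktemp and grep available; the file names (main.c, util.c, notes.txt) are made up for illustration.

```shell
# Throwaway directory with hypothetical file names.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch main.c util.c notes.txt

# Glob: the shell expands the pattern before the command ever runs.
glob_star=$(echo *.c)       # * = zero or more characters
glob_qmark=$(echo ma?n.c)   # ? = exactly one character

# Regex: . is "any single character" and * repeats the element before it.
regex_star=$(ls | grep '^.*\.c$')
regex_dot=$(ls | grep '^ma.n\.c$')

# The glob pattern reused as a basic regex: a leading * is taken as a
# literal asterisk there, so none of these names match.
# (|| true only keeps grep's "no match" status from aborting a script.)
glob_as_regex=$(ls | grep '*.c' || true)

printf '%s\n' "glob star:     $glob_star"
printf '%s\n' "glob qmark:    $glob_qmark"
printf '%s\n' "regex star:    $regex_star"
printf '%s\n' "regex dot:     $regex_dot"
printf '%s\n' "glob as regex: [$glob_as_regex]"
```

The two glob lines and the two regex lines select the same files, but the patterns are not interchangeable: passing *.c to grep unchanged finds nothing here.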
Square brackets ([]) appear to work the same in both systems on the system I'm typing this on, for simple cases at least. This includes things like POSIX character classes (e.g. [:alpha:], written inside the brackets as [[:alpha:]]). That said, if you need your commands to work on many different system types, I recommend against using anything beyond elementary things like lists of characters (e.g. [abeq]) and maybe character ranges (e.g. [a-c]).
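A quick sketch of the simple bracket cases, again with hypothetical file names. Pinning the locale to C is my own precaution rather than something the answer requires; ranges such as [a-c] can behave differently in other locales.

```shell
# Pin the locale so a range like [a-c] means plain ASCII order.
LC_ALL=C
export LC_ALL

tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch a.txt b.txt c.txt d.txt

# Glob: a character list and a character range.
glob_list=$(echo [ab].txt)
glob_range=$(echo [a-c].txt)

# Regex: the same bracket expressions, with the dot escaped and anchors added.
# The unquoted $(...) joins grep's one-name-per-line output with spaces.
regex_list=$(echo $(ls | grep '^[ab]\.txt$'))
regex_range=$(echo $(ls | grep '^[a-c]\.txt$'))

printf '%s\n' "glob  [ab]:  $glob_list"
printf '%s\n' "regex [ab]:  $regex_list"
printf '%s\n' "glob  [a-c]: $glob_range"
printf '%s\n' "regex [a-c]: $regex_range"
```

Only the bracket part carries over unchanged; the rest of the pattern still has to be rewritten for each system.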
These differences mean the two systems are only directly interchangeable for simple cases. If you need regex matching of file names, you need to do it another way. find -regex is one option. (Notice that there is also find -name, by the way, which uses glob syntax.)
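As a rough illustration of the two find options, the sketch below assumes GNU or BSD find, since -regex is not part of POSIX find. The directory and file names are hypothetical.

```shell
tmp=$(mktemp -d)
touch "$tmp/main.c" "$tmp/util.c" "$tmp/notes.txt"

# -name takes a glob and compares it with the base name only.
# Quote the pattern so the shell does not expand it before find sees it.
by_name=$(find "$tmp" -name '*.c' | sort)

# -regex takes a regular expression and must match the whole path,
# hence the leading .* to absorb the directory part.
by_regex=$(find "$tmp" -regex '.*\.c' | sort)

# Without the leading .* the regex cannot match the full path,
# so nothing is found.
no_match=$(find "$tmp" -regex 'main\.c')

printf '%s\n' "-name '*.c':" "$by_name"
printf '%s\n' "-regex '.*\.c':" "$by_regex"
printf '%s\n' "-regex 'main\.c': [$no_match]"
```

The first two commands list the same two files; the third shows why a regex written for a bare file name usually needs a .* prefix with find -regex.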
Answering the question expressed in the original title:
Why do regular expressions differ from that used to filter files?
File name expansion predates regular expressions, already existed in most operating systems (as wildcard or joker characters), and is much simpler and more intuitive than regular expressions.
While *.txt is easily understandable by casual users, the analogous .*\.txt is something more targeted to experienced users/programmers, not to mention ^.*\.txt$ ...
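One detail worth checking when translating between the two: a glob has to match the entire file name, whereas grep looks for a match anywhere in the line. A small sketch with hypothetical file names, assuming a POSIX-style shell and grep:

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch notes.txt notes.txt.bak

# Glob: must cover the whole name, so notes.txt.bak is excluded.
glob=$(echo *.txt)

# Unanchored regex: matches anywhere in the line, so both names pass.
unanchored=$(ls | grep '.*\.txt')

# Anchored regex: the closer equivalent of the glob.
anchored=$(ls | grep '^.*\.txt$')

printf '%s\n' "glob:       $glob"
printf '%s\n' "unanchored: $(echo $unanchored)"
printf '%s\n' "anchored:   $anchored"
```

Here the glob and the anchored regex both select only notes.txt, while the unanchored regex also lets notes.txt.bak through.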
Another reason for the “why” part: speed. Regular expressions are slower: http://pastebin.com/3iNCgkE3 – manatwork Jan 01 '13 at 11:21
*.txt doesn't equal .*\.txt, it (mostly) equals .*\.txt$ because there can be nothing after the .txt (at least assuming reasonable file name globbing). Perhaps even ^.*\.txt$ somewhat depending on usage. Proves your point? – user Jan 09 '13 at 08:17
'%' means '*'. – Mr Lister Dec 08 '12 at 13:48