I wanted to grep for all the .txt files using the wildcard character '*'.
I tried this command (with and without the quotation marks), but it failed.

ls | grep "*.txt"

The interesting thing is that if I put another character in the grep pattern, one that corresponds to a .txt file in the directory, it works:

>>ls | grep s*.txt
sample.txt

I know that ls *.txt will work, but I was a bit surprised by the behavior of the grep command. Could someone help me understand why this is happening?

Is it because grep uses regular expressions?

  • It is because it uses regular expressions, rather than wildcard (or "glob") patterns. They use many of the same symbols (like *), but they have different syntax and meanings. – Gordon Davisson Aug 14 '20 at 21:50
  • Related: https://unix.stackexchange.com/questions/119905/why-does-my-regular-expression-work-in-x-but-not-in-y and https://unix.stackexchange.com/q/87108/117549 – Jeff Schaller Aug 14 '20 at 21:51
  • Also related: https://stackoverflow.com/questions/23702202/what-are-the-differences-between-glob-style-pattern-and-regular-expression and https://askubuntu.com/questions/714503/regular-expressions-vs-filename-globbing. – Gordon Davisson Aug 14 '20 at 21:56
  • You are probably looking for another command, such as find -maxdepth 1 -iname '*.txt', which searches file names (instead of file content). – alecxs Aug 14 '20 at 23:26
  • You don't want to grep all the .txt files, you want to find all the .txt files. grep is for doing g/re/p (i.e. looking for a string that matches a regexp and printing the result) within files, while find is for finding files (i.e. looking for files that match given criteria, including their name matching a globbing pattern). Btw, don't try to parse the output of ls; see https://mywiki.wooledge.org/ParsingLs. – Ed Morton Aug 15 '20 at 03:47

3 Answers

In regexes, * means "any number of the previous item", not "any number of any characters", like it does in shell patterns. And . means "any single character". So, to look for "anything, followed by literal .txt", you'd use .*\.txt. Or just \.txt, since grep looks for a match anywhere in the line. Then again, \.txt would also match a filename like foo.txtgz, as the .txt would not have to be at the end. You'd need \.txt$ to lock the pattern to the end of the line.
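As a small sketch of the anchoring point above, the following feeds a few made-up file names to grep on standard input (the names are only assumptions for illustration, no real files are involved) and compares the unanchored and anchored patterns:

```shell
# Hypothetical file names, fed to grep on stdin instead of coming from ls.
names='notes.txt
foo.txtgz
txt_readme'

# Unanchored: a literal ".txt" may appear anywhere in the line.
unanchored=$(printf '%s\n' "$names" | grep '\.txt')

# Anchored with $: ".txt" must be the last thing on the line.
anchored=$(printf '%s\n' "$names" | grep '\.txt$')

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$unanchored" "$anchored"
```

The unanchored pattern should report both notes.txt and foo.txtgz, while the anchored one should report only notes.txt.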

The regex *.txt is either meaningless, an error, or looks for a literal asterisk, depending on the implementation and if you're using basic regexes (grep), or extended regexes (grep -E). Best not to use it.
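To illustrate the "literal asterisk" case, here is a minimal sketch using plain grep (basic regexes). The input lines are invented for the example, and the extended-regex case is deliberately left out because its behavior is not defined:

```shell
# In a basic regex, a leading * is an ordinary character, so '*.txt'
# means: a literal "*", then any one character, then "txt".
bre_hits=$(printf '%s\n' 'sample.txt' '*.txt' 'a*_txt' | grep '*.txt')

printf '%s\n' "$bre_hits"
```

Only the lines that actually contain an asterisk are printed; sample.txt is not among them, which is why the quoted pattern in the question found nothing.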

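On the other hand, s*.txt would look for "any number of letters s, then any single character, then literal txt". That's a valid regex, but it still doesn't mean what the shell pattern means: it matches sample.txt only because grep finds .txt somewhere in the line, so it would match data.txt or atxt just as well.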

Instead, what happens in your second command, is that because s*.txt is not quoted, the shell expands the s*.txt before grep sees it. If the only matching file is sample.txt, then grep goes looking for that in the output of ls. (If there were multiple matching filenames, the first would be taken as the pattern, and the rest as filenames for grep to read. In that case, it'd ignore the input from the pipe.)
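The effect described above can be reproduced in a scratch directory. This is only a sketch: the directory comes from mktemp, and the file name second.txt and its contents are assumptions made up for the demonstration.

```shell
# Work in a scratch directory so the glob has a known set of matches.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch sample.txt

# One match: grep receives "sample.txt" as its pattern, not "s*.txt".
one_match=$(ls | grep s*.txt)

# Two matches: the glob now expands to "sample.txt second.txt", so grep
# searches inside second.txt for the pattern and ignores the pipe.
echo 'a line mentioning sample.txt' > second.txt
two_matches=$(ls | grep s*.txt)

echo "one match:   $one_match"
echo "two matches: $two_matches"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

With a single match the output is the file name from ls; with two matches it is a line from inside second.txt, which shows that the pipe input is no longer being read.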


But, ls can take a list of files too, so while you could use

ls | grep '\.txt'

to get any .txt file, it'd probably be easier to just use

ls *.txt 

instead.

ilkkachu
  • 138,973

It is in part because grep uses regular expressions (in fact, that's what the "re" in the name stands for: grep is short for "global regular expression print").

The * wildcard in regular expressions is different from the * wildcard in shell globbing.

In regular expressions, * means "zero or more of the preceding item". However, . is also a wildcard, meaning "any single character".

In shell globs, * means "zero or more characters". . is not a wildcard at all.

When you grep for the pattern "*.txt", the leading * has nothing before it to repeat. In a basic regular expression (plain grep) it is treated as a literal asterisk, so you are looking for a literal *, followed by any single character, followed by the literal string txt. In an extended regular expression (grep -E) the behavior is undefined.

When you grep for the pattern "s*.txt", you are looking for zero or more s characters, followed by any single character, followed by the literal string txt.

This is why one common thing you will find in regular expressions is .*, which means "zero or more of any character". That is regexese for "any combination of characters, including none at all". To require at least one character, use ..* (or .+ with grep -E).

When you ls *.txt, you are telling the shell: "Find any filenames that match the glob pattern *.txt, list them here, and provide those as arguments to the ls command."
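To contrast the two pattern languages side by side, here is a rough sketch that reads the same text, s*.txt, once as a shell glob (via case) and once as a regex (via grep). The helper names as_glob and as_regex and the sample names are hypothetical, chosen only for this illustration:

```shell
# Hypothetical helper: treat s*.txt as a shell glob against the whole name.
as_glob() {
    case $1 in
        s*.txt) echo yes ;;
        *)      echo no ;;
    esac
}

# Hypothetical helper: treat s*.txt as a basic regex, searched anywhere
# in the line, as grep does by default.
as_regex() {
    if printf '%s\n' "$1" | grep -q 's*.txt'; then
        echo yes
    else
        echo no
    fi
}

for name in sample.txt data.txt atxt notes.md; do
    printf '%-12s glob=%-3s regex=%s\n' "$name" "$(as_glob "$name")" "$(as_regex "$name")"
done
```

As a glob, the pattern accepts only sample.txt. As a regex, it also accepts data.txt and atxt, because "zero or more s, any character, txt" can be found inside both.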

ilkkachu
  • 138,973
DopeGhoti
  • 76,081
  • Specifically, the regular expression s*.txt means: zero or more "s" characters, followed by any single character (that's what . means in an RE), followed by "txt". – Gordon Davisson Aug 14 '20 at 21:52
  • Why does *.txt mean zero or more of anything in regex? What is the previous regex in this case? – LRDPRDX Aug 15 '20 at 00:50
  • @LRDPRDX it doesn't, a repetition character like * coming first in a regexp like *.txt is undefined behavior so YMMV with what any given tool does with it. – Ed Morton Aug 15 '20 at 03:56
  • * modifies the previous atom, turning the expected count from the implicit "exactly one" to "zero or more". So .* is zero or more of anything (even nothing at all), not one plus zero or more. One or more of anything would be ..* , or .+ in ERE (grep -E). – ilkkachu Feb 09 '24 at 09:01
  • Similarly, a regex starting with an asterisk does not mean "zero or more of anything", the usual rule just cannot apply there since there's nothing preceding the asterisk. Instead, e.g. *.txt either means to match a literal asterisk (in BRE), or it's just undefined and possibly an error (in ERE). (On my GNU/Linux system, grep -E '*.txt' appears to act the same as grep -E '.*.txt', which it can do, since the meaning is undefined. But grep '*.txt' requires the literal asterisk.) – ilkkachu Feb 09 '24 at 09:05

Please note that grep searches file content: its first argument is the search PATTERN, and any other arguments are interpreted as FILES to look into.

It becomes clearer when you use the grep -H and -o flags, or put your grep inside a script and run it with bash -x script to see how the shell expands globs before they are passed as arguments.
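A minimal sketch of the tracing idea, assuming a scratch directory that contains only sample.txt. It uses sh -xc on a one-line command rather than a script file; bash -x script should show the same kind of trace:

```shell
# Scratch directory with a single file for the glob to match.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch sample.txt

# -x makes the child shell print each command to stderr after expansion.
# Capture that trace and discard the normal output of the pipeline.
trace=$(sh -xc 'ls | grep s*.txt' 2>&1 >/dev/null)

printf '%s\n' "$trace"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

The trace should contain a line showing grep being run with sample.txt as its argument, confirming that the shell replaced s*.txt before grep ever saw it.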

alecxs
  • 564