For example, I wanted to count how many files in my current directory have a certain extension.

I used:

ls | grep ".txt" | wc -l

This works and so does:

ls | grep ".txt*" | wc -l

But why does this not work?

ls | grep "*.txt" | wc -l

Why does the wildcard seem not to work in the grep expression when I put it before the extension, and why does it have no effect at the end? (All my txt files simply end with .txt, so I am guessing it would not work for something like .txt.zip.)

KBwonder

1 Answer

$ ls *.txt

This command will use shell globbing to list all files whose names end with .txt.

$ ls | grep "*.txt"

This command will list all (non-hidden) files in the present working directory, and send that output to grep, which will match the filenames against the regular expression /*.txt/.

/*.txt/

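How this regular expression is read depends on what flavor of regular expressions is in play, because of the leading *:

*    -- in a POSIX basic regular expression (plain grep), a literal '*', because there is nothing before it to repeat; in an extended regular expression (grep -E), the result is undefined; followed by
.    -- exactly one character of any type, followed by
txt  -- the literal string 'txt', followed by anything

In regular expressions, * is not a wildcard on its own; it is a quantifier meaning "zero or more of the preceding subexpression", so it works differently from the * in shell globs. Relatedly, . is not a period; it is a wildcard for one character (analogous to the ? wildcard in shell globs). With plain grep, this expression therefore matches only names that contain a literal asterisk, then any one character, then txt (for example a*btxtc), which is why the command prints nothing for ordinary file names. In a flavor that lets the leading * match nothing (GNU grep -E reportedly behaves this way), the expression is effectively /.txt/ and would match any of file.txt, sometxtfile, photo_of_a_txt_file.png, but indeed not txtfile (as there is no match for the one character before txt). In that case the literal string txt can appear anywhere but at the very start of the file name.

The same rule explains why ".txt*" seems to behave like ".txt": the trailing * applies only to the final t, turning it into "zero or more t", so every name that matches ".txt" still matches ".txt*".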
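
To see the three patterns from the question side by side without touching any real files, here is a minimal sketch that feeds a made-up list of names to plain grep (basic regular expressions). The names() helper and the sample names are my own, chosen only for illustration; the counts in the comments assume a POSIX-conforming grep.

```shell
# Hypothetical sample names, one per line (no real files are needed).
names() {
    printf '%s\n' 'file.txt' 'sometxtfile' 'txtfile' 'a*btxtc' 'draft.tx'
}

# Leading * in a basic regex is a literal asterisk, so only the name
# containing "*", then one character, then "txt" is printed.
bre_hits=$(names | grep '*.txt')
echo "$bre_hits"     # a*btxtc

# "." matches any character, so this also counts sometxtfile and a*btxtc.
dot_hits=$(names | grep -c '.txt')
echo "$dot_hits"     # 3

# "t*" means zero or more t, so the pattern only gets looser: draft.tx now counts too.
star_hits=$(names | grep -c '.txt*')
echo "$star_hits"    # 4
```

Note that txtfile is never counted by the last two patterns, because the . needs one character in front of txt.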

A better regular expression to catch file names that end in .txt would be /\.txt$/:

\.  -- A literal .
txt -- The literal string 'txt'
$   -- End of input

Therefore, if you insist on piping ls into grep (and let's not, for the moment, get into the books that could be written on why parsing the output of ls is a bad idea), you probably mean to do this:

$ ls | grep "\.txt$"
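
A quick way to check what the escaped dot and the $ anchor buy you is to run both patterns over the same list. This is only a sketch; the names() helper and its sample names are invented for the comparison.

```shell
# Hypothetical sample names, one per line.
names() {
    printf '%s\n' 'file.txt' 'sometxtfile' 'notes.txt.zip' 'atxt'
}

# Unanchored, unescaped: any character followed by "txt", anywhere in the name.
loose=$(names | grep -c '.txt')
echo "$loose"     # 4

# Escaped dot plus end-of-line anchor: only names that really end in ".txt".
strict=$(names | grep '\.txt$')
echo "$strict"    # file.txt
```

Single quotes are used here so the shell passes the backslash and the $ to grep untouched.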

As for piping the result into wc, you don't need to; grep can count the matching lines itself with -c:

$ ls | grep -c "\.txt$"
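
If you would rather not parse ls at all, one possible alternative is to let the shell glob do the matching and count the results in a loop. This is a sketch, not the only way to do it; count_ext and its two parameters are names I made up, and the demo runs in a throwaway directory created with mktemp.

```shell
# count_ext DIR EXT: count entries in DIR whose names end in ".EXT".
# (Hypothetical helper; hidden files are skipped, as with plain ls.)
count_ext() {
    n=0
    for f in "$1"/*."$2"; do
        # When nothing matches, the pattern is left unexpanded; -e filters that out.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Demo in a temporary directory so no real files are touched.
tmpdir=$(mktemp -d)
touch "$tmpdir/a.txt" "$tmpdir/b.txt" "$tmpdir/c.txt.zip" "$tmpdir/txtfile"
txt_count=$(count_ext "$tmpdir" txt)
echo "$txt_count"    # 2
rm -rf "$tmpdir"
```

Because the names never pass through a pipe, this count is not thrown off by file names that contain newlines.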
DopeGhoti
  • /*.txt/ -- "this expression would match any of file.txt, sometxtfile..." -- no, the leading asterisk in the expression is not special (as far as I can read the spec, and as far as my seds and greps work), so the expression would match a string containing an asterisk, then any character, then txt, e.g. a*btxtc etc. Except if you're using Perl, where an asterisk following nothing would be an error. – ilkkachu Jan 08 '20 at 19:07
  • To complicate matters, a leading * is treated differently in BRE grep and ERE grep -E (at least in the GNU implementations); in BRE it is treated as literal (which is why the OP sees no output in that case), whereas in ERE it appears to mean zero or more instances of the empty pattern, which matches anything. – steeldriver Jan 08 '20 at 19:08
  • "Regular Expression" is often a misnomer, what with BRE, ERE, PCRE, RRE, JSRE, and all the other flavors out there making them highly irregular. – DopeGhoti Jan 08 '20 at 19:10
  • @DopeGhoti, I don't think the "regular" there refers to that. – ilkkachu Jan 09 '20 at 00:51
  • I am aware, @ilkkachu, I was just making light of the term. – DopeGhoti Jan 09 '20 at 14:35
  • @steeldriver But in POSIX an initial * in an ERE is undefined: "the following uses produce undefined results: If these characters appear first in an ERE". And, in GNU grep, an initial * will match only the empty string, not any character. Try echo hello | grep -Eo '*': no output, no match (without the -o there will be an (empty) match). –  Jan 09 '20 at 15:23