
I want to grep across 500 files in a Maildir directory. I issued the command

grep MyPattern *

I got the error message:

bash: /usr/bin/grep: Argument list too long

So I stored the list of files in a file MyFiles, and issued the following

for i in $(`cat MyFiles`); do echo $i; done

Before I did a grep, I wanted to do an echo just as a check. But this gave the following error

bash: 1434361691.M617282P6399V0000000000000808I00000000000E16C1_23.ananda-linux,S=10055:2,S: command not found

where that 1434... thing is the first file in the directory.

So back to the original question: how do I grep across all these files in the mailbox? I also have larger mailboxes containing 50000 or more emails.

Jeff Schaller
  • There's nothing inherent in a Maildir that makes this special or any different from https://unix.stackexchange.com/questions/85789/ . – JdeBP Apr 26 '18 at 08:08

2 Answers


Ask grep itself to construct the file list, by recursing from the current directory:

grep -r MyPattern .

This isn’t quite the same as *, since it will search in sub-directories, but for mail directories that’s usually what you want.

Stephen Kitt

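When the shell executes an external command, the total size of the command line, after expansion of any filename globbing pattern such as *, must not exceed a system-defined limit. The limit covers the arguments together with the environment, and it is enforced by the system when the program is executed, which is why the error mentions /usr/bin/grep.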

In your case, grep 'PATTERN' * expands to a command line that is too long for the shell to execute.
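
The limit mentioned above can usually be inspected with getconf. This is a minimal sketch, assuming a POSIX-like system where getconf ARG_MAX is available; the variable name limit is only illustrative.

```shell
# Query the maximum combined size (in bytes) of the arguments and
# environment that may be passed to a newly executed program.
limit=$(getconf ARG_MAX)
printf 'ARG_MAX is %s bytes\n' "$limit"
```

The exact value differs between systems, so no particular number should be relied upon.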

In your second example:

for i in $(`cat MyFiles`); do echo $i; done

you try to iterate over the filenames stored in MyFiles, but you are getting the syntax very wrong.

$(`cat MyFiles`)

is the same as

$( $(cat MyFiles) )

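which means that the contents of MyFiles are interpreted as a command: the first filename becomes the command name and the remaining names become its arguments. This is why you get the "command not found" error for the first file in the directory.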
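
For comparison, here is a sketch of reading the names from a list file without running them as a command. It assumes one filename per line (so names with embedded newlines would break it), and the scratch directory, sample files, and the count variable are made up for the demonstration.

```shell
# Set up a scratch directory with two sample files and a list of their names.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch 'msg one' 'msg,two'
printf '%s\n' 'msg one' 'msg,two' > MyFiles

# Read the list one line at a time; IFS= and -r keep each line intact,
# including spaces and backslashes.
count=0
while IFS= read -r name; do
    printf 'would search: %s\n' "$name"
    count=$((count + 1))
done < MyFiles
```

Replacing the printf line with grep 'PATTERN' "$name" would run the search, at the cost of one grep process per file.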

There are several ways to remedy this, but looping over the contents of your file is not really a good one.

Stephen gives a good solution in his answer, and another one would be, assuming that your current working directory is your Maildir folder,

find . -type f -exec grep 'PATTERN' {} +

This would execute grep a few times on as large batches of files as possible.
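
The batching can be observed by swapping grep for a small sh -c script that reports how many names it was handed. This is only a sketch on a hypothetical scratch directory with three files; the variable name batches is illustrative.

```shell
# Create a scratch directory with three sample files.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch a b c

# sh -c receives each batch as its positional parameters, so "$#" is
# the number of files passed to that one invocation.
batches=$(find . -type f -exec sh -c 'echo "$#"' sh {} +)
printf 'files per invocation: %s\n' "$batches"
```

With only three files there is a single invocation. In a directory with tens of thousands of messages, one line would be printed per batch.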

This is similar to

printf '%s\n' * | xargs grep 'PATTERN'

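but the find command correctly handles filenames containing spaces and embedded newlines, which the xargs pipeline above does not. Note also that find descends into subdirectories, whereas * only matches names in the current directory.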

The printf command here will output one filename per line. It does not suffer from the same problem as grep 'PATTERN' * since it's very likely a built-in command, and therefore does not have to be executed as an external command by the shell.
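
If the xargs approach is preferred, the filename problem can be worked around by terminating each name with a NUL byte instead of a newline. This is a sketch that assumes an xargs supporting the -0 option (a common extension in GNU and BSD xargs, not guaranteed everywhere); the scratch directory and file names are made up for the demonstration.

```shell
# Create a scratch directory with one matching and one non-matching file.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'hello PATTERN\n' > 'file with spaces'
printf 'nothing here\n' > 'plain'

# printf is a shell builtin here, so the expanded * is never passed to
# an external command. NUL-terminated names survive spaces and newlines.
matches=$(printf '%s\0' * | xargs -0 grep -l 'PATTERN')
printf '%s\n' "$matches"
```

The -l option is used here only to list the names of matching files; drop it to see the matching lines instead.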

Your loop solution would also work, but rather than looping over the output of cat, you could simply do

for name in *; do
    grep 'PATTERN' "$name"
done

This assumes that there are only regular files in the current directory.

To make sure that you only process mail messages, you may use

for name in *,*; do
    grep 'PATTERN' "$name" /dev/null
done

This iterates over names that contain at least one comma. I have also added /dev/null as a second file argument: when grep is given more than one file, it prefixes each matching line with the name of the file it came from. You can remove /dev/null and instead use -H with grep if your grep supports this.
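
The two ways of getting the filename into the output can be compared side by side. This is a sketch assuming a grep that supports -H (GNU and BSD grep do); the sample file name and the variables with_devnull and with_H are hypothetical.

```shell
# Create a scratch directory with a single sample message.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'PATTERN here\n' > 'msg,1'

# Giving grep a second file makes it prefix matches with the filename.
with_devnull=$(grep 'PATTERN' 'msg,1' /dev/null)

# -H requests the same prefix explicitly, even for a single file.
with_H=$(grep -H 'PATTERN' 'msg,1')

printf '%s\n%s\n' "$with_devnull" "$with_H"
```

Both commands should print the same line, with the filename and a colon in front of the matching text.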

Looping like this is slow, since we execute grep once for each single file in the directory.

Kusalananda