A naive approach would be (assuming GNU utilities):
grep -FZlw -f address.list -- *.eml | xargs -r0 rm -f --
Or the same but with the long options as supported by GNU utilities:
grep --fixed-strings \
     --null --files-with-matches \
     --word-regexp \
     --file address.list \
     -- *.eml |
  xargs --no-run-if-empty --null \
        rm --force --
But that would delete files when addresses are found anywhere in the file, whether it's in the From:, To:, Cc: or Reply-To: headers, in the body of the email or in attachments.
Also, if address.list contains doe@example.com, that would also delete emails for john.doe@example.com and doe@example.com.eu: . is not a word character, so -w still sees doe@example.com as a whole word in both.
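Those false positives are easy to reproduce. Here is a dry-run sketch in a throwaway directory; the file names and addresses are made up, GNU grep and xargs are assumed, and rm is replaced with printf so the command only lists what it would delete:

```shell
#!/bin/sh
# Dry-run sketch of the naive grep | xargs command; sample data is made up.
# Assumes GNU grep and GNU xargs. rm is replaced with printf so nothing is deleted.
LC_ALL=C; export LC_ALL        # predictable glob (and therefore output) order
cd "$(mktemp -d)" || exit 1    # throwaway directory

printf '%s\n' doe@example.com > address.list

printf 'To: doe@example.com\n\nHi\n'                   > a.eml  # intended match
printf 'To: john.doe@example.com\n\nHi\n'              > b.eml  # unintended: longer address
printf 'To: x@example.org\n\nsee doe@example.com.eu\n' > c.eml  # unintended: body, .eu domain
printf 'To: x@example.org\n\nHi\n'                     > d.eml  # no match

# Same grep as the naive approach; the selected names are printed instead of removed.
selected=$(grep -FZlw -f address.list -- *.eml | xargs -r0 printf '%s\n')
printf '%s\n' "$selected"
```

Only d.eml is left out, although only a.eml was meant to be selected.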
That also assumes email addresses are formatted the same way (same case, no MIME encoding) in address.list and in the .eml files.
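For instance, -F matching is case-sensitive, so an address written with different case in the email is simply not found. A minimal sketch with made-up data, assuming GNU grep; the standard -i option is shown only as one possible way to relax the case assumption, and whether that is appropriate for your addresses is a judgment call:

```shell
#!/bin/sh
# Sketch: a differently-cased address is missed by a plain -F search.
# Sample data is made up; assumes GNU grep.
cd "$(mktemp -d)" || exit 1
printf '%s\n' doe@example.com > address.list
printf 'To: Doe@Example.COM\n\nHi\n' > a.eml

# Case-sensitive search: no match, so this file would not be deleted.
if grep -qFw -f address.list -- a.eml; then found=yes; else found=no; fi
echo "case-sensitive: $found"

# With -i the comparison ignores case and the file is matched.
if grep -qiFw -f address.list -- a.eml; then found_i=yes; else found_i=no; fi
echo "with -i: $found_i"
```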
If you know exactly how the emails are formatted, for instance if they're always going to contain one and only one occurrence of a line like:
To: address@example.com
where address@example.com is formatted exactly as in your address.list, then you can do:
sed 's/^/To: /' address.list | grep -xZFlf - -- *.eml | xargs -r0 rm -f --
Which would be more reliable.
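A dry-run sketch of that pipeline on a few made-up files, assuming GNU grep and xargs and LF (not CRLF) line endings in the .eml files, since -x compares the whole line including any trailing carriage return; rm is again swapped for printf:

```shell
#!/bin/sh
# Dry-run sketch of the sed | grep -x pipeline; sample files are made up.
# Assumes GNU grep/xargs and LF (not CRLF) line endings in the .eml files.
LC_ALL=C; export LC_ALL
cd "$(mktemp -d)" || exit 1

printf '%s\n' doe@example.com > address.list

printf 'To: doe@example.com\n\nHi\n'                         > a.eml  # exact header line
printf 'To: john.doe@example.com\n\nHi\n'                    > b.eml  # different address
printf 'Reply-To: doe@example.com\nTo: x@example.org\n\nHi\n' > c.eml  # only in Reply-To:
printf 'To: x@example.org\n\nwrite to doe@example.com\n'     > d.eml  # only in the body

# Turn each address into a full "To: address" line, then require whole-line matches.
selected=$(sed 's/^/To: /' address.list |
  grep -xZFlf - -- *.eml |
  xargs -r0 printf '%s\n')
printf '%s\n' "$selected"
```

This time only a.eml is selected.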
Instead of passing the address.list as a list of words to be found anywhere in the files, we're first transforming the search list with the sed (stream editor) command to prefix each line with "To: ", so that the fixed-string patterns become To: address@example.com, and using -x/--line-regexp for those (instead of -w/--word-regexp) so they must match the full contents of a line exactly. That way, To: address@example.com doesn't match on Reply-To: address@example.com.eu, for instance.
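To make the difference between -w and -x concrete, this small sketch (assuming GNU grep) tests one made-up header line against the prefixed pattern both ways:

```shell
#!/bin/sh
# Compare -w and -x on a single header line; assumes GNU grep.
line='Reply-To: address@example.com.eu'
pattern='To: address@example.com'

# -w: the pattern occurs inside the line with non-word characters
# ("-" before it, "." after it), so it counts as a match.
if printf '%s\n' "$line" | grep -qFw -- "$pattern"; then w=match; else w=no-match; fi

# -x: the pattern must be the entire line, so there is no match.
if printf '%s\n' "$line" | grep -qFx -- "$pattern"; then x=match; else x=no-match; fi

echo "with -w: $w"
echo "with -x: $x"
```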
Replace rm -f with grep -H '^To:' above if, instead of removing the files, you want to check what the To: header is for the files that are to be removed.
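A sketch of that preview in a scratch directory with made-up files, assuming GNU grep and xargs and LF line endings; it is the pipeline above with rm -f swapped for grep -H '^To:', so it only prints:

```shell
#!/bin/sh
# Preview sketch: show the To: header of each file the pipeline would delete.
# Sample data is made up; assumes GNU grep/xargs and LF line endings.
LC_ALL=C; export LC_ALL
cd "$(mktemp -d)" || exit 1

printf '%s\n' doe@example.com roe@example.net > address.list
printf 'To: doe@example.com\n\nHi\n' > a.eml
printf 'To: x@example.org\n\nHi\n'   > b.eml
printf 'To: roe@example.net\n\nHi\n' > c.eml

# Same selection as before; grep -H prefixes each To: line with its file name.
preview=$(sed 's/^/To: /' address.list |
  grep -xZFlf - -- *.eml |
  xargs -r0 grep -H '^To:' --)
printf '%s\n' "$preview"
```

Once the listed headers look right, put rm -f back.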
fooXXX.eml contains myaddress@domain.com, it should be deleted. – NickD Jan 31 '20 at 19:21
.eml files are some sort of RFC 822 format with header and body. And the email addresses are found in the To: or Cc: headers. Is the syntax consistent? For instance, is always in the To: header and with <, > brackets around the email address and on the same line as the To: prefix? Are email addresses potentially found elsewhere in the email (like email attachments containing forwarded emails, or in quoted parts or in From/Reply-To headers...)? – Stéphane Chazelas Jan 31 '20 at 19:25