I have a very large dataset that is supposed to consist of email addresses, one per line. However, it contains a large number of invalid addresses, and the lines holding them need to be removed from the file completely.

Here are some examples:

89 is @msn .com
89!3@nomail.com
89%@yahoo.com
89%azn@yahoo.com
89&#39:s@msn.com
89'Mustang@yahoo.com
89's@msn.com
89&main@yahoo.com
89+475asdjkl:jkl@aol.com
89+475asdjkl;jkl@aol.com
89+ggg@hotmail.com

Is there a simple approach available to remove lines which contain invalid emails from the file?

  • Simple approach: send an email to each address and wait for a bounce. Bounce = invalid; no bounce = valid – Jeff Schaller Jan 15 '18 at 15:22
  • @JeffSchaller as long as it's the right sort of bounce (i.e. not a temporary rejection during the SMTP conversation) – Chris Davies Jan 16 '18 at 00:12
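
If a purely syntactic filter is enough, one possible starting point is a pattern match per line. The sketch below is only that: the question does not say which characters count as invalid, so the rule used here (local part limited to letters, digits, `.`, `_`, `+` and `-`, followed by `@` and a dotted domain) is an assumption, stricter than what the mail standards allow, and the names `EMAIL_RE`, `is_plausible_email`, `filter_lines` and `filter_file` are made up for illustration. It says nothing about whether a mailbox actually exists.

```python
import re

# Assumed rule, not taken from the question: local part made of letters,
# digits, '.', '_', '+', '-'; domain made of two or more dot-separated
# labels of letters, digits and '-'. Adjust the classes to taste.
EMAIL_RE = re.compile(r"[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def is_plausible_email(line):
    """Return True if the whole line (minus surrounding whitespace) matches."""
    return EMAIL_RE.fullmatch(line.strip()) is not None


def filter_lines(lines):
    """Yield only the lines that look like an address under the assumed rule."""
    for line in lines:
        if is_plausible_email(line):
            yield line


def filter_file(src_path, dst_path):
    """Stream src_path to dst_path, dropping lines that fail the check."""
    with open(src_path, encoding="utf-8", errors="replace") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(filter_lines(src))
```

Calling `filter_file("emails.txt", "emails.clean.txt")` (both file names are placeholders) writes the surviving lines to a new file and leaves the original untouched, so the result can be compared against the input before anything is overwritten. Because the file is read line by line, memory use should stay flat even for a very large dataset. Of the sample lines in the question, this rule keeps only `89+ggg@hotmail.com`.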