5

I have a long list of email addresses I need to extract, but I can't find the right way to do it.

The data is structured similar to this:

Patabee meeta needo buffalos@outlook.com pizz bees
Needo target@outlook.com hama lines question
unix search exchange helpme@outlook.com extracts

One thing that is consistent in my data is the email domains.

Currently I have:

grep -oniT @outlook.com /path/to/file/of/emails/and/such.txt

which returns output like this:

3624   :@outlook.com
3625   :@outlook.com
3626   :@outlook.com
3630   :@outlook.com
3631   :@outlook.com
3632   :@outlook.com
3633   :@outlook.com
3634   :@outlook.com
3635   :@outlook.com

However, I need it to select the whole email address, not just the domain (which is what I'm currently searching for).

How can I make grep select the entire field in which it found the matching string, but not the entire line?

ilkkachu
  • 138,973
TrevorKS
  • 638

2 Answers

7

Here is a solution using grep:

grep -E -o "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" /path/to/file/of/emails/and/such.txt

This will get all email addresses in the file. You may want to adapt the regex to match only a specific domain.

-E, --extended-regexp Interpret PATTERN as an extended regular expression

-o, --only-matching Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
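As a rough sketch of adapting the regex to a single domain, the domain part can be replaced with a literal outlook.com, with the dot escaped so that it matches only a real ".". The temporary file and the variable names sample and result are made up for this illustration, and \b is assumed to be available (it is a GNU grep extension):

```shell
# Sample data from the question, written to a temporary file (illustration only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Patabee meeta needo buffalos@outlook.com pizz bees
Needo target@outlook.com hama lines question
unix search exchange helpme@outlook.com extracts
EOF

# Same local-part pattern as above, but with the domain fixed to outlook.com.
# -i keeps the match case-insensitive; \. matches a literal dot only.
result=$(grep -E -o -i '\b[A-Za-z0-9._%+-]+@outlook\.com\b' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

With the sample lines from the question, this should print the three addresses, one per line, without line numbers.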

1

-o prints only the part matching the pattern, so you'll need to extend the pattern to include the part before the @. With the addresses in your sample, catching any non-blanks should do:

$ grep -oniTE '[^[:blank:]]+@outlook\.com' foo
  1:    buffalos@outlook.com
  2:    target@outlook.com
  3:    helpme@outlook.com

In general, though, email addresses are difficult to parse (they can contain quoted whitespace), and the above will miss some valid email addresses (as well as include some invalid ones). See e.g. Wikipedia and the relevant standards for the gory details.
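To illustrate that caveat, here is a small sketch with two made-up input lines (not from the question's data): one with a quoted local part containing a space, and one where the address is glued to a mailto: prefix. The file and variable names are hypothetical, and the line-number options are left out to keep the output plain:

```shell
# Two invented lines that the simple non-blank pattern handles poorly.
sample=$(mktemp)
cat > "$sample" <<'EOF'
contact "john doe"@outlook.com now
see mailto:target@outlook.com today
EOF

# The non-blank pattern stops at the space inside the quoted local part,
# and it swallows the mailto: prefix because that contains no blanks.
result=$(grep -oE '[^[:blank:]]+@outlook\.com' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

The first match comes out as doe"@outlook.com (a truncated address) and the second as mailto:target@outlook.com (extra text included), so the output is worth a sanity check if the data is less regular than the sample.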

ilkkachu
  • 138,973