0

I'm trying to remove only the email addresses from the 7th field of a pipe-delimited file. I've attempted to do this with sed, but I'm unable to restrict the substitution to the column I want. Every email address in the 7th field should be removed, and everything else should stay as it is.

Input file:

980||||||development@gmail.com||77880|GB||0CA005D||
7980||||||development@gmail.com||5656|PO||69B88008BE||
100||||||apple@appl.com||31000|USA||0C5D||
101||||||||3100df0|CAN||0C5D||
570||||||user@live.com||5521123|RSA||B70F2||
080570||||||test@yahoo.com||AV6777|OI||A005D||
1870||||||USA||5521123|RSA||B70F2||
 70||||||RABBIT||AV6777|OI||A005D||

Output:

980||||||||77880|GB||0CA005D||
7980||||||||5656|PO||69B88008BE||
100||||||||31000|USA||0C5D||
101||||||||3100df0|CAN||0C5D||
570||||||||5521123|RSA||B70F2||
080570||||||||AV6777|OI||A005D||
1870||||||USA||5521123|RSA||B70F2||
70||||||RABBIT||AV6777|OI||A005D||

This is what I tried, but it doesn't produce the desired result:

sed 's/,[a-z][0-9]\@[a-z][0-9]\.[a-z]//' file
  • One possible solution: awk 'BEGIN{FS=OFS="|"} { $7=""}1' file – Valentin Bajrami Sep 20 '19 at 09:30
  • I think the above command removes everything from the 7th field. I want to keep all records whose 7th field doesn't contain an email address. – nancy_olson Sep 23 '19 at 14:12
  • @nancy_olson Please add all requirements you wrote in comments to the question and add some explanatory text. How do you distinguish an email address from something else? Your modified example shows data that either contain an email address as the whole content of column 7 which has to be removed completely or something else that must be left unmodified. If this is your requirement, please state this in your question. – Bodo Sep 23 '19 at 14:36
  • Please be aware that matching email addresses is tricky. https://unix.stackexchange.com/a/194920/117549 is one pointer. https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression is another – Jeff Schaller Sep 23 '19 at 14:51
  • That makes sense. My data contains email addresses in this form: xyz@xyz.com – nancy_olson Sep 23 '19 at 14:54

3 Answers

3

(Script edited after more requirements have been added to the question.)

awk -F '|' -v OFS='|' '$7 ~ /@/ { $7 = "" } { print }' file

Explanation:

-F '|' -v OFS='|' set input and output field separators
$7 ~ /@/ condition: column 7 contains @
{ $7 = "" } action: set column 7 to empty string
{ print } unconditional action: print the line

This script assumes that in column 7 everything that contains a @ is an email address and that there is no additional data which is not part of the email address.
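If column 7 may hold other text next to an address, the assumption above no longer holds. The following is only a sketch of one way to relax it: it deletes email-like tokens inside column 7 and keeps the rest. The function name strip_email_f7 and the simplified pattern (local@domain.tld, no blanks, a final label of at least two letters) are assumptions for illustration, not a full email validator.

```shell
# Hypothetical helper: strip only email-like tokens from field 7.
strip_email_f7() {
  awk -F '|' -v OFS='|' '
    # Simplified pattern (an assumption): local@domain.tld, tld of 2+ letters.
    { gsub(/[^@| \t]+@[^@| \t]+\.[A-Za-z][A-Za-z]+/, "", $7) }
    1   # always-true pattern: print every line
  ' "$@"
}

# Small demo on three lines; the third one is made-up mixed content.
printf '%s\n' \
  '100||||||apple@appl.com||31000|USA||0C5D||' \
  '1870||||||USA||5521123|RSA||B70F2||' \
  '200||||||contact bob@example.org now||1|X||Y||' |
  strip_email_f7
```

With this variant a value such as USA passes through unchanged, while in the made-up third line only bob@example.org is removed and the surrounding words stay.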

As stated in Valentin Bajrami's comment, you can omit the print statement and add another rule that contains 1 only, which is an "always true" condition (1) with the implicit default action print.

awk -F '|' -v OFS='|' '$7 ~ /@/ { $7 = "" } 1' file

Note: In contrast to the (edited) sample output from the question, the script does not remove the leading blank in the last line of the sample input.
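If the leading blank should go away as well, one option is to trim it before the column test. This is a sketch under the assumption that only leading spaces and tabs need to be removed; the function name trim_and_blank_f7 is made up for the example.

```shell
# Hypothetical helper: trim leading blanks, then blank out column 7 if it has an @.
trim_and_blank_f7() {
  awk -F '|' -v OFS='|' '
    { sub(/^[ \t]+/, "") }   # drop leading spaces/tabs from the whole record
    $7 ~ /@/ { $7 = "" }     # same rule as in the script above
    1                        # print every line
  ' "$@"
}

printf '%s\n' \
  ' 70||||||RABBIT||AV6777|OI||A005D||' \
  '570||||||user@live.com||5521123|RSA||B70F2||' |
  trim_and_blank_f7
```

Because sub() without a third argument works on the whole record, awk splits the line into fields again before the column 7 test runs.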

Bodo
  • Thank you. I have two questions: why $7=""? Are you telling Unix to ignore that field? Second, will the same command work if I want to blank out all the email IDs present in that field? – nancy_olson Sep 20 '19 at 12:37
  • $7="" tells awk to set column 7 of the current line to an empty string, so it will remove whatever is in column 7. It doesn't check if it is an email address or anything else. – Bodo Sep 20 '19 at 12:40
  • @nancy_olson In the sample data shown in your question, column 7 is either empty or contains an email address without any other data. If you want to remove everything that looks like an email address but keep other content, the script would have to be extended. In this case please show in your question some example data with other possible contents in column 7. – Bodo Sep 20 '19 at 13:22
  • Hi, the above command removes all the data from the 7th field. My requirement is to strip only the email address from that field. – nancy_olson Sep 23 '19 at 14:07
  • @nancy_olson Yes, the script is designed this way because it is sufficient for your sample data. With the sample data as currently shown in the question it produces the expected output. Please add all requirements to your question and show some example input and output data that also shows data that must not be removed from column 7. Without showing such data I don't know how to implement a script for your task. – Bodo Sep 23 '19 at 14:22
  • made the correction :-) – nancy_olson Sep 23 '19 at 14:31
0

Try this.

awk -F\| -v OFS=\| '$7="";1' file
steve
0

Using a simplified email regex to replace the 6th occurrence of | plus optional email address with | (leaving non-email addresses in the 7th field unchanged):

sed 's/|\([^|@]\+@[^|@]\+\.[a-zA-Z]\{2,\}\)\?/|/6' file
  • s/ substitute
  • | match literal |
  • \( begin group
  • [^|@]\+ match one or more non-| and non-@ characters (all characters before the @)
  • @ match literal @
  • [^|@]\+ same as two lines above
  • \. match a dot
  • [a-zA-Z]\{2,\} match 2 or more letters
  • \) end group
  • \? match zero or one group
  • /|/ replace with |
  • 6 replace only the 6th match of the pattern
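The \+ and \? operators used above are GNU extensions to basic regular expressions. As a sketch, the same idea can be written with extended regular expressions via sed -E, which GNU and BSD sed both accept. The function name blank_email_f7_sed is made up for the example, and the email pattern is the same simplified one as above.

```shell
# Hypothetical wrapper around an ERE variant of the command above.
# In ERE a bare | means alternation, so the literal pipe is written \|
# outside bracket expressions; inside [...] it is already literal.
blank_email_f7_sed() {
  sed -E 's/\|([^|@]+@[^|@]+\.[a-zA-Z]{2,})?/|/6' "$@"
}

printf '%s\n' \
  '570||||||user@live.com||5521123|RSA||B70F2||' \
  '1870||||||USA||5521123|RSA||B70F2||' |
  blank_email_f7_sed
```

In the second line the 6th match is just the | in front of USA, so it is replaced by itself and the line comes out unchanged.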
Freddy
  • Thanks for the help, mate, but I'm running into the same problem I had earlier with this command. All the data in the 7th field is removed, whereas I only want the email address to be removed. – nancy_olson Sep 23 '19 at 14:07
  • @nancy_olson Added an example to match only email addresses. For a simple match this should be sufficient. – Freddy Sep 23 '19 at 15:32