Using cut/awk/sed with two different delimiters

Question

I have the following cases:

 case@test.com
 case_1_2@test.com
 case_1@test.com

I'm trying to convert these to

 case@test.com
 case@test.com
 case@test.com

So it should remove everything from the first '_' (including it) to the @ (not including that).

I have something, but it doesn't really work correctly:

Based on this thread: Cut based on Two Delimiters at one go, and this U&L Q&A: Splitting string by the first occurrence of a delimiter.

sed 's/^.*_\([^ ]*\) .*\@\([^$]*\)$/\1 \2/' infile

But no luck. Does anyone want to chime in?

score 4 · Answer 1 · answered Aug 20 '13 at 23:30

Not sure what you're really doing with this, but you could do it like so with sed:

$ sed 's/\(case\).*\(@test.com\)/\1\2/' 87529.txt 
case@test.com
case@test.com
case@test.com

This effectively trims everything out between case and the @.

You can do something similar with awk:

$ awk -F@ '{split($1,a,"_"); print a[1]"@"$2}' 87529.txt
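To spell out what the awk one-liner does, here is a minimal sketch with the same logic wrapped in a function. The name strip_awk_split is made up for illustration, and the sketch assumes each line has a single @:

```shell
#!/bin/sh
# Hypothetical wrapper around the awk command above; reads addresses on stdin.
strip_awk_split() {
    awk -F@ '{
        split($1, a, "_")      # a[1] is the part of the local name before the first "_"
        print a[1] "@" $2      # $2 is the text after the first "@"
    }'
}

printf '%s\n' 'case@test.com' 'case_1_2@test.com' | strip_awk_split
```

When the local part has no underscore, split() returns a single element, so a[1] is the whole local part and the line comes out unchanged.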

Also can be done with perl (similar to evilsoup's approach):

$ perl -p -e 's/_.*@/@/g' 87529.txt

Or you can make use of perl's lookahead facility:

$ perl -p -e 's/_.*(?=@)//g' 87529.txt

NOTE: Lookaheads and lookbehinds in perl let you include strings in the regex pattern you're matching on without having them included in the operation performed against the match. Think of them as dynamic versions of the caret (^), the beginning of a line, and the dollar ($), the end of a line. This is a little less hacky than having to add the @ back in after removing it.

score 3 · Accepted Answer · answered Aug 20 '13 at 22:35

Assuming you won't ever have more than one @ symbol,

sed 's/_.*@/@/' file.txt

...should work.
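The "no more than one @" assumption matters because .* is greedy. A small sketch of what happens when it does not hold (the function name strip_greedy and the two-address input line are made up for illustration):

```shell
#!/bin/sh
# The sed command above wrapped in a hypothetical function.
strip_greedy() {
    sed 's/_.*@/@/'
}

# One "@": behaves as intended.
printf '%s\n' 'case_1_2@test.com' | strip_greedy      # case@test.com

# Two "@": ".*" is greedy, so the match runs to the LAST "@" on the line.
printf '%s\n' 'a_1@x.com,b@y.com' | strip_greedy      # a@y.com
```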

evilsoup

score 1 · Answer 3 · answered Aug 21 '13 at 08:10

If your shell supports parameter expansion, you can do something like

while read line; do
    printf "%s\n" "${line%%_*}@${line#*@}"
done < your_file_here

The expansion ${line%%_*} removes the leftmost _ and everything following it while the expansion ${line#*@} removes the leftmost @ and everything preceding it.
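The two expansions can be tried on a single value outside the loop. This is only a sketch; the function name rebuild and the variable names local_part and domain are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: applies the two expansions to one address.
rebuild() {
    line=$1
    local_part=${line%%_*}   # drop the longest suffix matching "_*"  -> case
    domain=${line#*@}        # drop the shortest prefix matching "*@" -> test.com
    printf '%s\n' "${local_part}@${domain}"
}

rebuild 'case_1_2@test.com'
rebuild 'case_1@test.com'
```

Both calls print case@test.com. The doubled %% is what makes the match start at the leftmost _; a single % would only strip from the last one.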

Joseph R.

As the question is tagged bash, a parameter expansion based solution can be shorter when used on an array: http://pastebin.com/kgq89527 – manatwork Aug 21 '13 at 08:25

@manatwork Beautiful. I would say this merits to be in answer of its own with a short explanation perhaps... – Joseph R. Aug 21 '13 at 08:41
However, note that performance will be very poor on big files, as with anything using loops in bash, and that this assumes all lines contain one @ and one _ before the left-most @ (plus the other usual problems when using read without -r and without setting IFS) – Stéphane Chazelas Aug 22 '13 at 20:01
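One possible way to address those caveats while staying with parameter expansion is sketched below. It is not from the thread: the function name strip_suffix is made up, and the loop is still slow on large files.

```shell
#!/bin/sh
# Hypothetical function name; reads addresses on stdin, one per line.
strip_suffix() {
    while IFS= read -r line; do      # IFS= keeps surrounding blanks, -r keeps backslashes
        local_part=${line%%@*}       # text before the first "@" (whole line if there is none)
        case $line in
            *@*) printf '%s\n' "${local_part%%_*}@${line#*@}" ;;
            *)   printf '%s\n' "$line" ;;   # no "@": pass through unchanged
        esac
    done
}

printf '%s\n' 'case@test.com' 'case_1_2@test.com' 'case_1@test.com' | strip_suffix
```

Cutting at the first @ before stripping the underscore suffix means a line without any _ in the local part is printed as it came in, rather than with the domain appended twice.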

score 1 · Answer 4 · answered Aug 22 '13 at 20:05

If the lines may contain more than one @:

sed 's/^\([^@_]*\)_[^@]*@/\1@/'

Or:

awk -F@ -vOFS=@ 'NF >= 2 {sub(/_.*/,"",$1)};1'
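A quick check of both commands on a line with more than one @. The wrapper names strip_sed and strip_awk and the sample inputs are made up for illustration:

```shell
#!/bin/sh
# Hypothetical wrappers around the two commands above.
strip_sed() { sed 's/^\([^@_]*\)_[^@]*@/\1@/'; }
strip_awk() { awk -F@ -v OFS=@ 'NF >= 2 {sub(/_.*/, "", $1)}; 1'; }

# Only the part before the first "@" is touched; later "@" and "_" survive.
printf '%s\n' 'foo_1@bar@baz_x.com' | strip_sed   # foo@bar@baz_x.com
printf '%s\n' 'foo_1@bar@baz_x.com' | strip_awk   # foo@bar@baz_x.com
```

A line such as a@b_c@d comes out unchanged from both, because the only _ sits after the first @.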

Stéphane Chazelas

score 0 · Answer 5 · edited Apr 13 '17 at 12:37

Evilsoup's solution seems to be perfect!

Yet another solution using both sed and awk.

sed 's/_/ /g; s/@/ /g' file_name | awk '{ print $1"@"$NF }'

This is not the most efficient approach, but it may be simpler to understand when one does not want to mess with complicated regular expressions. The above code does the following:

The first pattern of sed replaces "_" with a blank.
The second pattern of sed replaces "@" with a blank. So now we have the contents of the file separated into multiple columns:

case test.com
case 1 2 test.com
case 1 test.com

Finally, awk simply prints the first and last columns of the separated contents, joined by an @. Here, NF is a built-in awk variable that holds the number of fields in the current line, so $NF is the last field.
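The two stages can be run separately to see the intermediate columns. In this sketch the function names to_columns and first_and_last are made up for illustration:

```shell
#!/bin/sh
# Stage 1 (hypothetical name): turn both delimiters into blanks.
to_columns() {
    sed 's/_/ /g; s/@/ /g'
}

# Stage 2 (hypothetical name): awk splits on blanks; keep the first and last column.
first_and_last() {
    awk '{ print $1 "@" $NF }'
}

printf '%s\n' 'case_1_2@test.com' | to_columns                    # case 1 2 test.com
printf '%s\n' 'case_1_2@test.com' | to_columns | first_and_last   # case@test.com
```

Because every _ becomes a blank, an underscore in the domain would also split it: case@my_site.com would come out as case@site.com.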

I think you need global substitutions (i.e. s/_/ /g;...) – Joseph R. Aug 21 '13 at 07:48

score 0 · Answer 6 · answered Aug 21 '13 at 12:52

Here's another gawk way:

gawk -F_ '{if(NF>1){sub(/^[^@]*/,"",$NF); print $1$NF} else {print $NF}}'

Using _ as a field delimiter, we tell gawk, if there is more than one field, to strip everything before the @ from the last field and then print the first and last fields, and to print the last (only) field if there is a single field.

terdon
