Why don't these two sed patterns give the same output?

Question

I'm trying to extract a couple of fields from each entry of a VCF file. Specifically, I want the first and second fields and the number after END=. Here is one entry of the file:

1 234529926 AC=1;AF=0.00019968;AFR_AF=0.0008;AMR_AF=0;AN=5008;CIEND=0,500;CIPOS=-500,0;CS=DEL_union;EAS_AF=0;END=234549706;EUR_AF=0;MC=YL_CN_ACB_337;NS=2504;SAS_AF=0;SVTYPE=DEL 0|0

I tried the following to get the result I want:

sed 's|\([\d\s]*\)AC=.*;END=\([0-9]*\).*|\1\2|'

Results in:

1 234529926 234549706

Replacing [0-9] with \d should give the same result but it doesn't:

sed 's|\([\d\s]*\)AC=.*;END=\(\d*\).*|\1\2|'

Gives:

1 234529926

This doesn't make sense to me: the \([\d\s]*\) group at the beginning seems to work just fine, so it can't be the case that sed doesn't understand \d. Why is that?

Possible duplicate of sed regular expression behaving differently than in vim and perl? and https://unix.stackexchange.com/questions/119905/why-does-my-regular-expression-work-in-x-but-not-in-y – Sundeep, Sep 28 '18 at 07:00
sed doesn't recognize \d as a digit class. In the first command the * applies to [\d\s], so matching zero times is a valid match and the group captures nothing; the leading fields survive only because the pattern is not anchored and they sit before the match. \s is recognized by GNU sed (not sure about other versions), but it won't be recognized inside a character class in any case. – Sundeep, Sep 28 '18 at 07:02
If you run echo '1 234529926 AC=1;AF=0' | sed 's|\([\d\s]*\)AC=|"\1"AC=|', you will see that the capture group didn't capture any character, because * means an empty match is fine. In contrast, sed 's|\([[:space:]]*\)AC=|"\1"AC=|' will match the space. – Sundeep, Sep 28 '18 at 07:05
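
Following the comments above, a portable version of the command could look like the sketch below. It is only an illustration under some assumptions: the function name extract_end is made up for this example, the INFO column is assumed to start with AC= as in the sample entry, and the pattern is anchored with ^ so the two leading fields are actually captured instead of being left in place by accident. It uses [0-9] and [[:space:]], which are POSIX, in place of \d and \s.

```shell
# Hypothetical helper (the name is not from the question): reads entries on
# stdin and prints the first two fields followed by the number after END=.
extract_end() {
    # [0-9] and [[:space:]] are POSIX classes; \d is not a digit class in sed,
    # and \s is not expanded inside a bracket expression.
    sed 's|^\([0-9]*[[:space:]]\{1,\}[0-9]*[[:space:]]\{1,\}\)AC=.*;END=\([0-9]*\).*|\1\2|'
}

# The sample entry from the question.
line='1 234529926 AC=1;AF=0.00019968;AFR_AF=0.0008;AMR_AF=0;AN=5008;CIEND=0,500;CIPOS=-500,0;CS=DEL_union;EAS_AF=0;END=234549706;EUR_AF=0;MC=YL_CN_ACB_337;NS=2504;SAS_AF=0;SVTYPE=DEL 0|0'

printf '%s\n' "$line" | extract_end
# prints: 1 234529926 234549706
```

Because the separators are captured as part of the first group, tab-separated input (as in a real VCF file) keeps its tabs in the output.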
