Using sed to remove string between < >

Question

I want to remove the string between the first pair of < >

Original text:

< a href="ACM-Reference-Format.dbx"> ACM-Reference-Format.dbx < /a >

I want to be left with just

ACM-Reference-Format.dbx</a>

I tried using

sed 's/[<->]*/ but it only removed the first <

Does this answer your question? Non-greedy match with SED regex (emulate perl's .*?) — Panki, Mar 25 '22 at 14:13

mashuptwice · Answer 1 · 2022-03-25T14:40:33.477

In a regex, [] defines a character class (bracket expression), which matches any single character listed between the brackets. For example, [a-z] matches any one lowercase letter from a to z. This won't help here, because a character class only ever matches single characters from a set, not a string delimited by < and >.
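To see why the attempt in the question only strips the leading <, here is a small sketch. It assumes the attempted command was meant to end in an empty replacement (the command in the question is cut off), and it forces the C locale so that the range <-> covers exactly <, = and >. The variable names are only for illustration.

```shell
# Sample line from the question, single-quoted so the inner double quotes survive.
line='< a href="ACM-Reference-Format.dbx"> ACM-Reference-Format.dbx < /a >'

# [<->]* matches a run of the characters <, = or >. Without the g flag only
# the first match is replaced, and on this line that is the leading <.
first_only=$(printf '%s\n' "$line" | LC_ALL=C sed 's/[<->]*//')
printf '%s\n' "$first_only"
```

Everything after the first < is left untouched, which matches the behaviour described in the question.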

What you want to do instead is match <, followed by any number of characters, followed by >.

Usually you could do that with <.*?>, but as Panki pointed out, sed doesn't support non-greedy matches.

You can instead match any character, except for > and /:

sed 's/<[^>\/]*>\s//'

Example:

$ echo '< a href="ACM-Reference-Format.dbx"> ACM-Reference-Format.dbx < /a >' | sed 's/<[^>\/]*>\s//'
ACM-Reference-Format.dbx < /a >

Explanation:

<[^>\/]*>
<           #matches a literal <
 [^   ]     #negated character class, matches any single character except the ones listed
   > /      #the characters not to be matched
    \       #escapes the following slash so sed doesn't treat it as the s/// delimiter
       *    #repeats the preceding character class zero or more times
        >   #matches a literal >

The trailing \s (a GNU sed extension) then matches the single whitespace character after the tag.
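As a rough illustration of why the negated class is needed, the sketch below compares a greedy .* with [^>]* on the line from the question. It swaps \s for a plain " *" so it does not depend on GNU extensions, and it drops the / from the class for brevity; the variable names are hypothetical.

```shell
line='< a href="ACM-Reference-Format.dbx"> ACM-Reference-Format.dbx < /a >'

# Greedy: .* runs up to the LAST > on the line, so the whole line is consumed.
greedy=$(printf '%s\n' "$line" | sed 's/<.*>//')

# Negated class: [^>]* cannot cross a >, so the match ends at the first >.
first_tag=$(printf '%s\n' "$line" | sed 's/<[^>]*> *//')

printf 'greedy:    [%s]\n' "$greedy"
printf 'first tag: [%s]\n' "$first_tag"
```

The greedy version leaves nothing between the brackets, while the negated class leaves the link text and the closing tag.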

@schrodigerscatcuriosity Not necessarily better, as both lead to the same result, but easier than working with groups and reinserting them — mashuptwice, Mar 25 '22 at 14:41
You could use, e.g., @ as the separator instead of / to avoid escaping. — DanieleGrassini, Mar 25 '22 at 22:43
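A minimal sketch of that suggestion, assuming the same input line as in the question. With @ as the delimiter of the s command, the / inside the class needs no backslash. The " *" again stands in for \s to stay portable, and the variable names are only illustrative.

```shell
line='< a href="ACM-Reference-Format.dbx"> ACM-Reference-Format.dbx < /a >'

# @ delimits the s command here, so / is an ordinary character.
with_at=$(printf '%s\n' "$line" | sed 's@<[^>/]*> *@@')
printf '%s\n' "$with_at"
```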

score 0 · Answer 2 · answered Mar 25 '22 at 14:30

You can do the following:

$ sed 's/[^>]*> \([^>]*\)/\1/' file # or string
ACM-Reference-Format.dbx < /a >

answered Mar 25 '22 at 14:30

schrodingerscatcuriosity


Why not just 's/^[^>]*> //', or 's/^[^>]*>//' to preserve the space? – DanieleGrassini Mar 25 '22 at 22:53
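For comparison, a short sketch of the two anchored variants from this comment, run against the line from the question. The variable names are hypothetical. Both rely on the tag to remove starting at the beginning of the line.

```shell
line='< a href="ACM-Reference-Format.dbx"> ACM-Reference-Format.dbx < /a >'

# Remove everything up to and including the first >, plus the space after it.
no_space=$(printf '%s\n' "$line" | sed 's/^[^>]*> //')

# Same, but keep the space that follows the first >.
keep_space=$(printf '%s\n' "$line" | sed 's/^[^>]*>//')

printf '[%s]\n' "$no_space"
printf '[%s]\n' "$keep_space"
```

No capture group or backreference is needed, since nothing from the matched part is reinserted.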
