I want to remove the string between the first pair of < >

Original text:

< a href="ACM-Reference-Format.dbx"> ACM-Reference-Format.dbx < /a > 

I want to be left with just

ACM-Reference-Format.dbx</a> 

I tried using

sed 's/[<->]*/ but it only removed the first <

2 Answers

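In a regex, [] defines a character class (bracket expression), which matches any single character listed between the brackets. For example, [a-z] matches any one lowercase letter from a to z. Your [<->] is therefore read as the range from < to > (that is, <, = or >), not as "everything between < and >", which is why only the leading < was removed.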
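A small sketch of that behaviour, in case it helps. The sample strings are made up for illustration, and LC_ALL=C is set only so that the range is interpreted by ASCII order:

```shell
# [<->] is a range: it matches '<', '=' or '>' (ASCII 60 to 62), one character at a time.
# With *, the match is the leading run of those characters, which may be empty.
class_demo=$(printf '%s\n' '<=>abc' | LC_ALL=C sed 's/[<->]*//')
echo "$class_demo"    # prints: abc

# When the line starts with a lone "<", that run is just "<", so only it is removed.
first_only=$(printf '%s\n' '< a href="x"> text' | LC_ALL=C sed 's/[<->]*//')
echo "$first_only"    # prints ' a href="x"> text' (note the leading space)
```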

What you want to do instead is match < followed by any number of characters followed by the next >.

Usually you could do that with <.*?>, but as Panki pointed out, sed doesn't support non-greedy matches, so a plain <.*> would run on to the last > on the line.
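To illustrate the greedy behaviour, here is a quick sketch using the line from the question (the variable names are just for this example):

```shell
input='< a href="ACM-Reference-Format.dbx"> ACM-Reference-Format.dbx < /a > '

# Greedy: .* runs from the first '<' to the LAST '>' on the line,
# so both tags and the text between them are removed.
greedy=$(printf '%s\n' "$input" | sed 's/<.*>//')
printf '[%s]\n' "$greedy"    # prints: [ ]
```

Only the trailing space after the final > survives, which is why a pattern that stops at the first > is needed.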

You can instead match any character, except for > and /:

sed 's/<[^>\/]*>\s//'

Example:

$ echo '< a href="ACM-Reference-Format.dbx"> ACM-Reference-Format.dbx < /a > ' | sed 's/<[^>\/]*>\s//'
ACM-Reference-Format.dbx < /a > 

Explanation:

<[^>\/]*>
<           #matches <
 [^   ]     #negated character class, matches any character except the ones specified
   > /      #the characters not to be matched
    \       #escapes the following slash so sed does not take it as the s command delimiter
       *    #matches the preceding character class zero or more times
        >   #matches >
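Note that \s (a single whitespace character) is a GNU extension and may not be available in every sed. A possible portable variant, offered as a sketch rather than a tested recommendation for all implementations, uses the POSIX class [[:space:]] and switches the delimiter to | so the slash needs no escaping:

```shell
input='< a href="ACM-Reference-Format.dbx"> ACM-Reference-Format.dbx < /a > '

# s|...|| uses '|' as the delimiter, so '/' can appear unescaped in the class.
# [[:space:]]* removes any whitespace after the tag (zero or more characters,
# whereas \s in the command above removes exactly one).
portable=$(printf '%s\n' "$input" | sed 's|<[^>/]*>[[:space:]]*||')
printf '[%s]\n' "$portable"    # prints: [ACM-Reference-Format.dbx < /a > ]
```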
mashuptwice

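You can also match everything up to and including the first >, plus the space after it, and keep only what follows. Pass a file name as below, or pipe the string into sed:

$ sed 's/[^>]*> \([^>]*\)/\1/' file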
ACM-Reference-Format.dbx < /a >
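As a side note, the capture group puts back exactly what it matched, so on this input a shorter substitution appears to give the same result. A sketch comparing the two on the line from the question (variable names are illustrative):

```shell
input='< a href="ACM-Reference-Format.dbx"> ACM-Reference-Format.dbx < /a > '

# Original form: drop everything through the first "> ", re-insert the captured text.
with_group=$(printf '%s\n' "$input" | sed 's/[^>]*> \([^>]*\)/\1/')

# Shorter form: just delete everything through the first "> ".
without_group=$(printf '%s\n' "$input" | sed 's/[^>]*> //')

printf '[%s]\n' "$with_group"       # prints: [ACM-Reference-Format.dbx < /a > ]
printf '[%s]\n' "$without_group"    # prints: [ACM-Reference-Format.dbx < /a > ]
```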