Using sed to replace one character with another within an xml tag

Question

I need to replace the character S with T in:

<episode-num system="onscreen">S1 E12</episode-num>

The result I expect:

<episode-num system="onscreen">T1 E12</episode-num>

I don't know how Git works in depth; I'm just using it to replace that character in my XML tag. Researching in forums, I found some information and tried the following command line:

sed -e :l -e 's@\(<episode-num system="onscreen">.*\)S\([^amp;]\)\(.*</episode-num>\)@\1T\2\3@;tl' guide.xml

But it does not work. I hope you can help me, please.

Hello Diego, is the tag you want to change always the same? Also, could you copy and paste command you used into your question? It would be more convenient so that others could copy-paste it... — golder3, Feb 03 '22 at 14:06
Also consider https://stackoverflow.com/q/8577060 and https://flapenguin.me/xml-regex — U. Windl, Feb 03 '22 at 22:41

Kusalananda · Answer 1 · 2022-02-03T18:17:04.793

Assuming you have some XML document, like

<data>
<episode-num system="onscreen">S1 E12</episode-num>
<episode-num system="onscreen">S1 S12</episode-num>
<episode-num system="onscreen">T1 S12</episode-num>
</data>

... and that you want to replace all S characters with T in the episode-num node values that start with S.

You do that with xmlstarlet like so:

xmlstarlet ed -u '//episode-num[starts-with(text(),"S")]' \
    -x 'translate(text(),"S","T")' file.xml

This may modify any episode-num node, no matter where in the document these are located. If you only want to modify particular nodes, then change //episode-num in the XPath expression to a more precise path.

Given my example document above, the xmlstarlet command above would produce

<?xml version="1.0"?>
<data>
  <episode-num system="onscreen">T1 E12</episode-num>
  <episode-num system="onscreen">T1 T12</episode-num>
  <episode-num system="onscreen">T1 S12</episode-num>
</data>

Doing the same sort of operation with xq (from https://kislyuk.github.io/yq/) as with xmlstarlet above:

xq -x '(.data."episode-num"[] | select (."#text"|startswith("S")))."#text" |= gsub("S";"T")' file.xml

This assumes that the input document has the same structure as my example document. It parses the document with an XML parser, and then translates it internally into JSON. It calls jq with the generated JSON document to apply the given expression, and finally translates everything back to XML again.

The internal JSON document that the jq expression is actually applied to looks like this, for the example document I'm using:

{
  "data": {
    "episode-num": [
      {
        "@system": "onscreen",
        "#text": "S1 E12"
      },
      {
        "@system": "onscreen",
        "#text": "S1 S12"
      },
      {
        "@system": "onscreen",
        "#text": "T1 S12"
      }
    ]
  }
}

score 2 · Accepted Answer · edited Feb 03 '22 at 18:23

Replace some string only if line contains another string with `sed`

The substitution is applied only on lines that contain the string free:

sed '/free/s/i/I/g' example.txt

'/free/s/i/I/g'
- /free/ - address: run the command only on lines that contain this string
- s - sed's substitute command
- /i/ - the regular expression to match
- /I/ - the replacement for each matching substring
- /g - substitution flag: repeat the substitution for all matches on the line
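
To see the address and the substitution working together, here is a minimal sketch. The two input lines are made up for illustration and do not come from the question.

```shell
# Two hypothetical input lines; only the first contains "free".
input=$(printf '%s\n' 'this is free' 'this is paid')

# /free/ selects the lines to act on; s/i/I/g then replaces every "i" on them.
result=$(printf '%s\n' "$input" | sed '/free/s/i/I/g')

printf '%s\n' "$result"
# thIs Is free
# this is paid
```

The second line is printed unchanged because it never matches the /free/ address.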

solution for your assumptions

The string to match on is <episode-num system="onscreen">

Assume you have a file with this content:

$ cat test.xml 
<data>
<episode-num system="onscreen">S1 E11</episode-num>
<episode-num system="onscreen">S1 E12</episode-num>
<episode-num system="onscreen">T1 E13</episode-num>
<some data>S1 E1</episode-num>
</data>

Your sed solution is:

$ sed '/<episode-num system="onscreen">/s/S/T/g' test.xml 
<data>
<episode-num system="onscreen">T1 E11</episode-num>
<episode-num system="onscreen">T1 E12</episode-num>
<episode-num system="onscreen">T1 E13</episode-num>
<some data>S1 E1</episode-num>
</data>

Source for this solution is here.
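
Note that the command above replaces every S on a matching line. If only the S that directly follows the opening tag should change, a narrower substitution may be enough. The following is a sketch under that assumption (the tag and the S are on the same line, with nothing between them); the sample lines are hypothetical.

```shell
# Hypothetical sample; the second episode line contains two S characters.
sample=$(printf '%s\n' \
  '<data>' \
  '<episode-num system="onscreen">S1 E11</episode-num>' \
  '<episode-num system="onscreen">S1 S12</episode-num>' \
  '<episode-num system="onscreen">T1 E13</episode-num>' \
  '</data>')

# Capture the opening tag in \1 and replace only the S immediately after it.
narrow=$(printf '%s\n' "$sample" |
  sed 's@\(<episode-num system="onscreen">\)S@\1T@')

printf '%s\n' "$narrow"
# <data>
# <episode-num system="onscreen">T1 E11</episode-num>
# <episode-num system="onscreen">T1 S12</episode-num>
# <episode-num system="onscreen">T1 E13</episode-num>
# </data>
```

This is still line-based text matching, not XML parsing, so it depends on the file keeping this exact layout.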

Although this satisfies the letter of the requirement, it's a solution that can break easily if the XML document changes shape (a permitted operation in the XML world) — Chris Davies, Feb 03 '22 at 18:17
@roaima I do not know what "changes shape" means, but if there was some xml like text embedded in a comment then this would incorrectly change that as well. — emory, Feb 04 '22 at 16:08
"changes shape" - imagine removing all the newlines, or adding extra newlines and whitespace (trivially, indenting the <episode-num/> element lines in the example) — Chris Davies, Feb 04 '22 at 16:13
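
The "changes shape" concern raised in these comments can be reproduced with a small sketch. Both inputs below are hypothetical reshapings of the example document, and the title element is invented for illustration.

```shell
# Shape 1: the element content sits on its own line.
split_in=$(printf '%s\n' \
  '<data>' \
  '<episode-num system="onscreen">' \
  'S1 E12' \
  '</episode-num>' \
  '</data>')
split_out=$(printf '%s\n' "$split_in" |
  sed '/<episode-num system="onscreen">/s/S/T/g')

# Shape 2: the whole document is on one line (the title element is made up).
oneline_in='<data><title>Some Show</title><episode-num system="onscreen">S1 E12</episode-num></data>'
oneline_out=$(printf '%s\n' "$oneline_in" |
  sed '/<episode-num system="onscreen">/s/S/T/g')

printf '%s\n' "$split_out"    # unchanged: the line holding "S1 E12" never matches the address
printf '%s\n' "$oneline_out"  # the title becomes "Tome Thow" as well
```

In the first shape nothing is replaced, and in the second every S in the document is replaced. An XML-aware tool such as xmlstarlet does not depend on line breaks in this way.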

score 2 · Answer 3 · answered Feb 03 '22 at 19:23

A Perl one-liner is not a good approach. Anyway:

perl -MXML::DT -e 'print dt("ex1.xml", "episode-num" => sub{$c=~ s/S/T/; toxml})'

Where:

-MXML::DT = import and use the XML::DT module (in this case, its dt function)
dt(file, processor) = down-translate the file with the provided processor
"episode-num" => sub{...} = apply the sub to each episode-num element
$c =~ s/S/T/; toxml = replace the first S with T in the element contents ($c) and rebuild the episode-num XML element

(If necessary, install the module with sudo cpanm XML::DT.)
