Is there a way to find a pattern (in a child tag) and replace the entire parent tag, using regular expressions? I'm working on a Linux server without a graphical environment.
I have XML like:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
I need a shell script that finds the pattern:
<author>J K. Rowling</author>
and then replaces its complete parent block:
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
with:
<book category="CHILDREN">
<title lang="en">Hamlet</title>
<author>William Shakespeare</author>
</book>
to finally get:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Hamlet</title>
<author>William Shakespeare</author>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
I would like to do this with something like <book*<author>J K. Rowling</author>*</book>, where * is a wildcard for all text or code between <book and <author>...
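To make the wildcard idea concrete, here is a rough sketch of the behaviour I am after, written in Python only because it is easy to check. The function name `replace_book_block` and its parameters are my own invention. It assumes that `<book>` elements are never nested and that the author text appears literally. A plain `.*` wildcard would start at the first `<book` in the file and swallow the preceding books too, so the sketch uses a wildcard that refuses to cross a `</book>`:

```python
import re


def replace_book_block(xml_text, author, new_block):
    """Replace the whole <book>...</book> block whose <author> equals `author`."""
    # "Any text that does not run past a closing </book>": this keeps the
    # match inside a single book instead of starting at an earlier <book>.
    inside = r"(?:(?!</book>).)*?"
    pattern = re.compile(
        r"<book\b[^>]*>" + inside
        + r"<author>" + re.escape(author) + r"</author>"
        + inside + r"</book>",
        re.DOTALL,  # let "." match newlines, since a block spans several lines
    )
    # A function as replacement keeps backslashes in new_block literal.
    return pattern.sub(lambda match: new_block, xml_text)
```

Called with the file contents, `"J K. Rowling"` and the replacement block as a string, this should return the text with only the matching book swapped out. It is still text matching rather than real XML parsing, so comments, CDATA or unusual whitespace could break it.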
I have an idea, using Perl, that follows these steps:
- Find the line number where the pattern occurs
- Identify the line numbers of the parent block's opening and closing tags
- Remove everything between those lines, including the tags themselves
- Insert the new block in their place
But, if it is possible, I prefer the first approach.
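For comparison, the line-based steps could look roughly like the sketch below. It is in Python rather than Perl, purely as an illustration; the function `replace_block_by_lines` and its parameters are hypothetical names of mine. It assumes every tag sits on its own line, as in my sample, and that `<book>` blocks are not nested:

```python
def replace_block_by_lines(lines, needle, new_lines,
                           open_tag="<book ", close_tag="</book>"):
    """Replace the block of lines surrounding the line that contains `needle`."""
    # Step 1: find the index of the line where the pattern is.
    hit = next((i for i, line in enumerate(lines) if needle in line), None)
    if hit is None:
        return list(lines)  # pattern not found: leave the input unchanged

    # Step 2: walk up to the opening tag and down to the closing tag.
    start = hit
    while start > 0 and not lines[start].lstrip().startswith(open_tag):
        start -= 1
    end = hit
    while end < len(lines) - 1 and close_tag not in lines[end]:
        end += 1

    # Steps 3 and 4: drop the old block and put the new lines in its place.
    return lines[:start] + list(new_lines) + lines[end + 1:]
```

Here `lines` would be the file split into lines (for example with `str.splitlines()`), and the result can be joined with newlines and written back. The trailing space in `"<book "` is there so that `<bookstore>` is not mistaken for an opening `<book>` tag.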
… sed, including the accepted answer, which was written by you. – G-Man Says 'Reinstate Monica' Dec 30 '21 at 14:56
… awk, Perl and XMLStarlet ones, shows how to do what this question asks for: *replace an entire block.* – G-Man Says 'Reinstate Monica' Dec 30 '21 at 14:56
… J.K. Rowling (as presented on *her* web site), J. K. Rowling (preferred by Wikipedia), JK Rowling, J K Rowling, or any other variant? If you want to handle pattern-matching (and, perhaps, correction / normalization / standardization) in the author field, you should say so explicitly. Otherwise, you might want to use a less problematic example, like John Grisham. … (Cont’d) – G-Man Says 'Reinstate Monica' Dec 30 '21 at 15:24