
I am using Python's Beautiful Soup to parse an XML file and write it to a different file after deleting certain tags. However, calling soup.prettify() changes other XML namespaces and attribute names.

with open('new.xml', 'w') as f:
    f.write(soup.prettify(formatter="xml"))

The changes are as given in sample below.

Original XML file.

<draw:control text:anchor-type="paragraph" draw:z-index="1" draw:style-name="gr1" draw:text-style-name="P2" svg:width="2.805cm" svg:height="1.853cm" svg:x="3.602cm" svg:y="0.824cm" draw:control="control2"/>

New XML file written from soup.prettify.

  <draw:control draw:control="control2" draw:style-name="gr1" draw:text-style-name="P2" draw:z-index="1" svg:height="1.853cm" svg:width="2.805cm" svg:x="3.602cm" svg:y="0.824cm" text:anchor-type="paragraph"/>

I tried passing utf-8 to prettify(), but the problem is the same. Is there another way to search for a particular tag, delete it, and keep all the other XML content in the file intact? Please suggest.

Akhitha

1 Answer


Consider using the standard-library xml.etree.ElementTree module, which implements a simple and efficient API for parsing and creating XML data. It's fast, easy to use, and Pythonic.

You can remove a particular element using Element.remove().

A basic example is given here.
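As a minimal sketch of the Element.remove() approach, assuming a small made-up document: the namespace URIs, the root wrapper, the draw:frame element, and the remove_elements helper below are all hypothetical and only there for illustration. Registering the prefixes with register_namespace() keeps them from being rewritten as ns0, ns1 and so on, and on Python 3.8 or later ElementTree should also keep the attributes in their original order.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; use the ones declared in your own file.
NAMESPACES = {
    "draw": "urn:example:draw",
    "text": "urn:example:text",
}

# Register the prefixes so output uses draw:/text: instead of ns0:/ns1:.
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)

# Made-up sample document with the namespaces declared on the root.
xml_data = (
    '<root xmlns:draw="urn:example:draw" xmlns:text="urn:example:text">'
    '<draw:control text:anchor-type="paragraph" draw:z-index="1"/>'
    '<draw:frame draw:z-index="2"/>'
    '</root>'
)

root = ET.fromstring(xml_data)


def remove_elements(root, path, namespaces):
    """Remove every element matching `path`; return how many were removed."""
    removed = 0
    # Snapshot the tree first so removal does not disturb the iteration.
    for parent in list(root.iter()):
        for child in parent.findall(path, namespaces):
            parent.remove(child)
            removed += 1
    return removed


removed_count = remove_elements(root, "draw:frame", NAMESPACES)
result = ET.tostring(root, encoding="unicode")
print(result)
```

To save the result instead of printing it, something like ET.ElementTree(root).write("new.xml", encoding="utf-8", xml_declaration=True) should work. Note that Element.remove() only removes direct children, which is why the helper walks every potential parent.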

But if you insist on using BeautifulSoup (which can use lxml as its underlying XML parser), you can do the following:

# BeautifulSoup 4 for XML parsing; the "xml" feature requires lxml to be installed
from bs4 import BeautifulSoup

xml_data = """
<draw:control text:anchor-type="paragraph" draw:z-index="1" draw:style-name="gr1" draw:text-style-name="P2" svg:width="2.805cm" svg:height="1.853cm" svg:x="3.602cm" svg:y="0.824cm" draw:control="control2"/>
"""
soup = BeautifulSoup(xml_data, "xml")
print(soup.prettify())

# Placeholders: substitute your own tag name and replacement content
tag = soup.find("your-tag-name")
if tag is not None:
    tag.replace_with("whatever you want")  # or tag.decompose() to delete it

You can also use a for loop to edit multiple similar elements.

delta24