replacing xml elements with other elements using lxml

Ultrus owntheweb at gmail.com
Wed Aug 29 19:04:15 EDT 2007


Stefan,
I'm honored by your response.

You are correct about the bad XML. I had shortened it for this example
because the real document contains other tags unrelated to this issue.
Based on your feedback, I was able to write the following fully
functional code using some different techniques:

from lxml import etree
from StringIO import StringIO
import random

sourceXml = "\
<theroot>\
 <contents>Stefan's fortune cookie:</contents>\
 <random>\
  <item>\
   <random>\
    <item>\
     <contents>You will always know love.</contents>\
    </item>\
    <item>\
     <contents>You will spend it all in one place.</contents>\
    </item>\
   </random>\
  </item>\
  <item>\
   <contents>Your life comes with a lifetime warranty.</contents>\
  </item>\
 </random>\
 <contents>The end.</contents>\
</theroot>"

parser = etree.XMLParser(ns_clean=True, recover=True,
                         remove_blank_text=True, remove_comments=True)
tree = etree.parse(StringIO(sourceXml), parser)
xml = tree.getroot()

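def reduceRandoms(xml):
	# Replace each <random> child with the first child of one randomly chosen <item>.
	for elem in xml:
		if elem.tag == "random":
			elem.getparent().replace(elem, random.choice(elem)[0])
			# The tree changed under the loop: rescan from the start, then stop this pass.
			reduceRandoms(xml)
			return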

reduceRandoms(xml)
for elem in xml:
	print elem.tag, ":", elem.text




One challenge that I face now is that I can only replace a <random>
element with a single element. This isn't a problem if the chosen <item>
has only one <contents> child, or only one <random> child (this works
above). However, if an <item> has more than one child element, such as a
<contents> element followed by a <random> element (like the children of
<theroot>), only the first child is used, because the code takes
random.choice(elem)[0].

Any thoughts on how to replace+append after the replaced element, or
clear+append multiple elements to the cleared position?
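
To make the question concrete, here is a minimal sketch of one possible
approach: slice assignment on the parent (parent[i:i+1] = [...]) instead of
replace(), so that one element can be swapped for any number of elements.
The function name reduce_randoms, its choose parameter, the shortened
sample XML, and the fallback import of xml.etree.ElementTree are all my own
assumptions for illustration, not something taken from Stefan's reply.

```python
import random

try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the standard library, which supports the
    # same element calls used below (len, indexing, slice assignment).
    import xml.etree.ElementTree as etree


def reduce_randoms(parent, choose=random.choice):
    """Replace each <random> below parent with ALL children of one chosen <item>."""
    i = 0
    while i < len(parent):
        child = parent[i]
        if child.tag == "random":
            item = choose(list(child))  # raises IndexError if <random> is empty
            # Slice assignment swaps one element for any number of elements.
            parent[i:i + 1] = list(item)
            # i is not advanced, so the spliced-in elements are examined
            # next and a nested <random> gets reduced in turn.
        else:
            reduce_randoms(child, choose)
            i += 1


# Hypothetical sample: the first <item> holds a <contents> AND a <random>.
source = (
    "<theroot>"
    "<contents>Fortune cookie:</contents>"
    "<random>"
    "<item>"
    "<contents>You will always know love,</contents>"
    "<random>"
    "<item><contents>today.</contents></item>"
    "<item><contents>tomorrow.</contents></item>"
    "</random>"
    "</item>"
    "<item><contents>You will spend it all in one place.</contents></item>"
    "</random>"
    "<contents>The end.</contents>"
    "</theroot>"
)

root = etree.fromstring(source)
reduce_randoms(root)
for elem in root:
    print("%s: %s" % (elem.tag, elem.text))
```

The choose parameter is only there so the random pick can be swapped for
something deterministic while testing, for example
reduce_randoms(root, choose=lambda items: items[0]).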

Thanks again :)



