Splitting a DOM

Fri Feb 13 10:37:59 EST 2004

brice.vissiere at costes-gestion.net (Brice Vissière) wrote in message news:<fa538331.0402120759.44f20301 at posting.google.com>...
> Hello,
> 
> I would like to handle an XML file structured as follows:
> <ROOT>
> <STEP>
> ...
> </STEP>
> <STEP>
> ...
> </STEP>
> ...
> </ROOT>
> 
> From this file, I want to build an XML file for each STEP block.
> 
> Currently I'm doing something like:
> 
> from xml.dom.ext.reader import Sax2
> from xml.dom.ext import PrettyPrint
> 
> reader = Sax2.Reader()
> my_dom = reader.fromUri('steps.xml')
> steps = my_dom.getElementsByTagName('STEP')
> 
> i=0
> for step in steps:
> 	tmp = file('step%s.xml' % i,'w')
> 	tmp.write('<?xml version="1.0" encoding="ISO-8859-1" ?>\n')
> 	PrettyPrint(step , tmp , encoding='ISO-8859-1')
> 	tmp.close()
> 	i+=1
> 
> But I'm pretty sure that there's a better way to split the DOM?

I already gave an Anobind recipe for this one, but I also wanted to
post a few notes on your chosen approach:

1) "from xml.dom.ext.reader import Sax2" means you're using 4DOM.
4DOM is very slow. If you find this is a problem, use minidom. My
Anobind recipe used cDomlette, which is *very* fast (certainly faster
than minidom), but it requires installing third-party software.
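For comparison, here is a minimal sketch of the same splitting loop using
minidom from the Python standard library (modern Python 3, not the 2004-era
4DOM code quoted above). The function name split_steps and its out_dir and
encoding parameters are hypothetical; the output names step0.xml, step1.xml,
... follow the original loop. Unlike PrettyPrint, it writes each STEP with
its original whitespace rather than re-indenting it.

```python
import os
from xml.dom import minidom


def split_steps(source_path, out_dir=".", encoding="ISO-8859-1"):
    """Write each top-level STEP element of source_path to its own file.

    Hypothetical helper: files are named step0.xml, step1.xml, ... in
    out_dir, as in the original loop. Returns the list of paths written.
    """
    doc = minidom.parse(source_path)
    impl = minidom.getDOMImplementation()
    # Only direct element children of the root, so a STEP nested inside
    # another STEP is not also written out as a separate file.
    steps = [node for node in doc.documentElement.childNodes
             if node.nodeType == node.ELEMENT_NODE and node.tagName == "STEP"]
    written = []
    for i, step in enumerate(steps):
        # A fresh, empty document whose root is a deep copy of this STEP.
        step_doc = impl.createDocument(None, None, None)
        step_doc.appendChild(step_doc.importNode(step, True))
        path = os.path.join(out_dir, "step%d.xml" % i)
        # Document.toxml(encoding=...) returns bytes that begin with an
        # XML declaration naming that encoding.
        with open(path, "wb") as out:
            out.write(step_doc.toxml(encoding=encoding))
        step_doc.unlink()
        written.append(path)
    doc.unlink()
    return written
```

Calling split_steps('steps.xml') would write step0.xml, step1.xml, ... into
the current directory, each starting with an ISO-8859-1 XML declaration.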

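2) "steps = my_dom.getElementsByTagName('STEP')". This could give
unexpected results if you have nested STEP elements, because
getElementsByTagName returns every descendant element with that name,
not just the children of ROOT. A nested STEP would then be written out
both inside its parent's file and as a file of its own. You might want
to use a list comprehension such as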

    steps = [ step for step in my_dom.documentElement.childNodes
              if step.nodeName == u"STEP" ]
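To see the difference, here is a small sketch using the standard library's
minidom instead of 4DOM (an assumption; both provide the DOM methods used
here) on a made-up document in which one STEP is nested inside another:

```python
from xml.dom import minidom

# Hypothetical input: the second STEP contains a nested STEP.
SAMPLE = """<ROOT>
  <STEP id="1"/>
  <STEP id="2"><STEP id="2a"/></STEP>
</ROOT>"""

doc = minidom.parseString(SAMPLE)

# getElementsByTagName searches all descendants, so the nested STEP
# is returned too.
all_steps = [s.getAttribute("id") for s in doc.getElementsByTagName("STEP")]

# Filtering the root's direct children keeps only top-level STEPs;
# whitespace text nodes drop out because their nodeName is "#text".
top_steps = [s.getAttribute("id")
             for s in doc.documentElement.childNodes
             if s.nodeName == u"STEP"]

print(all_steps)  # ['1', '2', '2a']
print(top_steps)  # ['1', '2']
```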

Good luck.

--Uche
http://uche.ogbuji.net