Splitting a DOM
Uche Ogbuji
uche at ogbuji.net
Fri Feb 13 10:37:59 EST 2004
brice.vissiere at costes-gestion.net (Brice Vissière) wrote in message news:<fa538331.0402120759.44f20301 at posting.google.com>...
> Hello,
>
> I would like to handle an XML file structured as following
> <ROOT>
> <STEP>
> ...
> </STEP>
> <STEP>
> ...
> </STEP>
> ...
> </ROOT>
>
> From this file, I want to build an XML file for each STEP block.
>
> Currently I'm doing something like:
>
> from xml.dom.ext.reader import Sax2
> from xml.dom.ext import PrettyPrint
>
> reader = Sax2.Reader()
> my_dom = reader.fromUri('steps.xml')
> steps = my_dom.getElementsByTagName('STEP')
>
> i=0
> for step in steps:
>     tmp = file('step%s.xml' % i,'w')
>     tmp.write('<?xml version="1.0" encoding="ISO-8859-1" ?>\n')
>     PrettyPrint(step , tmp , encoding='ISO-8859-1')
>     tmp.close()
>     i+=1
>
> But I'm pretty sure that there's a better way to split the DOM ?
I already gave an Anobind recipe for this one, but I also wanted to
post a few notes on your chosen approach:
1) "from xml.dom.ext.reader import Sax2" means you're using 4DOM.
4DOM is very slow. If you find this is a problem, use minidom. My
Anobind recipe used cDomlette, which is *very* fast, faster even
than minidom, but requires installing 3rd party software.
2) "steps = my_dom.getElementsByTagName('STEP')". This could give
unexpected results if you have nested STEP elements, because it
returns every STEP descendant, not just the children of ROOT.
You might want to use a list comprehension such as:
    steps = [ step for step in my_dom.documentElement.childNodes
              if step.nodeName == u"STEP" ]
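To make both notes concrete, here is a rough sketch of the same split done with minidom instead of 4DOM. It is not the Anobind recipe, just one way it could look. It assumes a current Python 3 with the standard-library xml.dom.minidom; the SAMPLE document and the names split_steps, count_all_steps and write_steps are made up for illustration.

```python
import os
from xml.dom import minidom

# Hypothetical sample input: the second STEP contains a nested STEP.
SAMPLE = """<ROOT>
  <STEP><NAME>one</NAME></STEP>
  <STEP><NAME>two</NAME><STEP><NAME>nested</NAME></STEP></STEP>
</ROOT>"""


def split_steps(xml_text):
    """Return one serialized XML string per top-level STEP element."""
    dom = minidom.parseString(xml_text)
    # Look only at direct children of the root element, so a nested
    # STEP stays inside its parent instead of becoming its own piece.
    steps = [node for node in dom.documentElement.childNodes
             if node.nodeType == node.ELEMENT_NODE
             and node.nodeName == "STEP"]
    return [step.toxml() for step in steps]


def count_all_steps(xml_text):
    """Count every STEP in the document, nested ones included."""
    dom = minidom.parseString(xml_text)
    return len(dom.getElementsByTagName("STEP"))


def write_steps(pieces, directory="."):
    """Write each piece to step<N>.xml and return the paths written."""
    paths = []
    for i, piece in enumerate(pieces):
        path = os.path.join(directory, "step%d.xml" % i)
        # Characters outside ISO-8859-1 become numeric character references.
        with open(path, "w", encoding="ISO-8859-1",
                  errors="xmlcharrefreplace") as out:
            out.write('<?xml version="1.0" encoding="ISO-8859-1"?>\n')
            out.write(piece)
        paths.append(path)
    return paths


pieces = split_steps(SAMPLE)
```

With this sample, getElementsByTagName finds three STEP elements while the child-node filter keeps only the two top-level ones. Calling write_steps(pieces) would then produce step0.xml and step1.xml in the current directory.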
Good luck.
--Uche
http://uche.ogbuji.net