Splitting a DOM
Alan Kennedy
alanmk at hotmail.com
Thu Feb 12 13:53:38 EST 2004
[Brice Vissière]
> But I'm pretty sure that there's a better way to split the DOM ?
There are *lots* of ways to solve this one. The "best" solution depends
on which criteria you choose.
Of the parser-based approaches, the most efficient in time and memory
is probably SAX. That said, the problem is simple enough that a plain
textual solution might work well, and would definitely be faster.
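As a rough illustration of the textual route, here is a minimal sketch using a regular expression. The names STEP_RE and split_steps are made up for this example, and it assumes the <STEP> elements are not nested, are never self-closing, and have no ">" inside attribute values.

```python
import re

# Hypothetical helper, not part of the SAX code in this post.
# Assumes non-nested, non-self-closing <STEP> elements and no '>'
# inside attribute values.
STEP_RE = re.compile(r"<STEP\b[^>]*>.*?</STEP>", re.DOTALL)

def split_steps(text):
    """Return the raw text of every <STEP>...</STEP> fragment in text."""
    return STEP_RE.findall(text)

sample = '<ROOT><STEP a="b">Step 0</STEP>\n<STEP>Step 1</STEP></ROOT>'
fragments = split_steps(sample)
```

Each fragment could then be written to its own numbered file. No real XML parsing happens here, so entities, CDATA sections and comments are passed through untouched.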
Here's a bit of SAX code adapted from another SAX example I posted
earlier today. Note that this will not work properly if you have
<STEP> elements nested inside one another. In that case, you'd have to
maintain a stack of the output files: push the outfile onto the stack
in "startElement()" and pop it off in "endElement()".
#-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
import xml.sax
import xml.sax.handler
from xml.sax.saxutils import escape, quoteattr

split_on_elems = ['STEP']

class splitter(xml.sax.handler.ContentHandler):

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.outfile = None
        self.seq_no = self.seq_no_gen()

    def seq_no_gen(self, n=0):
        # Endless counter used to number the output files.
        while True:
            yield n
            n = n + 1

    def startElement(self, elemname, attrs):
        if elemname in split_on_elems:
            # Open a new numbered file for each split element.
            self.outfile = open('step%04d.xml' % next(self.seq_no), 'w')
        if self.outfile:
            attrstr = ""
            for a in attrs.keys():
                attrstr += " %s=%s" % (a, quoteattr(attrs[a]))
            self.outfile.write("<%s%s>" % (elemname, attrstr))

    def endElement(self, elemname):
        if self.outfile:
            self.outfile.write('</%s>' % elemname)
        if elemname in split_on_elems:
            self.outfile.close()
            self.outfile = None

    def characters(self, s):
        # Re-escape &, < and > so the output stays well-formed.
        if self.outfile:
            self.outfile.write(escape(s))

testdoc = """
<ROOT>
<STEP a="b" c="d">Step 0</STEP>
<STEP>Step 1</STEP>
<STEP>Step 2</STEP>
<STEP>Step 3</STEP>
<STEP>Step 4</STEP>
</ROOT>
"""

if __name__ == "__main__":
    parser = xml.sax.make_parser()
    PFJ = splitter()
    parser.setContentHandler(PFJ)
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)
    parser.feed(testdoc)
    parser.close()  # signal end of input so the parser can finish
#-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
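For nested <STEP> elements, the stack idea described earlier might look something like the sketch below. It is written for Python 3 and is only one possible reading of that idea: the class name NestedSplitter and the outdir parameter are invented here, and it assumes that an outer <STEP> file should also contain the markup of any <STEP> nested inside it.

```python
import os
import xml.sax
from xml.sax.saxutils import escape, quoteattr

SPLIT_ON = {"STEP"}

class NestedSplitter(xml.sax.ContentHandler):
    """Hypothetical handler: one output file per STEP, nested or not."""

    def __init__(self, outdir="."):
        super().__init__()
        self.outdir = outdir   # assumed parameter: where files are written
        self.stack = []        # open output files, innermost last
        self.count = 0         # sequence number for the next file name

    def _write(self, text):
        # Assumption: every open file gets the markup, so an outer
        # STEP file still includes its nested STEP children.
        for f in self.stack:
            f.write(text)

    def startElement(self, name, attrs):
        if name in SPLIT_ON:
            path = os.path.join(self.outdir, "step%04d.xml" % self.count)
            self.stack.append(open(path, "w", encoding="utf-8"))
            self.count += 1
        attrstr = "".join(" %s=%s" % (k, quoteattr(v))
                          for k, v in attrs.items())
        self._write("<%s%s>" % (name, attrstr))

    def endElement(self, name):
        self._write("</%s>" % name)
        if name in SPLIT_ON:
            self.stack.pop().close()

    def characters(self, content):
        self._write(escape(content))
```

It can be driven with xml.sax.parseString(doc_bytes, NestedSplitter("some_dir")). If each file should hold only its own level, _write would write to self.stack[-1] alone instead of looping over the whole stack.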
HTH,
--
alan kennedy
------------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/contact/alan