parsing nested unbounded XML fields with ElementTree
Neil Cerutti
mr.cerutti at gmail.com
Tue Nov 26 10:27:01 EST 2013
On Mon, Nov 25, 2013 at 5:22 PM, Larry.Martell at gmail.com
<Larry.Martell at gmail.com> wrote:
> I have an XML file that has an element called "Node". These can
> be nested to any depth and the depth of the nesting is not
> known to me. I need to parse the file and preserve the nesting.
> For example, if the XML file had:
>
> <Node Name="A">
> <Node Name="B">
> <Node Name="C">
> <Node Name="D">
> <Node Name="E">
>
> When I'm parsing Node "E" I need to know I'm in A/B/C/D/E.
> Problem is I don't know how deep this can be. This is the code
> I have so far:
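The subject line asks about ElementTree, so here is a minimal sketch of the usual recursive approach with xml.etree.ElementTree. It carries the path of ancestor names down through the recursion, so nesting depth does not need to be known in advance. It assumes every Node carries a Name attribute (missing ones show up as "?"). The walk() helper and the SAMPLE string are illustrative names, not from the original thread.

```python
import xml.etree.ElementTree as ET

# The sample document from the question, with closing tags filled in.
SAMPLE = """<Node Name="A">
<Node Name="B">
<Node Name="C">
<Node Name="D">
<Node Name="E">
</Node></Node></Node></Node></Node>"""


def walk(elem, path=()):
    """Yield (name, full_path) for every Node element, depth-first.

    Recursion depth grows with nesting depth, so an extremely deep
    document could hit Python's default recursion limit.
    """
    # Hypothetical fallback: "?" marks a Node with no Name attribute.
    name = elem.get("Name", "?")
    full_path = path + (name,)
    yield name, "/".join(full_path)
    # Only direct Node children; each one extends the current path.
    for child in elem.findall("Node"):
        yield from walk(child, full_path)


root = ET.fromstring(SAMPLE)
for name, path in walk(root):
    print(name, path)
```

This builds the whole tree in memory, which is fine for small files. The SAX approach below avoids that.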
I'm also an ElementTree user, but it's fairly heavy-duty for simple
jobs. I use sax for those. In fact, I'm kind of a saxophone.
This is basically the same idea as others have posted.
the_xml = """<?xml version="1.0" encoding="ISO-8859-1"?>
<Node Name="A">
<Node Name="B">
<Node Name="C">
<Node Name="D">
<Node Name="E">
</Node></Node></Node></Node></Node>"""
import io
import sys
import xml.sax as sax

class NodeHandler(sax.handler.ContentHandler):
    def startDocument(self):
        # Stack of Name attributes for the currently open Node elements.
        self.names = []

    def startElement(self, name, attrs):
        # At this point self.names holds only this element's ancestors.
        self.process(attrs['Name'])
        self.names.append(attrs['Name'])

    def process(self, name):
        print("Node {} Nest {}".format(name, '/'.join(self.names)))
        # Do your stuff.

    def endElement(self, name):
        # Leaving an element: drop its name from the stack.
        self.names.pop()

print(sys.version_info)
handler = NodeHandler()
sax.parse(io.StringIO(the_xml), handler)  # sax.parse returns None
Output:
sys.version_info(major=3, minor=3, micro=2, releaselevel='final', serial=0)
Node A Nest
Node B Nest A
Node C Nest A/B
Node D Nest A/B/C
Node E Nest A/B/C/D
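If you'd rather stay with ElementTree but still stream the file the way the SAX handler does, ElementTree.iterparse with "start" and "end" events supports the same name-stack technique. Here is a minimal sketch, again assuming every Node has a Name attribute. node_paths() is a hypothetical helper name. Calling clear() on each finished element frees most of the memory held by its subtree.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<Node Name="A">
<Node Name="B">
<Node Name="C">
<Node Name="D">
<Node Name="E">
</Node></Node></Node></Node></Node>"""


def node_paths(source):
    """Yield (name, parent_path) pairs while streaming, like NodeHandler."""
    names = []  # stack of Name attributes of the open ancestor Nodes
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if elem.tag != "Node":
            continue
        if event == "start":
            # Attributes are already available on the "start" event.
            name = elem.get("Name", "?")
            yield name, "/".join(names)
            names.append(name)
        else:
            names.pop()
            elem.clear()  # release the finished subtree's contents


for name, nest in node_paths(io.BytesIO(SAMPLE)):
    print("Node {} Nest {}".format(name, nest))
```

This should print the same five lines as the SAX version above.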
--
Neil Cerutti