parsing nested unbounded XML fields with ElementTree

Neil Cerutti mr.cerutti at gmail.com
Tue Nov 26 10:27:01 EST 2013


On Mon, Nov 25, 2013 at 5:22 PM, Larry.Martell at gmail.com
<Larry.Martell at gmail.com> wrote:
> I have an XML file that has an element called "Node". These can
> be nested to any depth and the depth of the nesting is not
> known to me. I need to parse the file and preserve the nesting.
> For example, if the XML file had:
>
> <Node Name="A">
>    <Node Name="B">
>       <Node Name="C">
>         <Node Name="D">
>           <Node Name="E">
>
> When I'm parsing Node "E" I need to know I'm in A/B/C/D/E.
> Problem is I don't know how deep this can be. This is the code
> I have so far:
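
Because the question asks about ElementTree, here is a minimal recursive sketch of that approach. It is an editorial illustration and not code from the thread. It assumes the whole document is loaded into memory with `ET.fromstring` and that every Node carries a Name attribute. `SAMPLE` (the quoted fragment with its closing tags filled in) and `walk` are hypothetical names. The tuple of ancestor names is passed down the recursion, so the nesting depth does not have to be known in advance; only Python's recursion limit (1000 by default) bounds it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the quoted fragment with its closing tags filled in.
SAMPLE = """<Node Name="A">
  <Node Name="B">
    <Node Name="C">
      <Node Name="D">
        <Node Name="E"/>
      </Node>
    </Node>
  </Node>
</Node>"""


def walk(elem, path=()):
    """Yield (name, full_path) for every Node element, depth-first.

    Hypothetical helper: the tuple of ancestor names is passed down the
    recursion, so the nesting depth does not have to be known in advance.
    """
    if elem.tag == "Node":
        name = elem.get("Name", "")  # assumes each Node has a Name attribute
        path = path + (name,)
        yield name, "/".join(path)
    for child in elem:  # iterating an Element yields its direct children
        yield from walk(child, path)


root = ET.fromstring(SAMPLE)
for name, full_path in walk(root):
    print(name, full_path)
```

On `SAMPLE`, the last line printed is `E A/B/C/D/E`, which is the path the question asks for.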

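I'm also an ElementTree user, but it's fairly heavy-duty for simple
jobs. I use SAX for those. In fact, I'm kind of a saxophone.
This is basically the same idea as others have posted: the handler
keeps a stack of the names of the Node elements that are currently
open, pushing in startElement and popping in endElement.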

the_xml = """<?xml version="1.0" encoding="ISO-8859-1"?>
<Node Name="A">
   <Node Name="B">
      <Node Name="C">
        <Node Name="D">
          <Node Name="E">
          </Node></Node></Node></Node></Node>"""
import io
import sys
import xml.sax as sax


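class NodeHandler(sax.handler.ContentHandler):
    """Track the Name of every open Node element on a stack."""

    def startDocument(self):
        self.names = []  # Names of the Node elements currently open

    def startElement(self, name, attrs):
        if name != 'Node':  # ignore any other element types
            return
        self.process(attrs['Name'])
        self.names.append(attrs['Name'])

    def process(self, name):
        # At this point self.names holds the ancestors of the current Node.
        print("Node {} Nest {}".format(name, '/'.join(self.names)))
        # Do your stuff.

    def endElement(self, name):
        if name == 'Node':
            self.names.pop()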


print(sys.version_info)
handler = NodeHandler()
sax.parse(io.StringIO(the_xml), handler)  # returns None; results come via the handler

Output:
sys.version_info(major=3, minor=3, micro=2, releaselevel='final', serial=0)
Node A Nest
Node B Nest A
Node C Nest A/B
Node D Nest A/B/C
Node E Nest A/B/C/D
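
For comparison, the same stack idea also works with ElementTree's `iterparse`, which streams "start" and "end" events much like SAX while still building Element objects. This is a minimal sketch and not code from the thread. `node_paths` and `SAMPLE` are hypothetical names, and it assumes each Node has a Name attribute.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample input, equivalent to the_xml above but as bytes.
SAMPLE = b"""<Node Name="A">
  <Node Name="B">
    <Node Name="C">
      <Node Name="D">
        <Node Name="E">
        </Node></Node></Node></Node></Node>"""


def node_paths(source):
    """Yield (name, parent_path) for each Node as its start tag is read.

    Hypothetical helper mirroring NodeHandler: a list used as a stack
    holds the Name of every open Node; push on "start", pop on "end".
    """
    names = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if elem.tag != "Node":
            continue
        if event == "start":
            name = elem.get("Name", "")  # attributes are available at "start"
            yield name, "/".join(names)
            names.append(name)
        else:
            names.pop()
            elem.clear()  # drop the finished element's children and attributes


for name, parent in node_paths(io.BytesIO(SAMPLE)):
    print("Node {} Nest {}".format(name, parent))
```

This prints the same five Node lines as the SAX version. Clearing each element at its "end" event limits how much of the tree stays in memory. Cleared elements still stay attached to their parent until the parent's own "end" event.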

-- 
Neil Cerutti