[Tutor] parsing XML into a python dictionary

Christopher Spears cspears2002 at yahoo.com
Sat Nov 14 07:14:09 CET 2009


I've been working on a way to parse an XML document and convert it into a Python dictionary.  I want to maintain the hierarchy of the XML.  Here is the sample XML I have been working with:

<collection>
  <comic title="Sandman" number='62'>
    <writer>Neil Gaiman</writer>
    <penciller pages='1-9,18-24'>Glyn Dillon</penciller>
    <penciller pages="10-17">Charles Vess</penciller>
  </comic>
</collection>

This is my first stab at this:

#!/usr/bin/env python

from lxml import etree

def generateKey(element):
    # Use (tag, attributes) as the key when the element has attributes,
    # otherwise just the tag name.
    if element.attrib:
        key = (element.tag, element.attrib)
    else:
        key = element.tag
    return key

class parseXML(object):
    def __init__(self, xmlFile = 'test.xml'):
        self.xmlFile = xmlFile

    def parse(self):
        doc = etree.parse(self.xmlFile)
        root = doc.getroot()
        key = generateKey(root)
        dictA = {}
        # Only the direct children of the root are visited here.
        for r in root.getchildren():
            keyR = generateKey(r)
            if r.text:
                dictA[keyR] = r.text
            # Child elements, if any, overwrite the text stored above.
            if r.getchildren():
                dictA[keyR] = r.getchildren()

        newDict = {}
        newDict[key] = dictA
        return newDict

if __name__ == "__main__":
    px = parseXML()
    newDict = px.parse()
    print(newDict)
	
This is the output:
163>./parseXML.py
{'collection': {('comic', {'number': '62', 'title': 'Sandman'}): [<Element writer at -482193f4>, <Element penciller at -482193cc>, <Element penciller at -482193a4>]}}

The script doesn't descend all of the way down because I'm not sure how to handle an XML document that may have multiple layers.  Any advice?  Would this be a job for recursion?
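
Recursion does look like a natural fit, since every element can be handled the same way as the root. Below is a rough sketch of that idea, not a finished solution. It assumes the standard library's xml.etree.ElementTree in place of lxml so that it runs without extra packages (the calls used here, fromstring, .tag, .attrib, .text and iterating over an element's children, are also available in lxml.etree). The names element_to_dict and SAMPLE, and the "tag"/"attrib"/"text"/"children" layout, are made up for illustration. Children are kept in a list so that the two penciller elements do not overwrite each other.

```python
import pprint
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)

# The sample document from above, inlined so the sketch is self-contained.
SAMPLE = """<collection>
  <comic title="Sandman" number='62'>
    <writer>Neil Gaiman</writer>
    <penciller pages='1-9,18-24'>Glyn Dillon</penciller>
    <penciller pages="10-17">Charles Vess</penciller>
  </comic>
</collection>"""


def element_to_dict(element):
    """Hypothetical helper: turn an element and all its descendants
    into nested dictionaries, one dictionary per element."""
    return {
        "tag": element.tag,
        "attrib": dict(element.attrib),
        # Whitespace-only text (indentation between tags) becomes "".
        "text": (element.text or "").strip(),
        # The recursive step: each child is converted the same way.
        "children": [element_to_dict(child) for child in element],
    }


if __name__ == "__main__":
    pprint.pprint(element_to_dict(ET.fromstring(SAMPLE)))
```

The recursion stops by itself at leaf elements such as writer, because iterating over an element with no children yields an empty list.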

Thanks!

