[Tutor] parsing XML into a python dictionary
Christopher Spears
cspears2002 at yahoo.com
Sat Nov 14 07:14:09 CET 2009
I've been working on a way to parse an XML document and convert it into a Python dictionary. I want to preserve the hierarchy of the XML. Here is the sample XML I have been working with:
<collection>
    <comic title="Sandman" number='62'>
        <writer>Neil Gaiman</writer>
        <penciller pages='1-9,18-24'>Glyn Dillon</penciller>
        <penciller pages="10-17">Charles Vess</penciller>
    </comic>
</collection>
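To make the hierarchy that needs preserving explicit, here is a small sketch that loads the sample above from a string. It uses the standard library's xml.etree.ElementTree, swapped in for lxml so it runs without extra installs; lxml elements offer the same tag, attrib, text, find(), findall() and iter() members used here. SAMPLE is just an illustrative name.

```python
import xml.etree.ElementTree as ET

# The sample document, inlined so the snippet runs without test.xml
SAMPLE = """<collection>
    <comic title="Sandman" number='62'>
        <writer>Neil Gaiman</writer>
        <penciller pages='1-9,18-24'>Glyn Dillon</penciller>
        <penciller pages="10-17">Charles Vess</penciller>
    </comic>
</collection>"""

root = ET.fromstring(SAMPLE)

# iter() visits every element in document order, at any depth
all_tags = [el.tag for el in root.iter()]
print(all_tags)  # ['collection', 'comic', 'writer', 'penciller', 'penciller']

# The nesting is explicit: collection -> comic -> writer / penciller
comic = root.find("comic")
print(sorted(comic.attrib.items()))  # [('number', '62'), ('title', 'Sandman')]
print(comic.find("writer").text)     # Neil Gaiman
print([p.get("pages") for p in comic.findall("penciller")])  # ['1-9,18-24', '10-17']
```

So the dictionary has to represent three levels, and the comic level contains two sibling elements with the same tag.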
This is my first stab at this:
#!/usr/bin/env python
from lxml import etree

def generateKey(element):
    # Use (tag, attributes) as the key when the element has attributes,
    # otherwise just the tag name
    if element.attrib:
        key = (element.tag, element.attrib)
    else:
        key = element.tag
    return key

class parseXML(object):
    def __init__(self, xmlFile = 'test.xml'):
        self.xmlFile = xmlFile

    def parse(self):
        doc = etree.parse(self.xmlFile)
        root = doc.getroot()
        key = generateKey(root)
        dictA = {}
        # Only the root's direct children are visited; deeper levels are not converted
        for r in root:
            keyR = generateKey(r)
            if r.text:
                # store the child's text...
                dictA[keyR] = r.text
            if len(r):
                # ...but if it has children, store the child Element objects
                # themselves (this overwrites the text stored above)
                dictA[keyR] = list(r)
        newDict = {}
        newDict[key] = dictA
        return newDict

if __name__ == "__main__":
    px = parseXML()
    newDict = px.parse()
    print(newDict)
This is the output:
163>./parseXML.py
{'collection': {('comic', {'number': '62', 'title': 'Sandman'}): [<Element writer at -482193f4>, <Element penciller at -482193cc>, <Element penciller at -482193a4>]}}
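The <Element ...> entries appear because the script stores the child Element objects themselves rather than their contents. A minimal illustration, again assuming ElementTree in place of lxml and an inlined copy of the sample (SAMPLE and one_level are illustrative names):

```python
import xml.etree.ElementTree as ET

SAMPLE = """<collection>
    <comic title="Sandman" number='62'>
        <writer>Neil Gaiman</writer>
        <penciller pages='1-9,18-24'>Glyn Dillon</penciller>
        <penciller pages="10-17">Charles Vess</penciller>
    </comic>
</collection>"""

comic = ET.fromstring(SAMPLE).find("comic")

# list(comic) gives the Element objects themselves, which is why the
# printed dictionary shows <Element ...> entries
children = list(comic)
print(children[0].tag)  # writer

# Pulling out tag, attributes and text by hand converts one more level
one_level = [(child.tag, dict(child.attrib), child.text) for child in comic]
print(one_level)
```

This only unwraps one extra level; anything nested inside those children would still be left as Element objects and would need the same treatment again.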
The script doesn't descend all the way down because I'm not sure how to handle an XML document that may have multiple levels of nesting. Any advice? Would this be a job for recursion?
Thanks!
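Recursion does fit this problem: converting an element means "convert my attributes and text, then convert each child the same way". Below is one possible sketch, not a standard format. It assumes its own convention (loosely similar to the one used by the third-party xmltodict package): attributes under '@'-prefixed keys, non-blank text under '#text', and repeated child tags such as the two penciller elements collected into a list. element_to_dict and xml_to_dict are hypothetical helper names, and ElementTree stands in for lxml.

```python
import xml.etree.ElementTree as ET

def element_to_dict(element):
    """Recursively convert an Element into nested dicts (one possible convention).

    - attributes are stored under keys prefixed with '@'
    - non-blank text is stored under '#text'
    - child elements are stored under their tag; repeated tags become a list
    """
    node = {"@" + name: value for name, value in element.attrib.items()}
    text = (element.text or "").strip()
    if text:
        node["#text"] = text
    for child in element:
        child_value = element_to_dict(child)  # same rule applied at every depth
        if child.tag in node:
            # second (or later) sibling with this tag: switch to a list
            existing = node[child.tag]
            if not isinstance(existing, list):
                node[child.tag] = [existing]
            node[child.tag].append(child_value)
        else:
            node[child.tag] = child_value
    return node

def xml_to_dict(xml_text):
    """Parse an XML string and wrap the converted root under its own tag."""
    root = ET.fromstring(xml_text)
    return {root.tag: element_to_dict(root)}

SAMPLE = """<collection>
    <comic title="Sandman" number='62'>
        <writer>Neil Gaiman</writer>
        <penciller pages='1-9,18-24'>Glyn Dillon</penciller>
        <penciller pages="10-17">Charles Vess</penciller>
    </comic>
</collection>"""

result = xml_to_dict(SAMPLE)
print(result)
```

Because the function calls itself for every child, it handles any depth without extra code. Tail text (text after a child's closing tag) is ignored here, and with lxml, comments in the document would also show up as children while iterating, so you may want to skip children whose tag is not a string.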