[Tutor] Read XML

Wed Dec 7 00:31:37 CET 2005

On Tue, 6 Dec 2005, Joseph Quigley wrote:

> How do I read xml? The python documentation doesn't help me. Or, how can
> I remove the (tags?) <stuff in these things>?

Hi Joseph,

The XML modules in the standard library will do the job.  For example,
here is some 'xml.dom.minidom' code that parses a piece of XML text and
pulls the text out from between the tags:

######
>>> import xml.dom.minidom
>>> xml.dom.minidom.parseString
<function parseString at 0x825ce9c>
>>>
>>> tree = xml.dom.minidom.parseString("<hello><world>test</world></hello>")
>>> tree
<xml.dom.minidom.Document instance at 0x825344c>
>>>
>>>
>>> tree.getElementsByTagName('world')
[<DOM Element: world at 0x82537ac>]
>>> worldNode = tree.getElementsByTagName('world')[0]
>>> worldNode
<DOM Element: world at 0x82537ac>
>>>
>>> worldNode.firstChild
<DOM Text node "test">
>>> worldNode.firstChild.data
u'test'
######
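
For the "remove the tags" part of the question, one possible approach is
to walk the parsed tree and keep only the text nodes.  This is just a
rough sketch: 'collect_text' is a helper name I made up for illustration,
not part of minidom, and the sample string is invented.

```python
import xml.dom.minidom


def collect_text(node):
    """Return the text of every text node below `node`, joined together.

    Hypothetical helper for illustration; it is not part of minidom.
    """
    pieces = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            # Plain text between tags: keep it.
            pieces.append(child.data)
        else:
            # An element (or other node): recurse into its children.
            pieces.append(collect_text(child))
    return "".join(pieces)


doc = xml.dom.minidom.parseString("<hello><world>test</world> again</hello>")
text = collect_text(doc)
print(text)
```

The recursion is what makes this work for nested tags; the one-level
'firstChild.data' lookup above only reaches text sitting directly inside
a single element.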

There's a larger example of minidom here:

    http://www.python.org/doc/lib/dom-example.html

But to tell the truth, I don't like minidom too much these days.  *grin*
The code above is a bit verbose, and the usage of the standard Python XML
parsers is a little less than obvious because much of the design was
borrowed from Java's SAX and DOM parsers.

You might want to take a look at some third-party modules like ElementTree
or Amara instead.

    http://effbot.org/zone/element-index.htm
    http://uche.ogbuji.net/uche.ogbuji.net/tech/4suite/amara/

I don't have too much experience with Amara, and I hope someone else can
give an example with it.  With ElementTree, the code to grab the 'test'
text looks like this:

######
>>> from elementtree import ElementTree
>>> tree = ElementTree.fromstring("<hello><world>test</world></hello>")
>>> tree.findtext('world')
'test'
######
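
If the third-party package isn't installed, the same API is also
available as 'xml.etree.ElementTree' in the standard library from Python
2.5 on.  Here is a small sketch under that assumption; it also uses
'itertext()', which needs a newer Python still (2.7 or 3.2 and later),
to drop the tags and keep all of the text.  The sample string is made up.

```python
import xml.etree.ElementTree as ET

# fromstring() returns the root element, here <hello>.
root = ET.fromstring("<hello><world>test</world> again</hello>")

# Text of the first <world> child, as in the session above.
world_text = root.findtext("world")

# itertext() yields every piece of text under the element in document
# order, so joining the pieces gives the content with the tags removed.
all_text = "".join(root.itertext())

print(world_text)
print(all_text)
```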

Good luck!