[Tutor] XML parsing

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Tue, 6 Feb 2001 22:04:30 -0800 (PST)


On Wed, 7 Feb 2001, Suzanne Little wrote:

> Which module should I be using to do this? Are there any examples of
> this sort of scanning-of-xml-documents-for-information available for
> me to look at?

As a side project, I'm beginning to study the expat parser; it's pretty
neat.  Here's a small example that uses Expat:

###
from xml.parsers import expat

class MyXMLParser2:
    def __init__(self):
        self.parser = expat.ParserCreate()
        self.parser.StartElementHandler = self.StartElementHandler
        self.parser.EndElementHandler = self.EndElementHandler
        self.parser.CharacterDataHandler = self.CharacterDataHandler

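    def feed(self, data):
        # 'data' avoids shadowing the built-in str.
        self.parser.Parse(data)

    def StartElementHandler(self, name, attributes):
        # attributes arrives as a dictionary of attribute names to values.
        print("Starting: ", name, attributes)

    def EndElementHandler(self, name):
        print("Ending: ", name)

    def CharacterDataHandler(self, data):
        # May be called more than once for a single run of text.
        print("Character data:", data)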

def test():
    p = MyXMLParser2()
    p.feed("""
<iq id='A0' type='get'><query
xmlns='jabber:iq:auth'><paragraph><username>bbaggins<boldface>Bilbo</boldface>
Baggins</username></paragraph></query></iq>
    """)

if __name__ == '__main__':
    test()
###

The idea is that as the parser reads the document, it will "call back" the
functions we registered whenever it sees something that interests us.  For
example, as soon as the parser sees:

    <iq id='A0' type='get'>

it recognizes the start tag of a new element, so that's when the
StartElementHandler callback executes, receiving the name 'iq' and the
attributes as a dictionary.  Similar things happen when it sees an end tag
or character data.  Try playing around with the program above, and it
should make things clearer.
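
To get from printing events to actually pulling information out of a
document, the callbacks can keep a little state between calls.  Here is a
rough sketch of that idea, assuming Python 3 and assuming we only care about
the text inside one kind of element.  The names TextCollector and find_text
are made up for this example; they are not part of Expat.

```python
from xml.parsers import expat


class TextCollector:
    """Gather the character data found inside every `target` element.

    Hypothetical helper for illustration; not part of the expat module.
    """

    def __init__(self, target):
        self.target = target
        self.depth = 0       # number of open `target` elements we are inside
        self.pieces = []     # text fragments of the element being read
        self.results = []    # one finished string per `target` element
        self.parser = expat.ParserCreate()
        self.parser.StartElementHandler = self.start
        self.parser.EndElementHandler = self.end
        self.parser.CharacterDataHandler = self.chars

    def start(self, name, attributes):
        if name == self.target:
            self.depth += 1

    def end(self, name):
        if name == self.target:
            self.depth -= 1
            if self.depth == 0:
                # The outermost target element just closed: store its text.
                self.results.append("".join(self.pieces))
                self.pieces = []

    def chars(self, data):
        # Expat may deliver one run of text in several calls, so we
        # accumulate fragments instead of assuming a single call.
        if self.depth > 0:
            self.pieces.append(data)

    def feed(self, text):
        # The second argument tells Expat this is the final chunk of input.
        self.parser.Parse(text, True)
        return self.results


def find_text(xml_text, target):
    """Return the text of every `target` element in xml_text."""
    return TextCollector(target).feed(xml_text)


if __name__ == '__main__':
    doc = "<iq id='A0'><query><username>bbaggins</username></query></iq>"
    print(find_text(doc, "username"))    # prints ['bbaggins']
```

Text inside nested tags is included too, so a <boldface> element inside
<username> contributes its text to the result.  If you would rather skip
nested text, one option is to track the full stack of open element names
instead of a single counter.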


There's some documentation about Expat here:

    http://python.org/doc/current/lib/module-xml.parsers.expat.html

but it is, admittedly, a little terse.  If I find anything more
accessible, I'll post to the list again.  Good luck!