Python and XML help

Tom Bryan tbryan at python.net
Sat Jul 27 11:05:26 EDT 2002


Mathieu wrote:

> I'm new to Python and having some problems parsing the following XML :

Suggestion:
You'll get more responses if the code you post actually runs, or fails only
with the errors you're asking about.  If you're hitting an exception,
include the full Python traceback in your post.  Often we can then answer
your question without even running your code.
 
> <properties name="spam">
> <prop>
> <first>
>  <item name="foo">

This document isn't even well-formed XML.  Each item element has to be
closed, either as
<item name="foo"></item> or as the empty-element form
<item name="foo" />

>  <item name="foo2">
> </first>
> 
> <second>
>  <item name="bar">
>  <item name="bar2">
> </second>
> </prop>

You also need a closing properties tag:
</properties>
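When you're unsure whether a file is well-formed, letting a real XML parser check it is quicker than eyeballing it. Here is a minimal sketch for a current Python 3 (so it won't meet your 1.5 requirement) using the standard library's xml.dom.minidom. The helper name is_well_formed is just something made up for illustration, and FIXED is your document with every element closed as described above.

```python
# Well-formedness check with the standard library (Python 3 sketch).
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# The document as originally posted: unclosed <item> tags, no </properties>.
BROKEN = """<properties name="spam">
<prop>
<first>
 <item name="foo">
 <item name="foo2">
</first>

<second>
 <item name="bar">
 <item name="bar2">
</second>
</prop>"""

# The same document with every element closed.
FIXED = """<properties name="spam">
<prop>
<first>
 <item name="foo"/>
 <item name="foo2"/>
</first>
<second>
 <item name="bar"/>
 <item name="bar2"/>
</second>
</prop>
</properties>"""


def is_well_formed(text):
    """Return True if text parses as XML, False if Expat rejects it."""
    try:
        minidom.parseString(text)
    except ExpatError:
        return False
    return True


print(is_well_formed(BROKEN))  # False: </first> does not match the open <item>
print(is_well_formed(FIXED))   # True
```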

> if __name__ == "__main__":
>     p=myparser()

I'm assuming you left out something like this before the feed() call:
data = open("properties.xml").read()

>     p.feed(data)
>     p.close()
 
> I would like the script to write only the "name" variables that are
> within the "first" brackets and ignore the others. Please note that my
> code must stay compatible with Python 1.5 and that this example might
> not work or contain errors, because it is a simplified version of the
> code I have written.

I'm more familiar with SAX parsers in Java, but the XMLParser class in
Python's xmllib module works in a similar event-driven way.  Generally, I
would keep a stack of the names of the elements that are currently open:
push a name in each start-tag handler and pop it in the matching end-tag
handler.  Then you can peek at the stack to see where you are in the
document.  XMLParser appears to maintain such a stack itself (self.stack),
but I'm not sure subclasses are really supposed to use it, so it may not
be guaranteed to exist in future versions.  Check what the comments on
XMLParser in xmllib.py in your Python distribution say about it.
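To make the stack idea concrete, here is a minimal sketch using the standard library's xml.sax module in a current Python 3 rather than xmllib (xmllib no longer exists there, and this won't run on 1.5). The class name FirstItemNames and its names list are my own inventions for illustration. It pushes each element name in startElement and pops it in endElement, so when an element starts, the top of the stack is its parent.

```python
import xml.sax


class FirstItemNames(xml.sax.ContentHandler):
    """Collect the "name" attribute of <item> elements directly inside <first>."""

    def __init__(self):
        super().__init__()
        self.stack = []   # names of the elements that are currently open
        self.names = []   # collected "name" attribute values

    def startElement(self, name, attrs):
        # Look at the parent before pushing the element being started.
        if name == "item" and self.stack and self.stack[-1] == "first":
            self.names.append(attrs.get("name"))
        self.stack.append(name)

    def endElement(self, name):
        # Matching end tag: this element is no longer open.
        self.stack.pop()


DOC = b"""<properties name="spam">
<prop>
<first>
 <item name="foo"/>
 <item name="foo2"/>
</first>
<second>
 <item name="bar"/>
 <item name="bar2"/>
</second>
</prop>
</properties>"""

handler = FirstItemNames()
xml.sax.parseString(DOC, handler)
print(handler.names)  # ['foo', 'foo2']
```

Testing only the top of the stack matches items whose immediate parent is first; to accept items nested anywhere inside first, test "first" in self.stack instead.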

Anyway, using the stack inherited from XMLParser, here's how I would do
what you're describing:

    import xmllib

    class myparser(xmllib.XMLParser):
        def start_item(self, attrs):
            # Print the "name" attribute of "item" elements only when the
            # enclosing element is "first".  The length check avoids an
            # IndexError if an "item" has no parent element.
            s = self.stack
            if len(s) >= 2 and s[-1][0] == 'item' and s[-2][0] == 'first':
                print attrs['name']
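If you can ever drop the Python 1.5 requirement, the standard library's xml.etree.ElementTree (added in Python 2.5) can do the same job with a path expression instead of a hand-maintained stack. A minimal Python 3 sketch, assuming the well-formed version of your document:

```python
import xml.etree.ElementTree as ET

DOC = """<properties name="spam">
<prop>
<first>
 <item name="foo"/>
 <item name="foo2"/>
</first>
<second>
 <item name="bar"/>
 <item name="bar2"/>
</second>
</prop>
</properties>"""

root = ET.fromstring(DOC)
# ".//first/item" selects every <item> that is a direct child of a
# <first> element anywhere below the root.
names = [item.get("name") for item in root.findall(".//first/item")]
print(names)  # ['foo', 'foo2']
```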

---Tom



