xml.parsers.expat loading xml into a dict and whitespace

kaens apatheticagnostic at gmail.com
Wed May 23 01:53:29 EDT 2007


Hey everyone, this may be a stupid question, but I noticed the
following, and as I'm pretty new to using XML and Python, I was
hoping someone could explain it.

Let's say I write a simple XML parser that just loads the content of
each tag into a dict. The XML file is flat: apart from the parent node,
there are no nested hierarchies.

So we have:

<parent>
    <option1>foo</option1>
    <option2>bar</option2>
    ...
</parent>

(I'm using xml.parsers.expat.) The parser sets a flag saying it's
inside the parent, and the start-tag handler records the name of the
tag currently being processed. The character data handler then sets a
dictionary value like so:

dictName[curTag] = data

After I'm done processing the file, I print out the dict, and the first entry is
<a few bits of whitespace> : <a whole bunch of whitespace>

There are comments in the XML file; is this what's causing it?
There are also blank lines... but I don't see how a blank line would
be interpreted as a tag. Comments, though, I could see being treated that way.

Actually, I just ran a test on an XML file with no comments,
indentation, or blank lines and got the same behaviour.
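To check whether comments or blank lines are what end up in the dict, one can log every callback expat fires for a small input. This is a minimal sketch; `log_events` is a hypothetical helper, not part of the original code. It sets the parser's `buffer_text` attribute to `True` so that each run of text arrives in a single `CharacterDataHandler` call.

```python
import xml.parsers.expat


def log_events(xml_text):
    """Return (event, value) tuples for every callback expat fires (hypothetical helper)."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda data: events.append(("chars", data))
    parser.CommentHandler = lambda text: events.append(("comment", text))
    # Buffer text so one run of character data is reported in one call.
    parser.buffer_text = True
    parser.Parse(xml_text, True)
    return events


if __name__ == "__main__":
    sample = "<options>\n<!-- a comment -->\n<one>hey</one>\n</options>"
    for event in log_events(sample):
        print(event)
```

For this input, the comment text is reported only through `CommentHandler`, never through `CharacterDataHandler`. The newline after each tag, by contrast, is reported through `CharacterDataHandler` as a `"\n"` string, even when the file contains no blank lines.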

If I feed it the following XML file:

<options>
<one>hey</one>
<two>bee</two>
<three>eff</three>
</options>

It prints out:
" :

three :  eff
two :  bee
one :  hey"

wtf.

For reference, here are the handler functions:

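def handleCharacterData(self, data):
    # store the text of the tag currently being processed
    if self.inOptions and self.curTag != "options":
        self.options[self.curTag] = data

def handleStartElement(self, name, attributes):
    # entering <options> turns the flag on; remember the current tag name
    if name == "options":
        self.inOptions = True
    if self.inOptions:
        self.curTag = name

def handleEndElement(self, name):
    # leaving <options> turns the flag off; reset the current tag
    if name == "options":
        self.inOptions = False
    self.curTag = ""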

Sorry if the whitespace in the code got mangled (fingers crossed...)
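In the handlers above, handleEndElement resets curTag to "" after every closing tag. The newline that follows </one> is character data delivered while inOptions is still true and curTag is "" (which is not "options"), so it is stored under the empty-string key. The sketch below is one possible workaround, assuming each child of <options> holds plain text only. It collects text only while inside a child element, joins the chunks (expat may split one run of text across several CharacterDataHandler calls), and stores the stripped result when the element closes. The class name OptionsParser, the text buffer, and the parse method are hypothetical additions, not from the original post.

```python
import xml.parsers.expat


class OptionsParser:
    """Collect the text of each child of <options> into a dict (hypothetical name)."""

    def __init__(self):
        self.options = {}
        self.inOptions = False
        self.curTag = ""
        self.text = []  # text chunks seen since the current child element started

    def handleStartElement(self, name, attributes):
        if name == "options":
            self.inOptions = True
        elif self.inOptions:
            self.curTag = name
            self.text = []

    def handleCharacterData(self, data):
        # Only collect text inside a child element; the whitespace between
        # child elements arrives while curTag is "" and is skipped here.
        if self.inOptions and self.curTag:
            self.text.append(data)

    def handleEndElement(self, name):
        if name == "options":
            self.inOptions = False
        elif self.inOptions and name == self.curTag:
            # Join the chunks, since one text run may arrive in pieces.
            self.options[name] = "".join(self.text).strip()
        self.curTag = ""

    def parse(self, xml_text):
        parser = xml.parsers.expat.ParserCreate()
        parser.StartElementHandler = self.handleStartElement
        parser.EndElementHandler = self.handleEndElement
        parser.CharacterDataHandler = self.handleCharacterData
        parser.Parse(xml_text, True)
        return self.options


if __name__ == "__main__":
    doc = "<options>\n<one>hey</one>\n<two>bee</two>\n<three>eff</three>\n</options>"
    print(OptionsParser().parse(doc))  # {'one': 'hey', 'two': 'bee', 'three': 'eff'}
```

Accumulating and storing on the end tag, rather than assigning inside the character data handler, also avoids keeping only the last fragment when expat splits a value into several calls.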


