Question about XML Parser in Python.

Fredrik Lundh fredrik at pythonware.com
Wed May 11 12:36:42 EDT 2005


Amitpython5 at aol.com wrote:

> Well, I'm fairly new to Python and have encountered a strange error while
> reading an XML document in Python. I used the SAX parser, and my input XML is
> fairly large with 300000 records. I extract about 25 fields from each record
> and spit out a csv file. The strange thing is that after about 2000 records,
> some value (one of the 25) is missing in the csv file, so it just appears as
> ',,', as if the value was missing from the Input file. I checked the Input
> file and all values are intact.

you're aware that you can get multiple calls to the character data handlers
for each character data section?  (in other words, you cannot just use the
text you get in the first call; you have to collect text sections until you see
an end tag).
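
a minimal sketch of the collecting approach, using the standard xml.sax
module.  the handler class, the parse_records helper, and the assumption that
each field is a direct child element of a "record" element are illustrative,
not taken from the original program:

```python
import xml.sax


class RecordHandler(xml.sax.ContentHandler):
    """Collects the full text of every field inside each <record> element."""

    def __init__(self):
        super().__init__()
        self.records = []      # one dict per completed record
        self._record = None    # record being built, or None outside a record
        self._chunks = []      # text pieces seen since the last tag

    def startElement(self, name, attrs):
        if name == "record":
            self._record = {}
        self._chunks = []      # start collecting afresh for this element

    def characters(self, content):
        # may be called several times for one text section, so append
        # instead of overwriting
        self._chunks.append(content)

    def endElement(self, name):
        if name == "record":
            self.records.append(self._record)
            self._record = None
        elif self._record is not None:
            # the text is only known to be complete at the end tag
            self._record[name] = "".join(self._chunks)
        self._chunks = []


def parse_records(xml_bytes):
    """Parse a bytes document and return the list of record dicts."""
    handler = RecordHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.records
```

a handler that stores only the text from the first characters() call would
silently drop the rest of the value whenever the parser happens to split a
text section, for example at an internal buffer boundary.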

alternatively, you can forget about SAX and use a better tool.  tools that
let you iterate over subtrees are a lot easier to use, and can be faster too.

here's the elementtree solution:

    from elementtree.ElementTree import iterparse

    # source is a filename or an open file object
    for event, elem in iterparse(source):
        if elem.tag == "record":
            # ... process record elements ...
            # free the record's subtree so memory use stays flat
            elem.clear()

    http://effbot.org/zone/element-iterparse.htm
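
to make the "process record elements" step concrete, here is a rough sketch
that writes one csv row per record.  it assumes the standard-library
xml.etree.ElementTree module (which offers the same iterparse interface) and
that each field is a direct child of the record element; records_to_csv and
the field names passed to it are hypothetical:

```python
import csv
from xml.etree.ElementTree import iterparse


def records_to_csv(source, out, fields):
    """Write one CSV row per <record> element found in source."""
    writer = csv.writer(out)
    # by default iterparse reports "end" events only, so each element
    # arrives with its complete subtree and complete text
    for event, elem in iterparse(source):
        if elem.tag == "record":
            # a missing field becomes an empty column
            writer.writerow([elem.findtext(f, default="") for f in fields])
            elem.clear()  # release the record's children
```

since a record is only handled once its end tag has been seen, no text
collecting is needed, and an empty column in the output means the field
really was empty or absent in the input.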

(if speed is important, the cElementTree implementation of iterparse is ~5x
faster than Python's standard SAX library)

</F>





