xml processing speed test

Fredrik Lundh fredrik at pythonware.com
Wed Jun 7 14:30:42 EDT 2006


K.S.Sreeram wrote:

> From what i understand, the iterparse interface constructs the xml tree,
> but gives you hooks into the tree construction process itself, so that
> the programmer can control how much state he wants to retain and how
> much state he can discard.
> 
> I wanted the test program to maintain as little state as possible, so
> i'm discarding all state at the earliest.

which means that your program is doing a lot more work than it has to 
do: instead of using the data structure iterparse is providing, you're 
building your own parallel data structure.

> So can you tell me how i can use iterparse more efficiently?

by using it to split your document into reasonably-sized chunks (one 
record, one expression, one text block, one paragraph, etc), and using 
Python code to process the chunks.
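
a minimal sketch of that chunking approach, assuming the modern standard 
library module xml.etree.ElementTree; the sample document, the "record" 
tag, and the iter_records helper are all made up for illustration:

```python
import io
import xml.etree.ElementTree as ET

# hypothetical sample document; the "record" tag name is an assumption
SAMPLE = (
    b"<root>"
    b"<record id='1'><name>spam</name></record>"
    b"<record id='2'><name>egg</name></record>"
    b"</root>"
)

def iter_records(source, tag="record"):
    """Yield one fully built element per chunk, then discard it."""
    # ask for "start" events as well, so the first event hands us the root
    context = ET.iterparse(source, events=("start", "end"))
    event, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            # the chunk is complete here: let the caller work on the subtree
            yield elem
            # runs when the caller asks for the next chunk: drop everything
            # collected under the root so far, so memory use stays flat
            root.clear()

# process each chunk with ordinary ElementTree calls
names = [rec.findtext("name") for rec in iter_records(io.BytesIO(SAMPLE))]
print(names)
```

the point is that each record arrives as a ready-made subtree, so the 
per-chunk code can use find/findtext/get instead of tracking state by hand.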

if you're not interested in iterparse's tree-building functionality, use 
the bare parser interface instead (XMLParser).
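
a rough sketch of the bare parser route, again assuming 
xml.etree.ElementTree from the standard library; the TagCounter target 
class and the input string are hypothetical, and only show the shape of 
the interface (no tree is built, the parser just calls your methods):

```python
import xml.etree.ElementTree as ET

class TagCounter:
    """Hypothetical parser target: counts start tags, keeps no tree."""

    def __init__(self):
        self.counts = {}

    def start(self, tag, attrib):
        # called for every opening tag
        self.counts[tag] = self.counts.get(tag, 0) + 1

    def end(self, tag):
        # called for every closing tag; nothing to do here
        pass

    def data(self, data):
        # called for character data; ignored in this sketch
        pass

    def close(self):
        # whatever this returns is what parser.close() returns
        return self.counts

parser = ET.XMLParser(target=TagCounter())
parser.feed(b"<root><record/><record/><other/></root>")
counts = parser.close()
print(counts)
```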

</F>



