10GB XML Blows out Memory, Suggestions?

Fredrik Lundh fredrik at pythonware.com
Thu Jun 8 09:50:49 EDT 2006


fuzzylollipop wrote:

> SAX style or a pull-parser has to be used when the data is "large" or
> when you don't really need to process every element and attribute.
> 
> This problem looks like it is just a data export / import problem. In
> that case you will either have to use a SAX-style parser and parse the
> 10GB file or, as I suggested in another reply, export the data in
> smaller chunks.

or use a parser that can do the chunking for you, on the way in...

in Python, incremental parsers like cET's iterparse and the one in Amara
give you *better* performance than SAX (including "raw" pyexpat) in
many cases, and offer a much simpler programming model.
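
a minimal sketch of the iterparse approach, assuming the standard library's
xml.etree.ElementTree module (which provides the same iterparse interface as
cET) and a hypothetical layout where each item is a <record> element directly
under the root. the helper name iter_records, the tag name, and the tiny
in-memory sample are illustrations only, not part of the original problem:

```python
import io
import xml.etree.ElementTree as ET  # assumption: stdlib module with the cET-style iterparse


def iter_records(source, tag="record"):
    """Yield each completed <tag> element, then release it.

    `source` is a filename or a binary file object; `tag` is a
    hypothetical element name chosen for this example.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            # Drop the children parsed so far, so memory use does not
            # grow with the size of the file.
            root.clear()


# A tiny stand-in for the real 10GB export.
sample = io.BytesIO(
    b"<export><record id='1'>a</record><record id='2'>b</record></export>"
)
ids = [rec.get("id") for rec in iter_records(sample)]
```

each record has to be processed inside the loop, before the generator
resumes, because root.clear() discards it afterwards. for a real export you
would pass the filename instead of the BytesIO object.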

</F>



