10GB XML Blows out Memory, Suggestions?

Fredrik Lundh fredrik at pythonware.com
Tue Jun 6 14:37:32 EDT 2006


K.S.Sreeram wrote:

> There's just NO WAY that the 10 GB XML file can be loaded into memory as
> a tree on any normal machine, irrespective of whether we use C or
> Python. So the *only* way is to perform some kind of 'stream' processing
> on the file. Perhaps using a SAX-like API. So (c)ElementTree is ruled
> out for this.

both ElementTree and cElementTree support "sax-style" event generation 
(through XMLTreeBuilder/XMLParser) and incremental parsing (through 
iterparse).  the cElementTree versions of these are even faster than 
pyexpat.
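
a minimal sketch of the "sax-style" route, assuming the xml.etree.ElementTree module from today's standard library (the successor to the standalone ElementTree/cElementTree packages) and its XMLParser(target=...) interface.  the TagCounter class and the count_tags helper are hypothetical names made up for this illustration:

```python
import xml.etree.ElementTree as ET

class TagCounter:
    """Parser target: receives sax-style callbacks, so no tree is built."""

    def __init__(self):
        self.counts = {}

    def start(self, tag, attrib):
        # called for every opening tag
        self.counts[tag] = self.counts.get(tag, 0) + 1

    def end(self, tag):
        # called for every closing tag; nothing to do here
        pass

    def data(self, text):
        # called for character data; ignored in this sketch
        pass

    def close(self):
        # whatever is returned here is what parser.close() returns
        return self.counts

def count_tags(chunks):
    """Feed the document to the parser piece by piece (hypothetical helper)."""
    parser = ET.XMLParser(target=TagCounter())
    for chunk in chunks:
        parser.feed(chunk)
    return parser.close()

# the chunks may split the document anywhere, even in the middle of a tag
counts = count_tags([b"<data><item/><it", b"em/><other/></data>"])
print(counts)
```

for a large file, the chunks would presumably come from repeated read() calls with a fixed block size, so only one block is held in memory at a time.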

the iterparse interface is described here:

     http://effbot.org/zone/element-iterparse.htm
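
a minimal sketch of incremental parsing with iterparse, again assuming the standard library's xml.etree.ElementTree.  the "record" tag name and the count_records helper are hypothetical; the idea is to grab the root element from the first event and clear it after each processed record, so finished elements don't pile up in memory:

```python
import io
import xml.etree.ElementTree as ET

def count_records(source, tag="record"):
    """Count <tag> elements in source without keeping the full tree."""
    count = 0
    context = ET.iterparse(source, events=("start", "end"))
    # the first event is the "start" of the root element
    event, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            # the element is complete here; do the real work at this point
            count += 1
            # drop the children seen so far so memory use stays flat
            root.clear()
    return count

# a tiny in-memory stand-in for the real file
sample = io.BytesIO(b"<data><record id='1'/><record id='2'/><record id='3'/></data>")
total = count_records(sample)
print(total)
```

source can also be a filename, which is what you'd pass for a file too big to read in one go.  the sketch assumes records are direct children of the root and are not nested inside each other.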

</F>



