10GB XML Blows out Memory, Suggestions?

K.S.Sreeram sreeram at tachyontech.net
Tue Jun 6 15:05:50 EDT 2006


Fredrik Lundh wrote:
> both ElementTree and cElementTree support "sax-style" event generation 
> (through XMLTreeBuilder/XMLParser) and incremental parsing (through 
> iterparse).  the cElementTree versions of these are even faster than 
> pyexpat.
> 
> the iterparse interface is described here:
> 
>      http://effbot.org/zone/element-iterparse.htm
> 
That's cool! Thanks for the info!
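A minimal sketch of the iterparse approach from the linked page follows. It assumes a hypothetical file made of many repeated <record> elements under a single root, and it uses today's xml.etree.ElementTree, which picks up the C accelerator automatically (xml.etree.cElementTree was deprecated in Python 3.3 and removed in 3.9). The key step for memory is clearing the root after each finished record, so the tree never grows beyond one record at a time.

```python
import io
import xml.etree.ElementTree as ET


def count_records(source, tag="record"):
    """Stream through `source` (a filename or binary file object) and count
    <tag> elements without building the whole tree in memory.
    `tag="record"` is a hypothetical element name used for illustration."""
    count = 0
    # Request both "start" and "end" events so we can grab the root element.
    context = ET.iterparse(source, events=("start", "end"))
    # The very first event is the "start" of the document's root element.
    _, root = next(context)
    for event, elem in context:
        # An element (with all its children) is complete at its "end" event.
        if event == "end" and elem.tag == tag:
            count += 1
            # ... per-record processing would go here, e.g. elem.get("id") ...
            # Drop every finished child of the root so memory stays bounded.
            root.clear()
    return count


if __name__ == "__main__":
    data = b"<root>" + b"".join(
        b"<record id='%d'><v>x</v></record>" % i for i in range(1000)
    ) + b"</root>"
    print(count_records(io.BytesIO(data)))
```

For the 10GB case you would pass the filename directly instead of a BytesIO buffer. Only the record currently being processed is held in memory.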

For a multi-gigabyte file, I would still recommend C/C++, because the
processing code that sits on top of the XML library would still be written
in Python, and that could add significant overhead in such extreme cases.

Of course, the exact strategy to follow would depend on the specifics of
the case, and all this speculation may not really apply! :)

Regards
Sreeram



More information about the Python-list mailing list