10GB XML Blows out Memory, Suggestions?

John J. Lee jjlee at reportlab.com
Tue Jun 6 16:11:42 EDT 2006


"K.S.Sreeram" <sreeram at tachyontech.net> writes:
[...]
> There's just NO WAY that the 10gb xml file can be loaded into memory as
> a tree on any normal machine, irrespective of whether we use C or
> Python.

Yes.

> So the *only* way is to perform some kind of 'stream' processing
> on the file. Perhaps using a SAX like API. So (c)ElementTree is ruled
> out for this.

No, that's not true.  I guess you didn't read the other posts:

http://effbot.org/zone/element-iterparse.htm
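For anyone who hasn't followed the link, the pattern it describes (incremental parsing with iterparse, clearing each element once you've dealt with it) can be sketched roughly as follows.  This is a minimal sketch, not the code from that page: it assumes the repeated records are direct children of the document root, and the function name count_records and the tag names are made up for illustration.  Note that in Python 3, xml.etree.ElementTree uses the C accelerator (what used to be cElementTree) automatically.

```python
import io
import xml.etree.ElementTree as ET


def count_records(source, tag):
    """Stream-parse `source` (a filename or binary file object), counting
    elements named `tag`.

    Hypothetical helper.  Assumes the `tag` elements are direct children of
    the root, so clearing the root after each one keeps memory bounded no
    matter how large the file is.
    """
    count = 0
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the "start" of the root element; keep a handle on it.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            count += 1
            # ... per-record processing would go here ...
            # Drop the finished record (and any earlier siblings) from the root
            # so the partially built tree never grows.
            root.clear()
    return count


data = b"<records>" + b"<record id='1'/>" * 3 + b"</records>"
print(count_records(io.BytesIO(data), "record"))  # 3
```

The same loop works unchanged on a 10GB file on disk; only the current record (plus the root) is held in memory at any one time.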


> Diez B. Roggisch wrote:
> > No, what exactly makes C grok a 10Gb file where python will fail to do so?
> 
> In most typical cases where there's any kind of significant python code,
> it's possible to achieve a *minimum* of a 10x speedup by using C. In most
[...]

I don't know where you got that from.  And in this particular case, of
course, cElementTree *is* written in C, and there's presumably plenty of
"significant python code" around, since one assumes *all* of the OP's
code is written in Python (does that count as "any kind" of Python
code?).  And yet rewriting something in C here may not make much
difference.


John



More information about the Python-list mailing list