10GB XML Blows out Memory, Suggestions?
John J. Lee
jjlee at reportlab.com
Tue Jun 6 16:11:42 EDT 2006
"K.S.Sreeram" <sreeram at tachyontech.net> writes:
[...]
> There's just NO WAY that the 10gb xml file can be loaded into memory as
> a tree on any normal machine, irrespective of whether we use C or
> Python.
Yes.
> So the *only* way is to perform some kind of 'stream' processing
> on the file. Perhaps using a SAX like API. So (c)ElementTree is ruled
> out for this.
No, that's not true. I guess you didn't read the other posts:
http://effbot.org/zone/element-iterparse.htm
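
For anyone who hasn't followed the link, here is a minimal sketch of the
iterparse idea: handle each element as soon as it has been parsed, then
throw it away so the tree never grows. It assumes the file is one root
element holding many repeated child elements; the function name
count_records and the tag "record" are made up for illustration, and it
uses xml.etree.ElementTree from the standard library rather than the
separate cElementTree package.

```python
import io
import xml.etree.ElementTree as ET


def count_records(source, tag="record"):
    """Count <tag> elements in `source` without keeping the whole tree."""
    count = 0
    # Request "start" events as well, so the first event hands us the root.
    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            count += 1  # real per-record processing would go here
            # Discard everything parsed so far; memory use stays roughly flat.
            root.clear()
    return count


# Tiny in-memory stand-in for the real 10GB file (hypothetical data).
sample = io.BytesIO(b"<root><record>a</record><record>b</record><other/></root>")
print(count_records(sample))
```

The same function accepts a filename or an open file object, so pointing
it at the real file should only need the path changed.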
> Diez B. Roggisch wrote:
> > No what exactly makes C grok a 10Gb file where python will fail to do so?
>
> In most typical cases where there's any kind of significant python code,
> it's possible to achieve a *minimum* of a 10x speedup by using C. In most
[...]
I don't know where you got that from. In this particular case, of
course, cElementTree *is* already written in C. There's presumably
plenty of "significant python code" around too since, one assumes,
*all* of the OP's code is written in Python (does that count as "any
kind" of Python code?). And yet rewriting something in C here may not
make much difference.
John