10GB XML Blows out Memory, Suggestions?

fuzzylollipop jarrod.roberson at gmail.com
Tue Jun 6 22:43:07 EDT 2006


K.S.Sreeram wrote:
> Diez B. Roggisch wrote:
> > What the OP needs is a different approach to XML-documents that won't
> > parse the whole file into one giant tree - but I'm pretty sure that
> > (c)ElementTree will do the job as well as expat. And I don't recall the
> > OP musing about performance woes, btw.
>
>
> There's just NO WAY that the 10GB XML file can be loaded into memory as
> a tree on any normal machine, irrespective of whether we use C or
> Python. So the *only* way is to perform some kind of 'stream' processing
> on the file, perhaps using a SAX-like API. So (c)ElementTree is ruled
> out for this.
>
> Diez B. Roggisch wrote:
> > Now what exactly makes C grok a 10GB file where Python will fail to do so?
>
> In most typical cases where there's any kind of significant Python code,
> it's possible to achieve a *minimum* of a 10x speedup by using C. In most
> cases, the speedup is not worth it and we just trade it for the
> increased flexibility/power of the Python language. But in this situation
> using a bit of tight C code could make the difference between the
> process taking just 15 minutes or taking a few hours!
>
> Of course I'm not asking him to write the entire application in C. It
> makes sense to write just the performance-critical sections in C,
> wrap them in Python, and write the rest of the application in Python.


You've got no idea what you're talking about; anyone knows that a job
like this is I/O bound. CPU is the least of his worries: most of the time
goes into waiting for 10GB to come off the disk. And for I/O-bound
applications Python is just as fast as any other language.
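Two points above can be checked in a few lines of standard-library Python. First, a tree API can still stream: `xml.etree.ElementTree.iterparse` yields each element as soon as its end tag is parsed, so finished elements can be thrown away before the next one is read. Second, whether the job is really I/O bound can be measured by comparing a plain read of the file against a full parse. The sketch below is illustrative only and assumes Python 3.8+, that the records of interest are direct children of the root element, and that `count_records`, `read_throughput`, and the tag name are hypothetical names chosen for this example.

```python
import sys
import time
import xml.etree.ElementTree as ET


def count_records(path, tag):
    """Stream through an XML file and count elements named `tag`.

    Assumes records are direct children of the root element. Each finished
    record is dropped from the root after it is handled, so memory stays
    roughly constant regardless of file size.
    """
    count = 0
    context = ET.iterparse(path, events=("start", "end"))
    _, root = next(context)  # the first event is the root element's start
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            count += 1
            # ... process elem here (attributes, text, children) ...
            root.clear()  # discard finished records still attached to root
    return count


def read_throughput(path, chunk_size=1 << 20):
    """Read the file without parsing it and return bytes per second."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Usage: python stream_count.py big.xml record
    xml_path, record_tag = sys.argv[1], sys.argv[2]
    print("raw read: %.1f MB/s" % (read_throughput(xml_path) / 1e6))
    t0 = time.perf_counter()
    n = count_records(xml_path, record_tag)
    print("parsed %d records in %.1f s" % (n, time.perf_counter() - t0))
```

If the parse takes about as long as the raw read, the job is I/O bound and rewriting it in C buys little; if it takes many times longer, the parser is the bottleneck. The operating system's page cache will make a second pass over the same file look faster, so time each pass on a cold cache or on a file larger than RAM.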




More information about the Python-list mailing list