10GB XML Blows out Memory, Suggestions?

gregarican greg.kujawa at gmail.com
Wed Jun 7 11:00:37 EDT 2006


Am I missing something? I don't see where the poster mentioned the
operation being CPU intensive. He does mention that the entirety of
a 10 GB file cannot be loaded into memory. If you discount physical
swapfile paging and base this assumption on a "normal" PC that might
have 1 or 2 GB of RAM, is his assumption that out of line?

And I don't doubt that Python is as efficient as possible for I/O
operations. But since it is an interpreted scripting language, how could
it be "just as fast as any other language" as you claim? C would have to
be faster. Machine language would have to be faster. And even other
interpreted languages *could* be faster, given certain conditions. A
generalization like that claim kind of invalidates the remainder of your
assertion.
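
For what it's worth, the 'stream' processing with a SAX-like API that is
mentioned in the quoted thread below can be sketched with Python's
standard xml.sax module. This is only a minimal sketch, not the original
poster's code: the TagCounter class and count_tags function are names I
made up for illustration, and counting tags stands in for whatever the
real per-element work would be. The point is that the handler sees one
event at a time, so memory use should not grow with the size of the file.

```python
import io
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Hypothetical handler: counts start tags, never builds a tree."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per opening tag as the parser streams through the input.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_tags(source):
    """Stream-parse source (a file name or a binary file object)."""
    handler = TagCounter()
    xml.sax.parse(source, handler)
    return handler.counts


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real 10 GB file.
    sample = io.BytesIO(b"<root><item/><item/></root>")
    print(count_tags(sample))
```

In real use you would pass the path of the big file to count_tags instead
of the in-memory sample.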

fuzzylollipop wrote:
> K.S.Sreeram wrote:
> > Diez B. Roggisch wrote:
> > > What the OP needs is a different approach to XML-documents that won't
> > > parse the whole file into one giant tree - but I'm pretty sure that
> > > (c)ElementTree will do the job as well as expat. And I don't recall the
> > > OP musing about performance woes, btw.
> >
> >
> > There's just NO WAY that the 10gb xml file can be loaded into memory as
> > a tree on any normal machine, irrespective of whether we use C or
> > Python. So the *only* way is to perform some kind of 'stream' processing
> > on the file. Perhaps using a SAX like API. So (c)ElementTree is ruled
> > out for this.
> >
> > Diez B. Roggisch wrote:
> > > No what exactly makes C grok a 10Gb file where python will fail to do so?
> >
> > In most typical cases where there's any kind of significant python code,
> > it's possible to achieve a *minimum* of a 10x speedup by using C. In most
> > cases, the speedup is not worth it and we just trade it for the
> > increased flexibility/power of the python language. But in this situation
> > using a bit of tight C code could make the difference between the
> > process taking just 15mins or taking a few hours!
> >
> > Of course I'm not asking him to write the entire application in C. It
> > makes sense to just write the performance critical sections in C, and
> > wrap it in Python, and write the rest of the application in Python.
>
>
> you got no idea what you are talking about, anyone knows that something
> like this is IO bound.
> CPU is the least of his worries. And for IO bound applications Python
> is just as fast as any other language.



