10GB XML Blows out Memory, Suggestions?

Wed Jun 7 12:01:19 EDT 2006

Thanks guys for all your posts...

So I am a bit confused. Fuzzy, the code I saw looks like it
decompresses as a stream (i.e., byte by byte). Is that the case, or are
you only compressing for file storage, so that the whole data set still
has to be expanded in memory?
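
For reference, this is roughly what I mean by "decompresses as a stream".
It is only a minimal sketch, assuming the file is gzip-compressed and the
repeated elements are called "record" (both are my assumptions, as is the
helper name count_records; this is not Fuzzy's actual code):

```python
import gzip
import xml.etree.ElementTree as ET


def count_records(path, tag="record"):
    """Count <tag> elements in a gzip-compressed XML file without loading it whole."""
    count = 0
    # gzip.open returns a file-like object that decompresses on demand as
    # iterparse reads from it, so the expanded document is never held in memory.
    with gzip.open(path, "rb") as stream:
        context = ET.iterparse(stream, events=("start", "end"))
        _, root = next(context)  # the first event is the start of the root element
        for event, elem in context:
            if event == "end" and elem.tag == tag:
                count += 1
                root.clear()  # drop already-processed children so memory stays flat
    return count
```

If the code works like this, only the element currently being parsed
lives in memory, no matter how large the file is on disk.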

fuzzylollipop wrote:
> Fredrik Lundh wrote:
> > fuzzylollipop wrote:
> >
> > > you have no idea what you are talking about; anyone knows that
> > > something like this is IO bound.
> >
> > which of course explains why some XML parsers for Python are 100 times
> > faster than other XML parsers for Python...
> >
>
> depends on the CODE and the SIZE of the file. In this case,
>
> processing 10GB of file, unless that file is heavily encrypted or
> compressed, the process will be IO bound, PERIOD!
>
> And in the case of XML, unless the PARSER is extremely inefficient,
> which I assume would be an edge case, the parser is NOT the bottleneck
> in this case.
>
> The relative performance of Python XML parsers is irrelevant to
> this being an IO bound process; even the slowest parser
> could only process the data as fast as it can be read off the disk.
>
> Anyone saying that using C instead of Python will be faster, when 99% of
> the time in this case is spent waiting on the disk to feed a buffer, has
> no idea what they are talking about.
>
> I work with terabytes of files, and all our Python code is just as fast
> as equivalent C code for IO bound processes.
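
One way to check the IO-bound claim on a given machine, rather than argue
about it, is to time a plain read of the file against a streaming parse of
the same file. The sketch below is only an illustration: the function
names and the 1 MB chunk size are my own choices, and the OS file cache
can make whichever pass runs second look faster, so the numbers need to be
read with that in mind.

```python
import time
import xml.etree.ElementTree as ET


def time_raw_read(path, chunk_size=1 << 20):
    """Seconds spent reading the file in chunks and discarding the bytes."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(chunk_size):
            pass
    return time.perf_counter() - start


def time_parse(path):
    """Seconds spent streaming the same file through iterparse."""
    start = time.perf_counter()
    for _, elem in ET.iterparse(path, events=("end",)):
        elem.clear()  # release each element's contents once it has been seen
    return time.perf_counter() - start
```

If time_parse comes out close to time_raw_read, the job is effectively IO
bound for that parser; if it is many times larger, the parser is where the
time goes.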