10GB XML Blows out Memory, Suggestions?

Diez B. Roggisch deets at nospam.web.de
Wed Jun 7 12:11:13 EDT 2006


fuzzylollipop wrote:

> 
> Fredrik Lundh wrote:
>> fuzzylollipop wrote:
>>
>> > you got no idea what you are talking about, anyone knows that something
>> > like this is IO bound.
>>
>> which of course explains why some XML parsers for Python are a 100 times
>> faster than other XML parsers for Python...
>>
> 
> depends on the CODE and the SIZE of the file, in this case
> 
> processing 10GB of file, unless that file is heavily encrypted or
> compressed, the process will be IO bound PERIOD!

Why so? The IO bound is only hit when processing the fetched data is
faster than fetching it. So if I decide to read the 10GB at one 4KB block
per second, I'm possibly a very patient fella, but no IO bound is hit. So
no PERIOD here - not without talking about _what_ actually happens.
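
To put numbers on this instead of arguing, one could time the bare
fetching and the fetching-plus-parsing separately. The following is only a
rough sketch, not a benchmark of anyone's actual setup: the helper names
(time_raw_read, time_parse, make_sample) are made up for illustration, the
sample data is generated in memory as a stand-in, and for a real test one
would pass file objects opened on the actual 10GB file instead.

```python
import io
import time
import xml.etree.ElementTree as ET


def time_raw_read(stream, block_size=64 * 1024):
    """Return (seconds, bytes) for pulling the data in blocks and discarding it."""
    start = time.perf_counter()
    total = 0
    while True:
        block = stream.read(block_size)
        if not block:
            break
        total += len(block)
    return time.perf_counter() - start, total


def time_parse(stream):
    """Return (seconds, elements) for reading *and* parsing the same data."""
    start = time.perf_counter()
    count = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        count += 1
        elem.clear()  # drop this element's text and children once seen
    return time.perf_counter() - start, count


def make_sample(n_records=10000):
    """Hypothetical stand-in data; replace with the real file for a real test."""
    rows = "".join(
        "<row id='%d'><v>%d</v></row>" % (i, i * i) for i in range(n_records)
    )
    return ("<root>" + rows + "</root>").encode("utf-8")


data = make_sample()
read_secs, n_bytes = time_raw_read(io.BytesIO(data))
parse_secs, n_elems = time_parse(io.BytesIO(data))

print("raw read: %.4fs for %d bytes" % (read_secs, n_bytes))
print("parse:    %.4fs for %d elements" % (parse_secs, n_elems))
```

If the parse time is not close to the raw read time on the real file, the
job is not IO bound, whatever the size of the file.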

> Anyone saying that using C instead of Python will be faster when 99% of
> the time in this case is just waiting on the disk to feed a buffer, has
> no idea what they are talking about.

Which is true - but the chances of C getting whatever I want done within
that 1% of the time are a few times better than those of Python.

Mind you: I don't argue that the statements of Mr. Sreeram are true, either.
This discussion can only be held with respect to the actual use case (which
is certainly more than just parsing XML, but also processing it).

> I work with TeraBytes of files, and all our Python code is just as fast
> as equivalent C code for IO bound processes.

Care to share what kind of processing you perform on these files?

Regards,

Diez



