Python vs. Java gzip performance

"Martin v. Löwis" martin at v.loewis.de
Fri Mar 17 14:08:16 EST 2006


Bill wrote:
> The Java version of this code is roughly 2x-3x faster than the Python
> version.  I can get around this problem by replacing the Python
> GzipFile object with a os.popen call to gzcat, but then I sacrifice
> portability.  Is there something that can be improved in the Python
> version?

Don't use readline/readlines. Instead, read the data in larger chunks and
split it into lines yourself. For example, if you expect the entire file
to fit into memory, read it all at once.
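Below is a minimal sketch of both variants, not code from this thread. It
assumes Python 3's gzip.open and lines terminated by b"\n". The function
names iter_lines_chunked and read_all_lines, and the 1 MiB default chunk
size, are hypothetical choices for illustration.

```python
import gzip


def iter_lines_chunked(path, chunk_size=1 << 20):
    """Yield lines (bytes, without the trailing b"\\n") from a gzip file.

    Reads large decompressed chunks instead of calling readline() per line.
    chunk_size is an illustrative default, not a tuned value.
    """
    with gzip.open(path, "rb") as f:
        leftover = b""
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty read means end of file
                break
            parts = (leftover + chunk).split(b"\n")
            # The last piece may be an incomplete line; carry it over.
            leftover = parts.pop()
            yield from parts
        if leftover:  # final line with no terminating newline
            yield leftover


def read_all_lines(path):
    """Read the whole decompressed file at once, then split it into lines."""
    with gzip.open(path, "rb") as f:
        data = f.read()
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":  # drop the empty piece after a final b"\n"
        lines.pop()
    return lines
```

Both functions return the same lines. The chunked version keeps memory
bounded by chunk_size plus the longest line, while the read-it-all version
trades memory for simplicity.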

If that helps, try editing gzip.py to incorporate that approach.

Regards,
Martin



More information about the Python-list mailing list