Python vs. Java gzip performance

Peter Otten __peter__ at web.de
Fri Mar 17 16:26:00 EST 2006


Caleb Hattingh wrote:

> I tried this:
> 
> from timeit import *
> 
> #Try readlines
> print Timer('import gzip;lines=gzip.GzipFile("gztest.txt.gz").readlines();[i+"1" for i in lines]').timeit(200)
> 
> 
> # Try file object - uses buffering?
> print Timer('import gzip;[i+"1" for i in gzip.GzipFile("gztest.txt.gz")]').timeit(200)
> 
> Produces:
> 
> 3.90938591957
> 3.98982691765
> 
> Doesn't seem much difference, probably because the test file easily
> gets into memory, and so disk buffering has no effect.   The file
> "gztest.txt.gz" is a gzipped file with 1000 lines, each being "This is
> a test file".

$ python -c"file('tmp.txt', 'w').writelines('%d This is a test\n' % n for n in range(1000))"
$ gzip tmp.txt

Now, if you follow Martin's advice:

$ python -m timeit -s"from gzip import GzipFile" "GzipFile('tmp.txt.gz').readlines()"
10 loops, best of 3: 20.4 msec per loop

$ python -m timeit -s"from gzip import GzipFile" "GzipFile('tmp.txt.gz').read().splitlines(True)"
1000 loops, best of 3: 534 usec per loop

That's a factor of about 38 (20.4 msec vs. 534 usec). Not bad, I'd say :-)
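
For anyone who wants to repeat the comparison as a script rather than from the shell, here is a minimal sketch. It assumes a current Python 3 interpreter (the commands above are Python 2), and the helper names make_test_file, read_with_readlines, read_with_splitlines and best_time are made up for illustration. The timings it prints depend on the interpreter version and the machine, so the gap may be much smaller than the one measured above.

```python
import gzip
import os
import tempfile
import timeit


def make_test_file(path, n_lines=1000):
    """Write n_lines short numbered lines to a gzip file at path."""
    with gzip.open(path, "wb") as f:
        f.writelines(b"%d This is a test\n" % n for n in range(n_lines))


def read_with_readlines(path):
    """Let GzipFile split the decompressed stream into lines itself."""
    with gzip.GzipFile(path) as f:
        return f.readlines()


def read_with_splitlines(path):
    """Decompress everything in one read(), then split in memory."""
    with gzip.GzipFile(path) as f:
        return f.read().splitlines(True)


def best_time(func, path, number=20, repeat=3):
    """Best per-call time in seconds over `repeat` runs of `number` calls."""
    timer = timeit.Timer(lambda: func(path))
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "tmp.txt.gz")
        make_test_file(path)
        # Both approaches should give identical lists for this "\n"-only data.
        assert read_with_readlines(path) == read_with_splitlines(path)
        for func in (read_with_readlines, read_with_splitlines):
            usec = best_time(func, path) * 1e6
            print("%-22s %.1f usec per call" % (func.__name__, usec))
```

One caveat: on bytes, splitlines() also breaks on a bare "\r", while readlines() only breaks on "\n", so the two are interchangeable only for data like this test file.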

Peter


