bz2.readline() slow ?

Jack Diederich jackdied at jackdied.com
Mon Nov 27 15:21:42 EST 2006


On Fri, Nov 24, 2006 at 10:11:06AM +0000, Soeren Sonnenburg wrote:
> Dear all,
> 
> I am a bit puzzled, as
> 
> -----snip-----
> import bz2
> f=bz2.BZ2File('data/data.bz2');
> 
> while f.readline():
>         pass
> -----snip-----
> 
> takes twice the time (10 seconds) to read/decode a bz2 file
> compared to
> 
> -----snip-----
> import bz2
> f=bz2.BZ2File('data/data.bz2');
> x=f.readlines()
> -----snip-----
> 
> (5 seconds). This is even more strange as the help(bz2) says:
> 
>      |  readlines(...)
>      |      readlines([size]) -> list
>      |      
>      |      Call readline() repeatedly and return a list of lines read.
>      |      The optional size argument, if given, is an approximate bound on the
>      |      total number of bytes in the lines returned.
> 
> This happens on python2.3 - python2.5 and it does not help to specify a
> maximum line size.
> 
> Any ideas ?

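The bz2 module is implemented in C, so every call to f.readline() from
a Python loop pays the Python => C call overhead once per line.
f.readlines() makes that crossing only once and then stays in a tight C
loop until the whole file has been read. The help text describes what
readlines() returns, not how it is implemented internally.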
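
If you want to measure the difference on your own machine, here is a
minimal sketch. It is written for a current Python 3 (not the 2.3 - 2.5
versions discussed above), and the sample file, its contents, and the
helper names are made up for illustration. The size of the gap depends
on the Python version and the data, so treat the timings as indicative
only.

```python
import bz2
import os
import tempfile
import time


def make_sample(path, n_lines=20000):
    # Write a small bz2 file of hypothetical sample data.
    with bz2.open(path, "wb") as f:
        for i in range(n_lines):
            f.write(b"line %d of sample data\n" % i)


def count_with_readline(path):
    # One Python-level method call per line.
    n = 0
    with bz2.BZ2File(path) as f:
        while f.readline():
            n += 1
    return n


def count_with_readlines(path):
    # A single call that returns every line at once.
    with bz2.BZ2File(path) as f:
        return len(f.readlines())


def timed(fn, path):
    # Return the function's result and the elapsed wall-clock seconds.
    start = time.perf_counter()
    result = fn(path)
    return result, time.perf_counter() - start


fd, sample_path = tempfile.mkstemp(suffix=".bz2")
os.close(fd)
try:
    make_sample(sample_path)
    n1, t1 = timed(count_with_readline, sample_path)
    n2, t2 = timed(count_with_readlines, sample_path)
finally:
    os.remove(sample_path)

print("readline():  %d lines in %.3f s" % (n1, t1))
print("readlines(): %d lines in %.3f s" % (n2, t2))
```

Both functions should report the same line count; only the elapsed
times should differ. Keep in mind that readlines() holds the whole
decompressed file in memory, so the faster call is not always the
better choice for large files.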

-Jack


