bz2.readline() slow ?
Jack Diederich
jackdied at jackdied.com
Mon Nov 27 15:21:42 EST 2006
On Fri, Nov 24, 2006 at 10:11:06AM +0000, Soeren Sonnenburg wrote:
> Dear all,
>
> I am a bit puzzled, as
>
> -----snip-----
> import bz2
> f=bz2.BZ2File('data/data.bz2');
>
> while f.readline():
>     pass
> -----snip-----
>
> takes twice the time (10 seconds) to read/decode a bz2 file
> compared to
>
> -----snip-----
> import bz2
> f=bz2.BZ2File('data/data.bz2');
> x=f.readlines()
> -----snip-----
>
> (5 seconds). This is even more strange as the help(bz2) says:
>
> | readlines(...)
> | readlines([size]) -> list
> |
> | Call readline() repeatedly and return a list of lines read.
> | The optional size argument, if given, is an approximate bound on the
> | total number of bytes in the lines returned.
>
> This happens on python2.3 - python2.5 and it does not help to specify a
> maximum line size.
>
> Any ideas ?
The bz2 module is implemented in C, so calling "f.readline()" repeatedly
pays Python-to-C call overhead once per line. "f.readlines()" doesn't have
that overhead because it stays in a tight C loop the whole time.
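
If you want to measure the difference on your own machine, a rough timing
sketch along these lines should do. It assumes Python 3 (BZ2File used as a
context manager, time.perf_counter), and the helper names (make_sample,
count_with_readline, count_with_readlines, timed, compare) plus the generated
sample data are made up for illustration. How big the gap is depends on the
Python version and the data, so no particular result is claimed here.

```python
import bz2
import os
import tempfile
import time


def make_sample(path, n_lines=20000):
    # Write a small bz2 file of made-up sample lines (illustrative data only).
    with bz2.BZ2File(path, "w") as f:
        for i in range(n_lines):
            f.write(("line %d\n" % i).encode("ascii"))


def count_with_readline(path):
    # One Python-level method call per line.
    n = 0
    with bz2.BZ2File(path) as f:
        while f.readline():
            n += 1
    return n


def count_with_readlines(path):
    # A single call that returns every line in one list.
    with bz2.BZ2File(path) as f:
        return len(f.readlines())


def timed(func, path):
    # Return (result, elapsed seconds) for one call of func(path).
    start = time.perf_counter()
    result = func(path)
    return result, time.perf_counter() - start


def compare(n_lines=20000):
    # Build a temporary sample file, time both approaches, then clean up.
    fd, path = tempfile.mkstemp(suffix=".bz2")
    os.close(fd)
    try:
        make_sample(path, n_lines)
        return {
            func.__name__: timed(func, path)
            for func in (count_with_readline, count_with_readlines)
        }
    finally:
        os.remove(path)


if __name__ == "__main__":
    for name, (n, secs) in compare().items():
        print("%s: %d lines in %.3f s" % (name, n, secs))
```

Note that readlines() holds the whole decompressed file in memory, so for
very large files iterating with "for line in f:" may be the better trade-off.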
-Jack