152 is faster than 221 ? I think not ...

Tue Aug 28 06:55:56 EDT 2001

On Tue, 28 Aug 2001, Dave Harrison wrote:

> I'm writing a script that goes through and reads zipped logs.  So far so cool.
>
> Now the current install on the server is 1.5.2 which is nice, but I was hoping 2.2.1 would be faster at reading the logs by virtue of xreadlines and general tweaking over time.
>
> But to my surprise and mild horror 1.5.2 was clocking horrendously faster than 2.2.1 ....
>
> What's the deal? It may be something basic I'm overlooking, but I can't see why 2.2.1 should not at least be the same speed as 1.5.2.
>
> My basic timing code is as follows :
>
>
> FOR 1.5.2 :
> -------------------------------------------
> import gzip, time
> list = []
> file = gzip.open('20010615.gz')
> start = time.clock()
> for line in file.readlines():
>         list.append(line)
> stop = time.clock()
> total = stop - start
> print
> print '1.5.2 Running Time : '+str(total)
> print
> -------------------------------------------
>
>
> FOR 2.2.1 :
> -------------------------------------------
> import gzip, time, xreadlines
> list = []
> file = gzip.open('20010615.gz')
> start = time.clock()
> for line in xreadlines.xreadlines(file):
>         list.append(line)
> stop = time.clock()
> total = stop-start
> print
> print 'Total running time 2.2.1a : '+str(total)
> print
> -------------------------------------------
>
> The times come out at :
> 1.5.2 -> 3 seconds or so
> 2.2.1 -> 13 seconds or so
>
> Help ?
>
> Dave

It's quite simple. file.readlines() builds a true list in one go, whereas
xreadlines.xreadlines() creates a lazy object that has to be called each time
you want a line. Fetching items one call at a time will never be faster than
walking over data that already exists.
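
As a rough sketch, the same comparison can be set up in a current Python 3,
where the xreadlines module no longer exists and plain iteration over the file
object plays its role. The sample file, its contents, the line count, and all
function names below are made up for illustration; the timings it prints will
vary by machine and are not the numbers from the post above.

```python
import gzip
import os
import tempfile
import time


def make_sample_log(path, n_lines):
    # Hypothetical stand-in for the real log file in the post.
    with gzip.open(path, "wt") as f:
        for i in range(n_lines):
            f.write(f"line {i} of a made-up log\n")


def read_eager(path):
    # Build the whole list in one call, like file.readlines() in the post.
    with gzip.open(path, "rt") as f:
        return f.readlines()


def read_lazy(path):
    # Fetch one line per loop step, the role xreadlines played.
    lines = []
    with gzip.open(path, "rt") as f:
        for line in f:
            lines.append(line)
    return lines


def timed(func, path):
    # Return the function's result and the elapsed wall-clock seconds.
    start = time.perf_counter()
    result = func(path)
    return result, time.perf_counter() - start


def compare(n_lines=100_000):
    # Write a temporary gzip file, read it both ways, then clean up.
    tmp_dir = tempfile.mkdtemp()
    path = os.path.join(tmp_dir, "sample.gz")
    try:
        make_sample_log(path, n_lines)
        eager, t_eager = timed(read_eager, path)
        lazy, t_lazy = timed(read_lazy, path)
    finally:
        if os.path.exists(path):
            os.remove(path)
        os.rmdir(tmp_dir)
    return eager, lazy, t_eager, t_lazy


if __name__ == "__main__":
    eager, lazy, t_eager, t_lazy = compare()
    print("readlines():  %.3f s" % t_eager)
    print("line by line: %.3f s" % t_lazy)
```

Both readers return the same list of lines, so any difference in the printed
times comes from how the lines are fetched, not from what is read.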

If you test range() versus xrange() for very large values you'll find the same
thing, even in the same version of Python.
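
A small sketch for trying this yourself, assuming a current Python 3: there,
range() is already lazy like the old xrange(), so list(range(n)) stands in for
the old list-building range(). The function names and the value of n are
arbitrary choices for illustration. It times only the loop itself, with the
list built beforehand, and no particular result is claimed here.

```python
import timeit


def consume(iterable):
    # Walk every item; the running total gives the loop some work to do.
    total = 0
    for value in iterable:
        total += value
    return total


def time_iteration(n=1_000_000, repeat=3):
    prebuilt = list(range(n))  # a real list, like the old range()
    lazy = range(n)            # a lazy sequence, like the old xrange()
    # Take the best of several runs to reduce noise from the machine.
    t_list = min(timeit.repeat(lambda: consume(prebuilt), number=1, repeat=repeat))
    t_lazy = min(timeit.repeat(lambda: consume(lazy), number=1, repeat=repeat))
    return t_list, t_lazy


if __name__ == "__main__":
    t_list, t_lazy = time_iteration()
    print("prebuilt list: %.4f s" % t_list)
    print("lazy range:    %.4f s" % t_lazy)
```

If the cost of building the list should count as well, move the list() call
inside the timed function.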

-- 
Ignacio Vazquez-Abrams  <ignacio at openservices.net>