built in zip function speed

Tue Jul 4 11:29:42 EDT 2006

Steven D'Aprano wrote:
> On Tue, 04 Jul 2006 07:18:29 -0700, mart.franklin at gmail.com wrote:
>
> > I hope I am not being too ignorant :p but here goes... My boss has
> > written a bit of Python code and asked me to speed it up for him.
> > I've reduced the run time from around 20 minutes to 13 (not bad, I
> > think ;). To speed it up further I asked him to replace a loop like this:-
> >
> >
> > index = 0
> >
> > for element in a:
> >    av = a[index]
> >    bv = b[index]
> >    cv = c[index]
> >    dv = d[index]
> >    avbv = (av-bv) * (av-bv)
> >    diff = cv - dv
> >    e.append(diff - avbv)
> >    index = index + 1
>
> This is, I think, a good case for an old-fashioned for-with-index loop:
>
> for i in range(len(a)):
>     e.append(c[i] - d[i] - (a[i] - b[i])**2)
>
> Python doesn't optimize away lines of code -- you have to do it yourself.
> Every line of Python code takes a bit of time to execute. My version uses
> 34 lines disassembled; yours takes 60 lines, almost twice as much code.
>
> (See the dis module for further details.)
>
> It's too much to hope that my code will be twice as fast as yours, but it
> should be a little faster.

indeed thanks very much :)
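
For anyone who wants to reproduce the instruction-count comparison Steven
describes, here is a minimal sketch using the dis module. It assumes
Python 3.4 or later (for dis.get_instructions), so the counts will not
match the 34 and 60 quoted above, which came from a 2006 interpreter. The
names original_loop, compact_loop and count_instructions are made up for
illustration.

```python
import dis


def original_loop(a, b, c, d):
    """The loop from the original post, with a hand-maintained counter."""
    e = []
    index = 0
    for element in a:
        av = a[index]
        bv = b[index]
        cv = c[index]
        dv = d[index]
        avbv = (av - bv) * (av - bv)
        diff = cv - dv
        e.append(diff - avbv)
        index = index + 1
    return e


def compact_loop(a, b, c, d):
    """Steven's for-with-index version, one expression per iteration."""
    e = []
    for i in range(len(a)):
        e.append(c[i] - d[i] - (a[i] - b[i]) ** 2)
    return e


def count_instructions(func):
    """Count the bytecode instructions that dis reports for func."""
    return sum(1 for _ in dis.get_instructions(func))


if __name__ == "__main__":
    for func in (original_loop, compact_loop):
        print(func.__name__, count_instructions(func))
```

Calling dis.dis(compact_loop) prints the full listing if you would rather
read the instructions than count them.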

my tests on 4 million:-

slice (original):
7.73399996758

built in zip:
36.7350001335

izip:
5.98399996758

Steven slice:
4.96899986267

so Steven's version is the fastest overall so far
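
A rough sketch of how timings like the ones above could be collected, in
case anyone wants to try it on their own machine. This is not the script I
ran: it is written for Python 3, where the built-in zip is already lazy
(like izip), so list(zip(...)) stands in for the old list-building zip.
The function names, the random test data and the use of array("d") are
all assumptions for illustration, and the absolute numbers will differ
from mine.

```python
import random
import timeit
from array import array


def make_data(n, seed=0):
    """Build four float arrays of length n (hypothetical test data)."""
    rng = random.Random(seed)
    return [array("d", (rng.random() for _ in range(n))) for _ in range(4)]


def counter_loop(a, b, c, d):
    """Original version: iterate over a, index with a manual counter."""
    e = []
    index = 0
    for element in a:
        av, bv, cv, dv = a[index], b[index], c[index], d[index]
        e.append((cv - dv) - (av - bv) * (av - bv))
        index = index + 1
    return e


def eager_zip_loop(a, b, c, d):
    """Builds the whole list of tuples first, as Python 2's zip did."""
    e = []
    for av, bv, cv, dv in list(zip(a, b, c, d)):
        e.append((cv - dv) - (av - bv) * (av - bv))
    return e


def lazy_zip_loop(a, b, c, d):
    """Yields one tuple at a time, as itertools.izip did in Python 2."""
    e = []
    for av, bv, cv, dv in zip(a, b, c, d):
        e.append((cv - dv) - (av - bv) * (av - bv))
    return e


def index_loop(a, b, c, d):
    """Steven's for-with-index version."""
    e = []
    for i in range(len(a)):
        e.append(c[i] - d[i] - (a[i] - b[i]) ** 2)
    return e


VARIANTS = (counter_loop, eager_zip_loop, lazy_zip_loop, index_loop)


def time_variants(n, number=1):
    """Return {function name: seconds} for each variant on n elements."""
    data = make_data(n)
    return {
        func.__name__: timeit.timeit(lambda: func(*data), number=number)
        for func in VARIANTS
    }


if __name__ == "__main__":
    for name, seconds in time_variants(200_000).items():
        print(name, seconds)
```

Running each variant several times and taking the minimum (timeit.repeat)
would give steadier figures than a single pass.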

>
> > (where a, b, c and d are 200,000 element float arrays)
> > to use the built in zip function.. it would seem made for this problem!
> >
> > for av, bv, cv, dv in zip(a, b, c, d):
> >    avbv = (av-bv) * (av - bv)
> >    diff = cv - dv
> >    e.append(diff - avbv)
> >
> > However, this seems to run much slower than *I* thought it would
> > (and in fact slower than slicing). I guess what I am asking is: would
> > you expect this?
>
> Yes. zip() makes a copy of your data. It's going to take some time to copy
> 4 * 200,000 floats into one rather large list. That list is an ordinary
> Python list of objects, not an array of bytes like the array module
> uses. That means zip has to convert every one of those 800,000 floats
> into rich Python float objects. This won't matter for small sets of data,
> but with 800,000 of them, it all adds up.
>
>

I was beginning to suspect this was the case (I opened Windows Task
Manager and noticed the memory usage). Thanks for explaining it to me.
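
To put rough numbers on the copy Steven describes, here is a small sketch
that compares the raw size of four array("d") objects with the size of
the list of tuples an eager zip builds from them. It is written for
Python 3, so list(zip(...)) is used to imitate the old list-returning
zip; the function name measure and the dictionary keys are made up for
illustration, and sys.getsizeof figures are specific to CPython.

```python
import sys
from array import array


def measure(n):
    """Compare memory held by four float arrays with an eager zip of them."""
    # Hypothetical data: four arrays of n identical doubles each.
    arrays = [array("d", [float(i)] * n) for i in range(4)]

    # Bytes used by the packed values inside the arrays themselves.
    raw_bytes = sum(arr.itemsize * len(arr) for arr in arrays)

    # Eager: every row becomes a tuple, every value a full float object.
    eager = list(zip(*arrays))
    list_bytes = sys.getsizeof(eager)
    tuple_bytes = sum(sys.getsizeof(row) for row in eager)
    float_bytes = sum(sys.getsizeof(x) for row in eager for x in row)

    # Lazy: the iterator only holds references to the four arrays.
    lazy = zip(*arrays)

    return {
        "raw_bytes": raw_bytes,
        "eager_total": list_bytes + tuple_bytes + float_bytes,
        "lazy_iterator": sys.getsizeof(lazy),
        "rows": len(eager),
    }


if __name__ == "__main__":
    for key, value in measure(200_000).items():
        print(key, value)
```

The lazy iterator still creates float objects as it goes, but only one row
at a time, which fits with izip being so much quicker in my tests.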

> -- 
> Steven.