[Python-Dev] iterzip()

Mon, 29 Apr 2002 13:43:42 -0400

[Guido]
> Probably the list-append still has a slight quadratic component --
> you're doing a million elements here.

I get:

  justpush   1.39
   justzip   7.87

from this:

"""
from time import clock as now

n = 1000000
zeroes = [0] * n

def justpush():
    x = []
    push = x.append
    for i in zeroes: push(i)

def justzip():
    zip(zeroes)

def run(func):
    start = now()
    func()
    finish = now()
    print "%10s %6.2f" % (func.__name__, finish - start)

run(justpush)
run(justzip)
"""

That is, at first glance it's *much* faster to do a million appends in pure
Python than it is to let zip do them at C speed.  I'm curious about what
people get for this program on other platforms (pymalloc on or off may make
a dramatic difference too -- or not, depending on how the platform malloc
works).

The cause on Win98SE is clear enough with a little instrumentation:  in the
justzip case, listobject.c's ins1 ends up *copying* the memory much more
often, presumably due to that there's a new tuple allocation after every
append (preventing realloc from growing into that space).  justpush copies
the memory at sizes (skipping some tiny ones, and where this is a count of
the number of elements in the list, not the number of bytes (multiply by 4
to get the latter)):

    247 639 959 2047 45055 163839

and that's all.  justzip copies the memory at sizes:

    247 639 959 2047 45055 53247 57343 65535 81919 98303 122879 163839
    196607 229375 262143 294911 360447 393215 458751 589823 622591
    688127 720895 753663 786431 819199 851967 884735 917503 950271

Copying 3-4MB multiple times at the end ain't cheap.

justpush does relatively worse if the order in which they're called is
swapped, as then justzip and the custom tuple free list conspire to leave
memory in a more-fragmented state.

> But I asked something else.  How much does the speed difference affect
> the apps you have?  In my experience it's usually used for small lists
> where the quadratic effect doesn't occur and the timing doesn't matter
> compared to the rest of the program.

Same here.  High volume "vectorized" operations are NumPy's proper domain.