zip() function troubles

Istvan Albert istvan.albert at gmail.com
Thu Jul 26 19:25:36 EDT 2007


Hello all,

I've been debugging the reason for a major slowdown in a piece of
code ... and it turns out that it was the zip() function. In the past
the lists being zipped were reasonably short, but once their size
exceeded 10 million items the zip function slowed to a crawl. Note
that there was enough memory available to store well over 100 million
items.

Now, I know that zip() wastes lots of memory because it copies the
contents of the lists; I had used zip to try to trade memory for
speed (heh!), and now that everything has been replaced with izip it
works just fine. What was really surprising is that it works with no
issues up to 1 million items, but at, say, 10 million items it pretty
much goes nuts. Does anyone know why? Is there some limit that it
reaches, or is there something about the operating system (Vista in
this case) that makes it behave like this?
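
In case it is not obvious what I mean by the copying, here is the
difference in a toy-sized sketch (not the real code, just the idea):

from itertools import izip

small = range(5)

# zip builds the entire list of tuples up front
pairs = zip(small, small)        # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]

# izip is a lazy iterator, nothing is materialized until you loop over it
lazy_pairs = izip(small, small)
for a, b in lazy_pairs:
    print a, b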

I've noticed the same kind of behavior when trying to create very
long lists that should easily fit into memory, yet above a given
threshold I get inexplicable slowdowns. Now that I think about it, is
this something about the way lists grow when they are expanded? (See
the small timing sketch after the code below.)

And here is the code:

from itertools import izip

BIGNUM = int(1E7)

# let's make a large list
data = range(BIGNUM)

# this works fine (uses about 200 MB and 4 seconds)
s = 0
for x in data:
    s += x
print s


# this works fine, 4 seconds as well
s = 0
for x1, x2 in izip(data, data):
    s += x1
print s


# this takes over 2 minutes! and uses 600 MB of memory
# the memory usage slowly ticks upwards
s = 0
for x1, x2 in zip(data, data):
    s += x1
print s
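

And since I wondered above whether list growth is the culprit, here is
roughly how I would try to check that: append items in fixed-size
chunks and time each chunk, to see whether some chunk suddenly becomes
much slower. This is only a sketch of the experiment, not something I
have profiled carefully:

import time

CHUNK = int(1E6)

items = []
for i in range(10):
    start = time.clock()
    # grow the list by one more chunk of integers
    for j in xrange(CHUNK):
        items.append(j)
    # on Windows time.clock() is a wall-clock timer
    print "chunk %d: %.2f seconds" % (i, time.clock() - start)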



