zip() function troubles

Peter Otten __peter__ at web.de
Fri Jul 27 01:24:36 EDT 2007


Istvan Albert wrote:

> I've been debugging the reason for a major slowdown in a piece of
> code ... and it turns out that it was the zip function. In the past
> the lists that were zipped were reasonably short, but once the size
> exceeded 10 million the zip function slowed to a crawl. Note that
> there was memory available to store over 100 million items.
> 
> Now I know that zip() wastes lots of memory because it copies the
> content of the lists. I had used zip to try to trade memory for speed
> (heh!), and now that everything was replaced with izip it works just
> fine. What was really surprising is that it works with no issues up
> until 1 million items, but for, say, 10 million it pretty much goes
> nuts. Does anyone know why? Is there some limit that it reaches, or is
> there something about the operating system (Vista in this case) that
> makes it behave like so?
> 
> I've noticed the same kinds of behavior when trying to create very
> long lists that should easily fit into memory, yet above a given
> threshold I get inexplicable slowdowns. Now that I think about it, is
> this something about the way lists grow when expanding them?
> 
> and here is the code:
> 
> from itertools import izip
> 
> BIGNUM = int(1E7)
> 
> # let's make a large list
> data = range(BIGNUM)
> 
> # this works fine (uses about 200 MB and 4 seconds)
> s = 0
> for x in data:
>     s += x
> print s
> 
> 
> # this works fine, 4 seconds as well
> s = 0
> for x1, x2 in izip(data, data):
>     s += x1
> print s
> 
> 
> # this takes over 2 minutes! and uses 600 MB of memory
> # the memory usage slowly ticks upwards
> s = 0
> for x1, x2 in zip(data, data):
>     s += x1
> print s

When you allocate a lot of objects without releasing them, the garbage
collector kicks in repeatedly to look for reference cycles. Try switching
it off:

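import gc
gc.disable()
try:
    # do the zipping (the slow loop from your example)
    s = 0
    for x1, x2 in zip(data, data):
        s += x1
finally:
    # switch the collector back on even if the loop raises
    gc.enable()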
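
If you need this in more than one place, the same idea can be wrapped in a
small context manager. This is only a sketch, written for Python 3 (where
zip() is lazy, so list() is used to build all the tuples at once the way the
old zip() did). The names gc_paused and sum_first_of_pairs are made up for
illustration; only the gc and contextlib calls are standard library.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Temporarily disable the cyclic collector (hypothetical helper)."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Re-enable only if the collector was on before we paused it.
        if was_enabled:
            gc.enable()


def sum_first_of_pairs(data):
    """Materialize every pair, then sum the first item of each pair."""
    with gc_paused():
        # list() forces all tuples to exist at once, like Python 2's zip().
        pairs = list(zip(data, data))
    return sum(x1 for x1, _ in pairs)
```

The collector starts a pass once allocations outnumber deallocations by a
threshold, which gc.get_threshold() reports. Only the reference-cycle
detector is paused here; reference counting still frees objects as usual.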

Peter


