Program very slow to finish

Roeland Rengelink r.b.rigilink at chello.nl
Mon Nov 5 16:38:46 EST 2001


Hi Tim,

Thanks for your comprehensive answer. I hope you'll bear with me, for I'm
looking for some clarification.

Tim Peters wrote:
> 
> [Roeland Rengelink]
> > I'm pretty sure this is a problem with the garbage collector, I did some
> > measurements showing bad (O(n**2)) performance when collecting large
> > numbers of objects at once (as happens when your dicts are deleted at
> > the end of your program). I filed a bug report (#477963) on it. See:
> >
> >
> <http://sourceforge.net/tracker/index.php?func=detail&aid=477963&group_id=5470&atid=105470>
> >
> > for more details
> 
> I expect you're just measuring your platform's cache performance.  Here's
> the output from running on a Win98SE box w/ 256MB RAM (I'll leave out the
> creation times because they aren't interesting; note too that time.time() on
> Windows has poor precision, so the numbers are mostly nonsense for the
> smallest dicts):
> 
> size:   10000, destruction:  0.00 (usec/elem)
> size:   20000, destruction:  2.50 (usec/elem)
> size:   50000, destruction:  1.20 (usec/elem)
> size:  100000, destruction:  1.10 (usec/elem)
> size:  200000, destruction:  1.10 (usec/elem)
> size:  400000, destruction:  1.25 (usec/elem)
> size:  600000, destruction:  1.37 (usec/elem)
> size:  800000, destruction:  1.45 (usec/elem)
> size: 1000000, destruction:  1.54 (usec/elem)
> 

Here are my numbers again, just for comparison (and I do think your numbers
show an increase in destruction time too, just not as bad).

The algo:

-  create dictionary d with N 7-char keys
-  execute d=None

All times in micro-seconds per item. For the code see end of this post.
(Linux 2.2.14, 128M RAM, Cel 333 MHz)

size:   10000, creation: 31.00, destruction:  1.49
size:   20000, creation: 31.10, destruction:  1.57
size:   50000, creation: 32.77, destruction:  1.76
size:  100000, creation: 32.00, destruction:  1.92
size:  200000, creation: 32.59, destruction:  2.38
size:  500000, creation: 32.12, destruction:  4.35
size: 1000000, creation: 32.25, destruction: 10.47

If it's cache misses, I'd expect:

T = t_f + p*t_m

where:

T   -- destruction time per element, 
t_f -- fixed cost of freeing an element
t_m -- additional cost in case of cache miss
p   -- probability of cache miss

and

p = (1-m/M)

with m some fraction of the cache size (in elements), and M proportional
to the problem size. Hence, I'd expect the destruction time per element to
approach some value T = t_f + t_m asymptotically. Also, for the same
reason that you'd expect cache misses on destruction (the elements are not
stored in the dict in the order in which they were created), wouldn't you
expect a proportional number of cache misses upon insertion into the
dictionary? I don't see evidence for that in the creation times.

Now, this may be a very naive argument. But, hey, I'm an astronomer, not a
computer scientist.

> So buy more RAM <0.5 wink>.
> 

Well, I did state in my bug report that everything fitted nicely in main
memory. For comparison here are the numbers when this test _does_
require virtual memory:

size: 1500000, creation: 254.03, destruction: 31.04 

I promise that if I ever run into this in a real-world situation I'll
buy more RAM ;)

> This is the code that deallocates dicts:
> 
>         for (ep = mp->ma_table; fill > 0; ep++) {
>                 if (ep->me_key) {
>                         --fill;
>                         Py_DECREF(ep->me_key);
>                         Py_XDECREF(ep->me_value);
>                 }
>         }
> 
> That is, it's a single scan over the key+value pairs, decrementing the
> refcounts on each.  Note that while a left-to-right scan of the container is
> cache-friendly, dicts work hard to "randomize" the offsets at which keys are
> stored, so the memory pointed to *by* ep->me_key (containing your string
> objects) has no useful relation to the order in which you created your
> string objects:  you're going to get mountains of cache misses during
> deallocation.
> 
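
That scattering is easy to see from Python, by the way. This little
snippet only illustrates the ordering, it says nothing about the cache:

d = {}
for i in xrange(10):
    d['%7.7i' % i] = 1
# keys() walks the hash table slots, so the keys come back in slot order,
# which has no useful relation to the order the strings were created in
print d.keys()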

> It's a mystery why deallocation takes so much longer than allocation when
> things go bad (look at it:  the deallocation code is dead simple).  I
> tracked down one of those long ago in hideous detail, and it turned out to
> be due to the platform free() thrashing like mad trying to coalesce adjacent
> free'd areas.

You've convinced/reassured me that it can't be Python. I'm not convinced
it's the cache. That basically leaves free(). I'm curious whether anybody
else has seen this behaviour; I hope some more people will try this.
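
For anyone who wants to give it a try, here's a variation on my script
(just a sketch, the helper functions are my own invention) that frees the
same strings once from a dict -- in scattered hash-table order -- and once
from a list, which drops its items sequentially. If the scattered order is
what hurts (be it the cache or free() coalescing), the dict case should
come out much slower than the list case:

import time

def time_dict_del(n):
    d = {}
    for i in xrange(n):
        d['%7.7i' % i] = 1
    t = time.time()
    d = None          # only reference dropped: keys freed in hash-table order
    return (time.time() - t)/n*1e6

def time_list_del(n):
    lst = ['%7.7i' % i for i in xrange(n)]
    t = time.time()
    lst = None        # only reference dropped: items freed sequentially
    return (time.time() - t)/n*1e6

if __name__ == '__main__':
    print "size, dict del, list del (usec/elem)"
    for n in [100000, 200000, 500000, 1000000]:
        print "%7i, %5.2f, %5.2f" % (n, time_dict_del(n), time_list_del(n))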

Again, thanks for your answer,

Roeland
-- 
r.b.rigilink at chello.nl

"Half of what I say is nonsense. Unfortunately I don't know which half"
--
import time

def make_dict(n):
    # build a dict with n distinct 7-character string keys
    d = {}
    for i in xrange(n):
        s = '%7.7i' % i
        d[s] = 1
    return d

def time_it(n):
    t1 = time.time()
    d = make_dict(n)
    t2 = time.time()
    d = None    # drop the only reference: the dict and its keys are freed here
    t3 = time.time()
    return t1, t2, t3

if __name__ == '__main__':
    s = "size: %7i, creation: %5.2f, destruction: %5.2f"
    for i in [1, 2, 5, 10, 20, 50, 100]:
        n = i*10000
        t1, t2, t3 = time_it(n)
        print s % (n, (t2-t1)/n*1e6, (t3-t2)/n*1e6)


