something Evil happens when large hashes destroyed

guuge guuge at localhost.localhost
Mon Nov 19 10:23:17 EST 2001


On Sun, 18 Nov 2001 17:37:03 GMT, Hans Nowak <wurmy at earthlink.net> wrote:
> guuge wrote:
>> 
>> I was trying to sort about 100,000 items by splitting them into
>> groups (using a python dictionary type) and then successively splitting
>> these groups until they were small enough to use a brute force method.
>> 
>> My test program, which used only two or three keys to create the first
>> split, worked fine.  When I tried the real thing, using 256 keys, the
>> program slowed to a crawl.  The python interpreter took forever to
>> destroy the dictionaries.
>> 
>> Here's a little program that will place X number of elements into
>> Y number of bins and gives the time used to do the placement and
>> the time used to destroy the dictionary.
>> 
> [snip snip]
> 
> Interesting... on my box, a P3/900 with 128 Mb, the dicts are destroyed
> faster when I use more keys. Low numbers of keys result in long
> destruction times.
> 
> For 100,000 elements, I always get a creation time of 2 seconds, and
> the following destruction times:
> 20 keys: 61 seconds
> 34 keys: 24 seconds
> 60 keys: 8 seconds
> 256 keys: 3 seconds
> 
> Of course, this could have been timed better, by using a larger set than
> 100,000. Maybe some other time...
> 
> It seems, the larger the dicts, the more time it takes to destroy it.
> In other words, if N elements are distributed over a lot of small
> dict, the total destroy time is a lot lower than when they are
> distributed over a few large dicts. Maybe this makes sense for people
> who know the C implementation of the dict... (I don't. :-)
> 
> Anyway, my guess is that if you use a lot of keys, the total destroy
> time should be lower... correct?

I suppose more keys would make destroy time take a bit longer, but it 
would seem natural that an increase in destroy time would be gradual.

The wierd thing is that as you slowly increase keys (1 -> 2 -> etc..)
at some point the destroy time makes a huge jump from a few seconds
to a few minutes.

btw, compiling the python interpreter with pymalloc solved the problem.
I guess it's a linux libc issue.




More information about the Python-list mailing list