[Python-Dev] Dictionary tuning

Raymond Hettinger python@rcn.com
Mon, 28 Apr 2003 23:15:50 -0400


[Tim Peters] 
> Dicts can be larger after the 
> patch, but never
> smaller, so there's nothing opposing the "can be larger" 
> part:  on average, allocated address space must be strictly larger than before.

I think of the resize intervals as steps on a staircase.
My patch eliminates the even numbered stairs.
The average logarithmic slope of the staircase doesn't change,
there are just fewer discrete steps.  Also, the height of 
the staircase doesn't change unless the top stair was even, 
in which case, another half step is added.


[Tim Peters]
> Resizing a large dict is an expensive operation too.
Not only are there fewer resizes, but the cost of the operation
becomes cheaper because it takes less time to load a sparse
dictionary than one that is more dense.


[Tim Peters] 
> Whether that *matters* on average to the average user is something 
> we can answer
> rigorously just as soon as we find an average user with an 
> average program
> <wink>.  I'm not inclined to worry much about it.

Me either, I suspect that it is rare to find a stable application that is
functioning just fine and consuming nearly all memory.  Sooner
or later, some change in data, hardware, os, or script would push
it over the edge.


[Timothy Delaney]
> That's what I was getting at. I know that (for example) most
> classes I create have less that 16 entries in their __dict__.
> With this change, each class instance would take (approx) twice
> as much memory for its __dict__. I suspect that class instance
> __dict__ is the most common dictionary I use.

Those dicts would also be the ones benefitting from the patch.
Their density would be halved; resulting in fewer collisions,
improved search times, and better cache performance.


[Timothy Delaney]
> Perhaps we need to add some internal profiling, so that
> "quickly-growing" dictionaries get larger reallocations ;)

I came up with this patch a couple of months ago and have
since tried every tweak I could think of (apply to this size
dict but not that one, etc) but found nothing that survived
a battery of application benchmarks.

Have you guys tried out the patch? I'm very interested in 
getting results from different benchmarks, processors, 
cache sizes, and various operating systems.

sparse-is-better-than-dense-ly yours,


Raymond
(currently, the only one.  unlike two Tims, two Bretts, 
 two Jacks and a Fredrik distinct from Fred)












#################################################################
#################################################################
#################################################################
#####
#####
#####
#################################################################
#################################################################
#################################################################