[Python-Dev] Dictionary tuning

M.-A. Lemburg mal@lemburg.com
Tue, 29 Apr 2003 09:08:13 +0200


Guido van Rossum wrote:
>>I've experimented with about a dozen ways to improve dictionary 
>>performance and found one that benefits some programs by up to 
>>5% without hurting the performance of other programs by more
>>than a single percentage point.
>>
>>It entails a one line change to dictobject.c resulting in a new 
>>schedule of dictionary sizes for a given number of entries:

Perhaps you could share that change? Or is it on SF somewhere?

>>Number of           Current size        Proposed size
>>Filled Entries      of dictionary       of dictionary
>>--------------      -------------       -------------
>>[-- 0 to 5 --]            8                   8
>>[-- 6 to 10 --]          16                  32
>>[-- 11 to 21 --]         32                  32
>>[-- 22 to 42 --]         64                 128
>>[-- 43 to 85 --]        128                 128
>>[-- 86 to 170 --]       256                 512
>>[-- 171 to 341 --]      512                 512
> 
> 
> I suppose there's an "and so on" here, right?  I wonder if for
> *really* large dicts the space sacrifice isn't worth the time saved?
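For reference, the schedule in the table can be reproduced with a short simulation. This is a sketch based only on the numbers above, not the actual dictobject.c code: it assumes a dict resizes once it is more than two-thirds full, growing to the smallest power of two larger than used*2 (current) or used*4 (proposed).

```python
PyDict_MINSIZE = 8

def next_power_of_two(minused):
    # Smallest power of two strictly greater than minused,
    # starting from the minimum table size.
    size = PyDict_MINSIZE
    while size <= minused:
        size <<= 1
    return size

def table_size(entries, growth):
    """Simulate inserting `entries` keys one at a time and return
    the final hash-table size under the given growth factor."""
    size = PyDict_MINSIZE
    for used in range(1, entries + 1):
        if used * 3 >= size * 2:          # assumed trigger: over 2/3 full
            size = next_power_of_two(used * growth)
    return size

for n in (5, 10, 21, 42, 85, 170):
    print(n, table_size(n, 2), table_size(n, 4))
```

Running this reproduces both columns of the table, e.g. 42 entries land in a 64-slot table under doubling and a 128-slot table under quadrupling.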

Once upon a time, when I was playing with inlining dictionary
tables (now part of the dictionary implementation thanks to Tim),
I found that optimizing dictionaries to a table size of 8
gave the best results. Most dictionaries in a Python application
have very few entries (and most of them were instance dictionaries
at the time -- not sure whether that's changed).

Another result of my experiments was that reducing the number
of resizes made a big difference.

To get some more useful numbers, I suggest instrumenting Python
to display the table size of dictionaries and the number of
resizes needed to make them that big. You should also keep
a close eye on the overall process size.
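Lacking that C-level instrumentation, the effect on resize counts can at least be estimated in pure Python. The following stand-in sketch (the 2/3 trigger and growth factors are assumptions drawn from the table above) counts how often the hash table must be rebuilt as a dict grows:

```python
def count_resizes(entries, growth, minsize=8):
    """Simulate growing a dict one insertion at a time; return the
    final table size and the number of resizes along the way."""
    size, resizes = minsize, 0
    for used in range(1, entries + 1):
        if used * 3 >= size * 2:          # assumed trigger: over 2/3 full
            newsize = minsize
            while newsize <= used * growth:
                newsize <<= 1
            size, resizes = newsize, resizes + 1
    return size, resizes

# Growth factor 2 (current) vs 4 (proposed), 170 entries:
print(count_resizes(170, 2))
print(count_resizes(170, 4))
```

For 170 entries the quadrupling schedule goes through roughly half as many resizes, consistent with the "halving the number of resize operations" claim in Raymond's quoted message.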

I believe the reason for the speedups you see is that cache
sizes and processor optimizations have changed since the time
the current resizing implementation was chosen, so maybe we
ought to rethink the parameters:

* minimum table size
* first three resize steps

I don't think that large dictionaries should become more
sparse -- that's just a waste of memory.

>>The idea is to lower the average sparseness of dictionaries (by
>>0% to 50% of their current sparseness).  This results in fewer 
>>collisions, faster collision resolution, fewer memory accesses,
>>and better cache performance.  A small side-benefit is halving
>>the number of resize operations as the dictionary grows.
> 
> I think you mean "raise the average sparseness" don't you?
> (The more sparse something is, the more gaps it has.)
> 
> I tried the patch with my new favorite benchmark, startup time for
> Zope (which surely populates a lot of dicts :-).  It did give about
> 0.13 seconds speedup on a total around 3.5 seconds, or almost 4%
> speedup.
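As a quick sanity check on the sparseness point, the load factors implied by the table can be computed directly (entry counts and table sizes taken from the table above):

```python
# Load factor (fullness) under each schedule: doubling the table
# size for the same entry count halves the load, i.e. doubles
# the sparseness -- which is why "raise" is the right word.
for entries, cur, prop in [(10, 16, 32), (42, 64, 128), (170, 256, 512)]:
    print(f"{entries:3d} entries: {entries / cur:6.1%} -> {entries / prop:6.1%}")
```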

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Software directly from the Source  (#1, Apr 29 2003)
 >>> Python/Zope Products & Consulting ...         http://www.egenix.com/
 >>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
EuroPython 2003, Charleroi, Belgium:                        56 days left