[Python-Dev] PyBench DictCreation (was Re: Performance compares)

Tim Peters tim@digicool.com
Thu, 17 May 2001 18:55:19 -0400


The worst percentage hit in both MAL's and Jeremy's pybench run was (here
showing Jeremy's numbers, cuz I doubt anyone could reproduce MAL's <wink>):

        DictCreation:      87.80 ms    2.93 us  +115.72%

Assorted things do not account for it:  the new overhead of linking and
unlinking dicts into the gc list (at creation and destruction times) seems
to account for no more than 2%; and the overhead due to using the slower
lookdict (as opposed to lookdict_string) even less.

Jeremy cheated by running a profiler:  the true cause is that dictresize
gets called about twice as often.

Before 2.1:  *before* inserting an item, we checked to see whether the dict
was at the resize point.  If so, we resized it.  Note that this meant
PyDict_SetItem could grow a dict even if no new entry was made (and that
this was the cause of several excruciating bugs in the 2.1 release cycle,
since it meant a dict could get reshuffled merely when replacing the values
associated with existing keys).

2.1:  *after* inserting an item, and if the key was new (i.e., the dict grew
a new entry, as opposed to just replacing the value associated with an
existing key), and the dict is at the resize point, we resize it.

Now the DictCreation test overwhelmingly creates dicts of size exactly 3.
The dict resizes from empty to capacity 4 on the way to gaining 2 entries.
When adding the third:

Before 2.1:  2 < (2/3)*4 == 2 2/3, so the dict is not resized and ends up
remaining a capacity-4 dict with 3 slots full.  This actually violates a
documented dict invariant (i.e., that dicts are never more than 2/3rd full).

2.1:  The third item added is a new item, and 3 > (2/3)*4 == 2 2/3, so we
*do* resize it, and the dict ends up with 3 of 8 slots full.

I've got no interest in trying to restore the old behavior.  A compromise
may be to boost the minimum size of a non-empty dict from 4 to 8.  As is,
the only non-empty dicts that can get away with using the current minimum
size of 4 have no more than 2 elements.  The question is whether such tiny
non-empty dicts are common enough to make everyone else pay for "an extra"
resize.

go-ahead-just-*try*-to-prove-your-answer<wink>-ly y'rs  - tim