[Python-Dev] Python 3.6 dict becomes compact and gets a private version; and keywords become ordered

Raymond Hettinger raymond.hettinger at gmail.com
Thu Sep 15 05:43:46 EDT 2016


> On Sep 14, 2016, at 11:31 PM, Serhiy Storchaka <storchaka at gmail.com> wrote:
> 
> Note that this is made at the expense of the 20% slowing down an iteration.
> 
> $ ./python -m timeit -s "d = dict.fromkeys(range(10**6))" -- "list(d)"
> Python 3.5: 66.1 msec per loop
> Python 3.6: 82.5 msec per loop

A range of consecutive integers which have consecutive hash values is a really weak and non-representative basis for comparison.

Something like this will reveal the true and massive improvement in iteration speed:

     $ ./python.exe -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"

There are two reasons for the significant improvement in iteration speed:

1) The dense key table is smaller (no intervening NULL entries) so we do fewer total memory fetches to loop over the keys, values, or items.

2) The loop over the dense table no longer makes frequent, unpredictable tests for NULL entries.  (To better understand why this matters and how major the impact is, see http://stackoverflow.com/questions/11227809 ).

Your mileage will vary depending on the size of dictionary and whether the old dictionary would have densely packed the keys (as in Serhiy's non-representative example).


Raymond


P.S.  Algorithmically, the compact dict seems to be mostly where it needs to be (modulo some implementation bugs that are being ironed-out).  However, the code hasn't been tuned and polished as much as the old implementation, so there is still room for its timings to improve.  Dict copies should end-up being faster (fewer bytes copied and a predictable test for NULLs).  Resizes should be much faster (only the small index table needs to be updated, while the keys/values/hashes don't get moved).  In complex apps, the memory savings ought translate into better cache performance (that doesn't show-up much in tight benchmark loops but tends to make a different in real code).


More information about the Python-Dev mailing list