[Python-Dev] Python 3.6 dict becomes compact and gets a private version; and keywords become ordered

Raymond Hettinger raymond.hettinger at gmail.com
Thu Sep 15 01:42:02 EDT 2016


> On Sep 14, 2016, at 3:50 PM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> 
>> 
>> Then, I'll do same to sets.
> 
> Unless I've misunderstood, Raymond was opposed to making a similar
> change to set.

That's right.  Here are a few thoughts on the subject before people starting running wild.

* For the compact dict, the space savings was a net win with the additional space consumed by the indices and the overallocation for the key/value/hash arrays being more than offset by the improved density of key/value/hash arrays.   However for sets, the net was much less favorable because we still need the indices and overallocation but can only offset the space cost by densifying only two of the three arrays.  In other words, compacting makes more sense when you have wasted space for keys, values, and hashes.  If you lose one of those three, it stops being compelling.

* The use pattern for sets is different from dicts.  The former has more hit or miss lookups.  The latter tends to have fewer missing key lookups. Also, some of the optimizations for the set-to-set operations make it difficult to retain set ordering without impacting performance.

* I pursued alternative path to improve set performance.  Instead of compacting (which wasn't much of space win and incurred the cost of an additional indirection), I added linear probing to reduce the cost of collisions and improve cache performance.  This improvement is incompatible with the compacting approach I advocated for dictionaries.

* For now, the ordering side-effect on dictionaries is non-guaranteed, so it is premature to start insisting the sets become ordered as well.  The docs already link to a recipe for creating an OrderedSet ( https://code.activestate.com/recipes/576694/ ) but it seems like the uptake has been nearly zero.  Also, now that Eric Snow has given us a fast OrderedDict, it is easier than ever to 
 build an OrderedSet from MutableSet and OrderedDict, but again I haven't observed any real interest because typical set-to-set data analytics don't really need or care about ordering.  Likewise, the primary use of fast membership testings is order agnostic.

* That said, I do think there is room to add alternative set implementations to PyPI.  In particular, there are some interesting special cases for orderable data where set-to-set operations can be sped-up by comparing entire ranges of keys (see https://code.activestate.com/recipes/230113-implementation-of-sets-using-sorted-lists for a starting point).  IIRC, PyPI already has code for set-like bloom filters and cuckoo hashing.

* I understanding that it is exciting to have a major block of code accepted into the Python core but that shouldn't open to floodgates to engaging in more major rewrites of other datatypes unless we're sure that it is warranted.


Raymond Hettinger





More information about the Python-Dev mailing list