[Python-Dev] More compact dictionaries with faster iteration

Wed Dec 12 23:06:18 CET 2012

On 12/12/2012 10:31 PM, PJ Eby wrote:
> On Wed, Dec 12, 2012 at 3:37 PM, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no> wrote:
>> On 12/12/2012 01:15 AM, Nick Coghlan wrote:
>>>
>>> On Wed, Dec 12, 2012 at 5:37 AM, Dino Viehland <dinov at microsoft.com
>>> <mailto:dinov at microsoft.com>> wrote:
>>>
>>>      OTOH changing certain dictionaries in IronPython (such as keyword
>>>      args) to be
>>>      ordered would certainly be possible.  Personally I just wouldn't
>>>      want to see it
>>>      be the default as that seems like unnecessary overhead when the
>>>      specialized
>>>      class exists.
>>>
>>>
>>> Which reminds me, I was going to note that one of the main gains with
>>> ordered keyword arguments, is their use in the construction of
>>> string-keyed objects where you want to be able to control the order of
>>> iteration (e.g. for serialisation or display purposes). Currently you
>>> have to go the path of something like namedtuple where you define the
>>> order of iteration in one operation, and set the values in another.
>>
>>
>> So here's a brand new argument against ordered dicts: The existence of
>> perfect hashing schemes. They fundamentally conflict with ordered dicts.
>
> If I understand your explanation, then they don't conflict with the
> type of ordering described in this thread.  Raymond's optimization
> separates the "hash table" part from the "contents" part of a
> dictionary, and there is no requirement that these two parts be in the
> same size or the same order.

I don't fully agree.

Perfect hashing already separates "hash table" from "contents" (sort 
of), and saves the memory in much the same way.

Yes, you can repeat the trick and have 2 levels of indirection, but that 
then requires an additional table of small ints which is pure overhead 
present for the sorting; in short, it's no longer an optimization but 
just overhead for the sortability.

Dag Sverre

>
> Indeed, Raymond's split design lets you re-parameterize the hashing
> all you want, without perturbing the iteration order at all.  You
> would in fact be able to take a dictionary at any moment, and say,
> "optimize the 'hash table' part to a non-colliding state based on the
> current contents", without touching the 'contents' part at all.
>
> (One could do this at class creation time on a class dictionary, and
> just after importing on a module dictionary, for example.)
>