[Python-Dev] Guarantee ordered dict literals in v3.7?

Raymond Hettinger raymond.hettinger at gmail.com
Sun Nov 5 18:43:32 EST 2017


> On Nov 4, 2017, at 7:04 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> 
> When I asked Damien George about this for MicroPython, he indicated
> that they'd have to choose between guaranteed order and O(1) lookups
> given their current dict implementation. That surprised me a bit
> (since PyPy and CPython both *saved* memory by switching to their
> guaranteed order implementations, hence the name "compact dict
> representation"), but my (admittedly vague) understand is that the
> presence of a space/speed trade-off in their case has something to do
> with MicroPython deliberately running with a much higher chance of
> hash collisions in general (since the data sets it deals with are
> naturally smaller).
> 
> So if we make the change, MicroPython will likely go along with it,
> but it may mean that dict lookups there become O(N), and folks will be
> relying on "N" being consistently small due to memory constraints (but
> some typically O(N) algorithms will still become O(N^2) when run on
> MicroPython).
> 
> I don't think that situation should change the decision, but I do
> think it would be helpful if folks that understand CPython's dict
> implementation could take a look at MicroPython's dict implementation
> and see if it might be possible for them to avoid having to make that
> trade-off and instead be able to use a naturally insertion ordered
> hashmap implementation.

I've just looked at the MicroPython dictionary implementation and think they won't have a problem implementing O(1) compact dicts with ordering.

The likely reason for the confusion is that they are already have an option for an "ordered array" dict variant that does a brute-force linear search.  However, their normal hashed lookup is very similar to ours and is easily amenable to being compact and ordered.

See:  https://github.com/micropython/micropython/blob/77a48e8cd493c0b0e0ca2d2ad58a110a23c6a232/py/map.c#L139

Pretty much any implementation hashed lookup of keys and values is amenable to being compact and ordered.  Whatever existing logic that looks up an entry becomes a lookup into a table of indices which in turn references a sequential array of keys and values.  This logic is independent of hashing scheme or density, and it has no effect on the number of probes or collision rate.

The cost is an extra level of indirection and an extra array of indices (typically very small). The benefit is faster iteration over the smaller dense key/value array, overall memory savings resulting in improved cache utilization, and the side-effect of remembering insertion order.

Summary:  I think MicroPython will be just fine and if needed I will help create the patch that implements compact-and-ordered behavior.


Raymond






More information about the Python-Dev mailing list