[Python-Dev] Compact ordered dict is not ordered for split table. (was: PEP XXX: Compact ordered dict

Thu Jun 23 21:00:18 EDT 2016

On Tue, Jun 21, 2016 at 8:40 PM, INADA Naoki <songofacandy at gmail.com> wrote:
> There are three options I can think.
>
>
> 1) Revert key-shared dict (PEP412).
>
> pros: Removing key-shared dict makes dict implementation simple.
>
> cons: In some applications, PEP 412 is far more compact than compact
> ordered dict.  (Note: Using __slots__ may help such situation).
>
>
> 2) Don't make "keeping insertion order" is Python Language Spec.
>
> pros: Best efficiency
>
> cons: Different behavior between normal dict and instance.__dict__ may
> confuse people.
>
>
> 3) More strict rule for key sharing dict.
>
> My idea is:
> * Increasing number of entries (inserting new key) can be possible
> only if refcnt of keys == 1.
>
> * Inserting new item (with existing key) into dict is allowed only when
>   insertion position == number of items in the dict (PyDictObject.ma_used).
>
> pros: We can have "dict keeping insertion order".
>
> cons: Can't use key-sharing dict for many cases.  Small and harmless
> change may cause
> sudden memory usage increase.  (__slots__ is more predicable).

IIUC, key-sharing dicts are a best-effort optimization where if I have
a class like:

class Foo:
    def __init__(self, a, b):
        self.a = a
        self.b = b

f1 = Foo(1, 2)
f2 = Foo(3, 4)

then f1.__dict__ and f2.__dict__ can share their key arrays... but if
I do f1.c = "c", then f1.__dict__ gets automatically switched to a
regular dict. The idea being that in, say, 99% of cases, different
objects of the same type all share the same set of keys, and in the
other 1%, oh well, we fall back on the regular behavior.

It seems to me that all this works fine for ordered dicts too, if we
add the restriction that key arrays can be shared if and only if the
two dicts have the same set of keys *and* initially assign those keys
in the same order. In, say, 98.9% of cases, different objects of the
same type all share the same set of keys and initially assign those
keys in the same order, and in the other 1.1%, oh well, we can
silently fall back on unshared keys, same as before. (And crucially,
the OrderedDict semantics are that only adding *new* keys changes the
order; assignments to existing keys preserve the existing order. So if
a given type always creates the same instance attributes in the same
order at startup and never adds or deletes any, then its key values
*and* key order will stay the same even if it later mutates some of
those existing attributes in-place.)

It's possible that there will be some weird types that mess this up, like:

class WeirdFoo:
    def __init__(self, a, b):
        if a % 2 == 0:
            self.a = a
            self.b = b
        else:
            self.b = b
            self.a = a

assert list(WeirdFoo(1, 2).__dict__.keys()) != list(WeirdFoo(2,
3).__dict__.keys())

but, who cares? It'd be good due-diligence to collect data on this to
confirm that it isn't a big issue, but intuitively, code like
WeirdFoo.__init__ is vanishingly rare, and this is just a best-effort
optimization anyway. Catching 98.9% of cases is good enough.

Is there something I'm missing here? Is this your option #3?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org