[Python-Dev] Guarantee ordered dict literals in v3.7?

Nick Coghlan ncoghlan at gmail.com
Mon Nov 6 07:09:18 EST 2017


On 6 November 2017 at 21:18, Steve Holden <steve at holdenweb.com> wrote:
> I have to agree: I find the elevation of a CPython implementation detail to
> a language feature somewhat hard to comprehend. Maybe it's more to do with
> the way it's been presented, but this is hardly an enhancement the language
> has been screaming for for years.
>
> Presumably there is little concern that algorithms that rely on this
> behaviour will be perfectly syntactically conformant with earlier versions
> but will fail subtly and without explanation? It's a small concern, but a
> real one - particularly for learners.

A similar concern existed when we elevated sort stability to being a
language requirement - if you relied on that guarantee, your code was
technically buggy on versions prior to 2.3, but eventually 2.2 and
earlier aged out of general use, allowing such code to become correct
in general.
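
As a minimal sketch of what that stability guarantee buys you (the
sample records and names below are made up purely for illustration):
items that compare equal on the sort key keep their original relative
order.

```python
# Hypothetical sample data: (name, rank) pairs in arrival order.
records = [("bob", 2), ("alice", 1), ("carol", 2), ("dave", 1)]

# Sort by rank only. A stable sort keeps "alice" ahead of "dave" and
# "bob" ahead of "carol", because that is how they were ordered on input.
by_rank = sorted(records, key=lambda record: record[1])

assert by_rank == [("alice", 1), ("dave", 1), ("bob", 2), ("carol", 2)]
```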

So the current discussion is mainly about deciding where we want the
compatibility burden to fall in relation to dict insertion ordering:

1. Do we deliberately revert CPython back to being harder to use
correctly for the sake of making Python easier to implement?
2. Do we make Python harder to implement for the sake of making it
easier to use?
3. Do we choose not to choose, thus implicitly choosing "2" by default
due to the fact that Python is defined by a language spec and a
reference implementation, rather than *just* a language spec?

Here's a more-complicated-than-a-doctest-for-a-dict-repr, but still
fairly straightforward, example regarding the "insertion ordering
dictionaries are easier to use correctly" argument:

    import json
    data = {"a":1, "b":2, "c":3}
    rendered = json.dumps(data)
    data2 = json.loads(rendered)
    rendered2 = json.dumps(data2)
    # JSON round trip
    assert data == data2, "JSON round trip failed"
    # Dict round trip
    assert rendered == rendered2, "dict round trip failed"

Both of those assertions will always pass in CPython 3.6, as well as
in PyPy, because their dict implementations are insertion ordered,
which means the iteration order on the dictionaries is always "a",
"b", "c".

If you try it on 3.5 though, you should fairly consistently see that
last assertion fail, since there's nothing in 3.5 that ensures that
data and data2 will iterate over their keys in the same order.
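
A rough way to see why only the second assertion is at risk, assuming
you run it on an insertion-ordered dict (CPython 3.6+), where reversing
the insertion order stands in for 3.5's arbitrary ordering: dict
equality ignores key order, but the serialised text does not.

```python
import json

# Same items, different insertion order (a stand-in for the arbitrary
# iteration order you could get on 3.5 and earlier).
first = {"a": 1, "b": 2, "c": 3}
second = {"c": 3, "b": 2, "a": 1}

# Equality compares keys and values only, so the "JSON round trip"
# style of check cannot notice a change in ordering.
same_mapping = first == second

# json.dumps emits keys in iteration order, so the rendered strings
# differ whenever the iteration order differs.
same_text = json.dumps(first) == json.dumps(second)

assert same_mapping
assert not same_text
```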

You can make that code implementation independent (and sufficiently
version independent to pass both assertions) by using OrderedDict:

    from collections import OrderedDict
    import json
    data = OrderedDict(a=1, b=2, c=3)
    rendered = json.dumps(data)
    data2 = json.loads(rendered, object_pairs_hook=OrderedDict)
    rendered2 = json.dumps(data2)
    # JSON round trip
    assert data == data2, "JSON round trip failed"
    # Dict round trip
    assert rendered == rendered2, "dict round trip failed"

However, despite the way this code looks, the serialised key order
*might not* be "a, b, c" on 3.5 and earlier (it will be on 3.6+, since
that already requires that kwarg order be preserved).
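
A small sketch of the 3.6+ behaviour being relied on there (the helper
name kwarg_names is hypothetical, used only for illustration): keyword
arguments arrive in the order they were written at the call site, so
OrderedDict sees them in that order too. On 3.5 and earlier neither
assertion is guaranteed to hold.

```python
from collections import OrderedDict

def kwarg_names(**kwargs):
    # On 3.6+, kwargs preserves the order in which the keyword
    # arguments were passed by the caller.
    return list(kwargs)

names = kwarg_names(a=1, b=2, c=3)
keys = list(OrderedDict(a=1, b=2, c=3))

assert names == ["a", "b", "c"]
assert keys == ["a", "b", "c"]
```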

So the formally correct version independent code that reliably ensures
that the key order in the JSON file is always "a, b, c" looks like
this:

    from collections import OrderedDict
    import json
    data = OrderedDict((("a",1), ("b",2), ("c",3)))
    rendered = json.dumps(data)
    data2 = json.loads(rendered, object_pairs_hook=OrderedDict)
    rendered2 = json.dumps(data2)
    # JSON round trip
    assert data == data2, "JSON round trip failed"
    # Dict round trip
    assert rendered == rendered2, "dict round trip failed"
    # Key order
    assert "".join(data) == "".join(data2) == "abc", "key order failed"

Getting from the "Works on CPython 3.6+ but is technically
non-portable" state to a fully portable correct implementation that
ensures a particular key order in the JSON file thus currently
requires the following changes:

- don't use a dict display, use collections.OrderedDict
- make sure to set object_pairs_hook when using json.loads
- don't use kwargs to OrderedDict, use a sequence of 2-tuples

For 3.6, we've already said that we want the last constraint to age
out, such that the middle version of the code also ensures a
particular key order.

The proposal is that in 3.7 we retroactively declare that the first,
most obvious, version of this code should in fact reliably pass all
three assertions.

Failing that, the proposal is that we instead change the dict
iteration implementation such that the dict round trip will start
failing reasonably consistently again (the same as it did in 3.5), so
that folks realise almost immediately that they still need
collections.OrderedDict instead of the builtin dict.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

