[Python-ideas] Deprecating the old-style sequence protocol

Nick Coghlan ncoghlan at gmail.com
Sun Dec 27 01:22:37 EST 2015


On 27 December 2015 at 13:07, Andrew Barnert via Python-ideas
<python-ideas at python.org> wrote:
> Anyway, the main argument for eliminating the old-style sequence protocol is that, unlike most other protocols in Python, it can't actually be checked for (without iterating the values). Despite a bunch of explicit workaround code (which registers builtin sequence types with `Iterable`, checks for C-API mappings in `reversed`, etc.), you still get false negatives when type-checking types like Steven's at runtime or type-checking time, and you still get false positives from `iter` and `reversed` themselves (`reversed(MyCustomMapping({1:2, 3:4}))` or `iter(typing.Iterable)` won't give you a `TypeError`, they'll give you a useless iterator--which may throw some other exception later when trying to iterate it, but even that isn't reliable).
>
> I believe we could solve all of these problems by making `iter` and `reversed` raise a `TypeError`, without falling back to the old-style protocol, if the dunder method is `None` (like `hash`), change the ABC and static typer to use the same rules as `iter` and `reversed`, and add `__reversed__ = None` to `collections.abc.Mapping`. (See
> http://bugs.python.org/issue25864 and http://bugs.python.org/issue25958 for details.)
>
> Alternatively, if there were some way for a Python class to declare whether it's trying to be a mapping or a sequence or neither, as C API types do, I suppose that could be a solution. Or maybe the problems don't actually need to be solved.
>
> But obviously, deprecating the old-style sequence protocol would make the problems go away.

[snip]

> Finally, as far as I can tell, the documentation of the old-style sequence protocol is in the library docs for `iter` and `reversed`, and the data model docs for `__reversed__` (but not `__iter__`), which say, respectively:
>
>> ... object must be a collection object which supports the iteration protocol (the __iter__() method), or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0).
>
>> ... seq must be an object which has a __reversed__() method or supports the sequence protocol (the __len__() method and the __getitem__() method with integer arguments starting at 0).
>
>> If the __reversed__() method is not provided, the reversed() built-in will fall back to using the sequence protocol (__len__() and __getitem__()). Objects that support the sequence protocol should only provide __reversed__() if they can provide an implementation that is more efficient than the one provided by reversed().
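
For reference, the fallback those docs describe can be observed directly.
The sketch below uses a made-up class (OldStyleSeq is not from the
thread) that defines only __len__ and __getitem__. It shows iter() and
reversed() both accepting it, while the Iterable ABC reports a false
negative because it only looks for __iter__:

```python
from collections.abc import Iterable


class OldStyleSeq:
    """Hypothetical class implementing only the old-style sequence protocol."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        # The IndexError raised by the underlying list ends fallback iteration.
        return self._items[index]


seq = OldStyleSeq("abc")

# iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError.
forward = list(iter(seq))

# reversed() falls back to __len__ plus __getitem__.
backward = list(reversed(seq))

# The ABC check only looks for __iter__, so it does not see this type.
is_iterable = isinstance(seq, Iterable)
```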

There's an additional option we can consider, which is to move the
backwards compatibility fallback to type creation time, rather than
method lookup time. The two rules would be:

* if a type defines __getitem__ without also defining __iter__, add a
default __iter__ implementation that assumes the type is a sequence
* if a type defines __getitem__ and __len__ without also defining
__reversed__, add a default __reversed__ implementation that assumes
the type is a sequence

(At the C level, even sequences need to use the mapping slots to
support extended slicing, so we can't make the distinction based on
which C-level slots are defined.)

As with using "__hash__ = None" to block the default inheritance of
object.__hash__, setting "__iter__ = None" or "__reversed__ = None" in
a class definition would block the addition of the implied methods.
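
As a rough illustration of the two rules and the "= None" opt-out, here
is a pure-Python sketch. It is not a proposed implementation: the class
decorator add_sequence_fallbacks and the example classes are invented
names, and the decorator only stands in for logic that would really live
in type creation itself.

```python
def _seq_iter(self):
    """Implied __iter__: index from 0 upwards until IndexError."""
    index = 0
    while True:
        try:
            item = self[index]
        except IndexError:
            return
        yield item
        index += 1


def _seq_reversed(self):
    """Implied __reversed__: index from len(self) - 1 down to 0."""
    for index in range(len(self) - 1, -1, -1):
        yield self[index]


def add_sequence_fallbacks(cls):
    """Hypothetical stand-in for the type-creation-time rules."""
    # hasattr() is true for an attribute explicitly set to None, so
    # "__iter__ = None" counts as defined and blocks the implied method.
    if hasattr(cls, "__getitem__"):
        if not hasattr(cls, "__iter__"):
            cls.__iter__ = _seq_iter
        if hasattr(cls, "__len__") and not hasattr(cls, "__reversed__"):
            cls.__reversed__ = _seq_reversed
    return cls


@add_sequence_fallbacks
class Countdown:
    """Sequence-like: gets both implied methods."""

    def __init__(self, start):
        self._start = start

    def __len__(self):
        return self._start

    def __getitem__(self, index):
        if not 0 <= index < self._start:
            raise IndexError(index)
        return self._start - index


@add_sequence_fallbacks
class Lookup:
    """Mapping-like: opts out of both implied methods."""

    __iter__ = None
    __reversed__ = None

    def __init__(self, data):
        self._data = dict(data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, key):
        return self._data[key]
```

With this in place, list(Countdown(3)) and list(reversed(Countdown(3)))
work through real methods that an ABC check can see, while
iter(Lookup({1: 2})) raises TypeError up front instead of handing back
an iterator that fails later.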

However, while I think those changes would clean up some quirky edge
cases without causing any harm, even doing all of that still wouldn't
get us to the point of having a truly *structural* definition of the
difference between a Mapping and a Sequence. For example, OrderedDict
defines all of __len__, __getitem__, __iter__ and __reversed__
*without* being a sequence in the "items are looked up by their
position in the sequence" sense.
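
The OrderedDict point is easy to check interactively. A small sketch
(the variable names are just for illustration):

```python
from collections import OrderedDict
from collections.abc import Mapping, Sequence

# All four dunder methods named above are present on OrderedDict.
dunders = ("__len__", "__getitem__", "__iter__", "__reversed__")
defines_all = all(hasattr(OrderedDict, name) for name in dunders)

od = OrderedDict([("a", 1), ("b", 2)])

# The ABCs still classify it as a Mapping and not as a Sequence.
is_mapping = isinstance(od, Mapping)
is_sequence = isinstance(od, Sequence)

# reversed() yields keys in reverse insertion order, not positions.
keys_backward = list(reversed(od))

# Items are looked up by key, so position 0 is just a missing key here.
try:
    od[0]
    positional_lookup_works = True
except KeyError:
    positional_lookup_works = False
```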

These days, without considering the presence or absence of any
non-dunder methods, the core distinction between sequences,
multi-dimensional arrays and arbitrary mappings really lies in the
type signature of the key parameter to __getitem__ et al (assuming a
suitably defined Index type hint):

    MappingKey = Any
    DictKey = collections.abc.Hashable
    SequenceKey = Union[Index, slice]
    ArrayKey = Union[SequenceKey, Tuple["ArrayKey", ...]]
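
To make those aliases importable for experimentation, one option is to
use typing.SupportsIndex as the Index hint. That choice is an
assumption made here (SupportsIndex did not exist when this was
written), and the two helper functions are hypothetical runtime
analogues of the aliases, not part of any proposal:

```python
import collections.abc
from typing import Any, SupportsIndex, Tuple, Union

# Assumption: SupportsIndex stands in for the "suitably defined" Index hint.
Index = SupportsIndex

MappingKey = Any
DictKey = collections.abc.Hashable
SequenceKey = Union[Index, slice]
ArrayKey = Union[SequenceKey, Tuple["ArrayKey", ...]]


def is_sequence_key(key):
    """Runtime analogue of SequenceKey: an index-like object or a slice."""
    return isinstance(key, (SupportsIndex, slice))


def is_array_key(key):
    """Runtime analogue of ArrayKey: a sequence key or a tuple of array keys."""
    if isinstance(key, tuple):
        return all(is_array_key(part) for part in key)
    return is_sequence_key(key)
```

Under these definitions 3 and slice(1, None) are sequence keys,
(0, slice(None)) is an array key but not a sequence key, and "name" is
only acceptable as a mapping key.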

Regards,
Nick.

[1] https://github.com/ambv/typehinting/issues/171

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
