[Python-ideas] Introduce collections.Reiterable

Sun Sep 22 06:56:52 CEST 2013

On 20 Sep 2013 19:49, "Steven D'Aprano" <steve at pearwood.info> wrote:
>
> On Thu, Sep 19, 2013 at 11:02:57PM +1000, Nick Coghlan wrote:
> > On 19 September 2013 22:18, Steven D'Aprano <steve at pearwood.info> wrote:
> [...]
> > > At the moment, dict views aren't directly iterable (you can't call
> > > next() on them). But in principle they could have been designed as
> > > re-iterable iterators.
> >
> > That's not what iterable means. The iterable/iterator distinction is
> > well defined and reflected in the collections ABCs:
>
> Actually, I think the collections ABC gets it wrong, according to both
> common practice and the definition given in the glossary:
>
> http://docs.python.org/3.4/glossary.html
>
> More on this below.
>
> As for my comment above, dict views don't obey the iterator protocol
> themselves, as they have no __next__ method, nor do they obey the
> sequence protocol, as they are not indexable. Hence they are not
> *directly* iterable, but they are *indirectly* iterable, since they have
> an __iter__ method which returns an iterator.

Um, no. Everywhere Python iterates over anything, we call iter(obj)
first. If there is anywhere we don't do that, it's a bug.
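
As a rough sketch of what that means: a for loop behaves approximately
like the helper below. The name manual_for is made up for illustration,
and the real loop lives in the interpreter rather than in Python code,
so treat this as an approximation only.

```python
def manual_for(obj):
    """Collect items roughly the way a for loop would visit them."""
    results = []
    _itr = iter(obj)  # always the first step, whatever obj is
    while True:
        try:
            item = next(_itr)  # next() is only ever called on the iterator
        except StopIteration:
            break
        results.append(item)
    return results


d = {"a": 1, "b": 2}
print(manual_for(d.keys()))  # the view itself never needs __next__
print(manual_for([10, 20]))
```

The view is never asked for __next__; only the object returned by
iter() is.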

> I don't think this is a critical distinction. I think it is fine to call
> views "iterable", since they can be iterated over. On the rare occasion
> that it matters, we can just do what I did above, and talk about objects
> which are directly iterable (e.g. iterators, sequences, generator
> objects) and those which are indirectly iterable (e.g. dict views).

Or you could just use the existing terminology and talk about
iterables vs iterators instead of inventing your own terms.
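
For what it's worth, the existing terms map directly onto the ABCs. A
small illustration using a dict view (the variable names are just for
this example):

```python
from collections.abc import Iterable, Iterator

view = {"a": 1}.keys()
it1 = iter(view)
it2 = iter(view)

# The view is an iterable, but not an iterator.
print(isinstance(view, Iterable), isinstance(view, Iterator))

# The object returned by iter(view) is both.
print(isinstance(it1, Iterable), isinstance(it1, Iterator))

# An iterator returns itself from __iter__ ...
print(iter(it1) is it1)

# ... while an iterable hands out a separate iterator on each call.
print(it1 is not it2)
```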

> > * iterables are objects that return iterators from __iter__.
>
> That definition is incomplete, because iterable objects include those
> that obey the sequence protocol. This is not only by long-standing
> tradition (pre-dating the introduction of iterators, if I remember
> correctly), but also as per the definition in the glossary. Alas,
> collections.Iterable gets this wrong:
>
> py> class Seq:
> ...     def __getitem__(self, index):
> ...         if 0 <= index < 5: return index+1000
> ...         raise IndexError
> ...
> py> s = Seq()
> py> isinstance(s, Iterable)
> False
> py> list(s)  # definitely iterable
> [1000, 1001, 1002, 1003, 1004]
>
>
> (Note that although Seq obeys the sequence protocol, and can be
> iterated over, it is not a fully-fledged Sequence since it has no
> __len__.)
>
> I think this is a bug in the Iterable ABC, but I'm not sure how one
> might fix it.

The ducktyping check could technically be expanded to use the same
fallback iter() does (i.e. checking for __getitem__).

However, that would reintroduce the Sequence/Mapping ambiguity that
ABCs were expressly designed to eliminate, so we don't want to do
that:

>>> class BadFallback:
...     def __len__(self):
...         return 1
...     def __getitem__(self, key):
...         if key != "the_one": raise KeyError(key)
...         return "the_value"
...
>>> c = BadFallback()
>>> c["the_one"]
'the_value'
>>> iter(c)
<iterator object at 0x7f9cf08a7f90>
>>> next(iter(c))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in __getitem__
KeyError: 0

In cases like this, the default behaviour is actually correct. Since
the fallback iterator only supports sequences rather than arbitrary
mappings, merely implementing __len__ and __getitem__ isn't considered
a reliable enough indication that an object is actually iterable.
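
To make the failure mode concrete, here is an approximate pure-Python
sketch of what the fallback iterator does. The function name
sequence_fallback is hypothetical; the real thing is implemented in C,
so this only mirrors the observable behaviour shown above.

```python
def sequence_fallback(obj):
    """Approximate iter()'s fallback: index from 0 until IndexError."""
    index = 0
    while True:
        try:
            yield obj[index]
        except IndexError:
            return  # IndexError is the signal that the sequence is exhausted
        index += 1


class Seq:
    def __getitem__(self, index):
        if 0 <= index < 5:
            return index + 1000
        raise IndexError


class BadFallback:
    def __len__(self):
        return 1

    def __getitem__(self, key):
        if key != "the_one":
            raise KeyError(key)
        return "the_value"


print(list(sequence_fallback(Seq())))

try:
    next(sequence_fallback(BadFallback()))
except KeyError as exc:
    # A mapping-style __getitem__ is handed the integer 0 and rejects it.
    print("KeyError:", exc)
```

Integer indexing from zero is all the fallback knows how to do, which
is why it only makes sense for sequences.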

Fortunately, we also designed the ABC system to make it trivial for
people to notify Python that their container is an iterable sequence
when the automatic ducktyping fails: they can just call register on
Iterable or one of its subclasses, and the interpreter will believe
them.

>>> from collections.abc import Iterable, Mapping
>>> isinstance(c, Iterable)
False
>>> isinstance(c, Mapping)
False
>>> Mapping.register(BadFallback)
<class '__main__.BadFallback'>
>>> isinstance(c, Iterable)
True
>>> isinstance(c, Mapping)
True

In this case, it's a bad registration, since the object in question
*doesn't* implement those interfaces properly, but it's easy to define
a type where it's more accurate:

>>> from collections.abc import Sequence
>>> @Sequence.register
... class GoodFallback:
...     def __len__(self):
...         return 1
...     def __getitem__(self, idx):
...         if idx != 0: raise IndexError(idx)
...         return "the_entry"
...
>>> c2 = GoodFallback()
>>> list(c2)
['the_entry']
>>> isinstance(c2, Iterable)
True

Even "GoodFallback" doesn't implement the full Sequence API, but it's
likely to provide enough of it for many use cases. This is why type
checks on ABCs are vastly different to those on concrete classes -
ABCs still leave full control in the hands of the application
integrator (through explicit registrations), whereas strict interface
checks in a language like Java demand *full* interface compliance to
pass the check, even if you really only need a fraction of it.
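
A quick way to see that registration is a claim rather than a verified
contract: register() neither checks for nor supplies the Sequence mixin
methods. This sketch repeats the GoodFallback class so it runs on its
own.

```python
from collections.abc import Iterable, Sequence


@Sequence.register
class GoodFallback:
    def __len__(self):
        return 1

    def __getitem__(self, idx):
        if idx != 0:
            raise IndexError(idx)
        return "the_entry"


c2 = GoodFallback()

# The isinstance checks pass purely because of the registration.
print(isinstance(c2, Sequence), isinstance(c2, Iterable))

# register() does not add mixin methods such as index() or count().
print(hasattr(c2, "index"), hasattr(c2, "count"))

# Containment still works, because "in" falls back to iteration.
print("the_entry" in c2)
```

Code that only iterates or does containment tests is happy with this
type; code that calls c2.index(...) would get an AttributeError.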

> > That "iterators return self from __iter__" is important, since almost
> > everywhere Python iterates over something, it calls "_itr = iter(obj)"
> > first.
>
> And then falls back on the sequence protocol.

And that final fallback *won't work properly* if the object in
question isn't actually a sequence.

Cheers,
Nick.