[Python-ideas] Introduce collections.Reiterable
Steven D'Aprano
steve at pearwood.info
Fri Sep 20 11:48:58 CEST 2013
On Thu, Sep 19, 2013 at 11:02:57PM +1000, Nick Coghlan wrote:
> On 19 September 2013 22:18, Steven D'Aprano <steve at pearwood.info> wrote:
[...]
> > At the moment, dict views aren't directly iterable (you can't call
> > next() on them). But in principle they could have been designed as
> > re-iterable iterators.
>
> That's not what iterable means. The iterable/iterator distinction is
> well defined and reflected in the collections ABCs:
Actually, I think the collections ABC gets it wrong, according to both
common practice and the definition given in the glossary:
http://docs.python.org/3.4/glossary.html
More on this below.
As for my comment above, dict views don't obey the iterator protocol
themselves, as they have no __next__ method, nor do they obey the
sequence protocol, as they are not indexable. Hence they are not
*directly* iterable, but they are *indirectly* iterable, since they have
an __iter__ method which returns an iterator.
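A minimal sketch of what I mean by "indirectly iterable", assuming Python 3; the dict and the variable names are made up purely for illustration:

```python
d = {"a": 1}
view = d.keys()

# The view has __iter__ but no __next__, so it is not an iterator itself.
has_iter = hasattr(view, "__iter__")
has_next = hasattr(view, "__next__")

# Calling next() directly on the view fails with TypeError.
try:
    next(view)
    next_worked = True
except TypeError:
    next_worked = False

# Indexing fails too, so the sequence protocol is not available either.
try:
    view[0]
    index_worked = True
except TypeError:
    index_worked = False

# Going through iter() first gives a real iterator that next() accepts.
it = iter(view)
first = next(it)
```

Here only the last two lines succeed: the view hands the work off to the iterator returned by its __iter__ method.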
I don't think this is a critical distinction. I think it is fine to call
views "iterable", since they can be iterated over. On the rare occasion
that it matters, we can just do what I did above, and talk about objects
which are directly iterable (e.g. iterators, sequences, generator
objects) and those which are indirectly iterable (e.g. dict views).
> * iterables are objects that return iterators from __iter__.
That definition is incomplete, because iterable objects include those
that obey the sequence protocol. This is not only by long-standing
tradition (pre-dating the introduction of iterators, if I remember
correctly), but also as per the definition in the glossary. Alas,
collections.Iterable gets this wrong:
py> from collections.abc import Iterable
py> class Seq:
...     def __getitem__(self, index):
...         if 0 <= index < 5: return index+1000
...         raise IndexError
...
py> s = Seq()
py> isinstance(s, Iterable)
False
py> list(s)  # definitely iterable
[1000, 1001, 1002, 1003, 1004]
(Note that although Seq obeys the sequence protocol, and can be
iterated over, it is not a fully-fledged Sequence since it has no
__len__.)
I think this is a bug in the Iterable ABC, but I'm not sure how one
might fix it.
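This is not a fix for the ABC itself, but as a rough workaround a caller can ask iter() instead of isinstance(). The sketch below assumes a hypothetical helper called is_iterable (not part of the standard library) and redefines the Seq class so that it stands alone:

```python
from collections.abc import Iterable


class Seq:
    """Supports only the old sequence protocol: __getitem__, no __iter__."""

    def __getitem__(self, index):
        if 0 <= index < 5:
            return index + 1000
        raise IndexError(index)


def is_iterable(obj):
    """Hypothetical helper: report whether iter(obj) succeeds."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


s = Seq()
abc_says = isinstance(s, Iterable)  # the ABC only looks for __iter__
iter_says = is_iterable(s)          # iter() also accepts __getitem__
int_says = is_iterable(42)          # ints have neither method
```

The caveat is that iter() accepts any object with a __getitem__ method, so a mapping-like class without __iter__ would pass this check and then fail once iteration actually starts.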
> * iterators are the subset of iterables that return "self" from
> __iter__, and expose a next (2.x) or __next__ (3.x) method
That is certainly correct. All iterators are iterables, but not all
iterables are iterators.
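A short sketch of the subset relationship, using a list and its iterator as the illustrative objects (the variable names are arbitrary):

```python
from collections.abc import Iterable, Iterator

lst = [1, 2, 3]
it = iter(lst)

# A list is an iterable but not an iterator.
list_is_iterable = isinstance(lst, Iterable)
list_is_iterator = isinstance(lst, Iterator)

# The list's iterator is both.
it_is_iterable = isinstance(it, Iterable)
it_is_iterator = isinstance(it, Iterator)

# An iterator returns itself from __iter__ ...
returns_self = iter(it) is it

# ... whereas the list hands out a new, independent iterator each time.
fresh_each_time = iter(lst) is not iter(lst)
```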
> That "iterators return self from __iter__" is important, since almost
> everywhere Python iterates over something, it calls "_itr = iter(obj)"
> first.
And then falls back on the sequence protocol.
> So, my question is a genuine one. While, *in theory*, an object can
> define a stateful __iter__ method that (e.g.) only works the first
> time it is called, or returns a separate object that still stores its
> "current position" information on the original container, I simply
> can't think of a non-pathological case where "isinstance(obj,
> Iterable) and not isinstance(obj, Iterator)" would give the wrong
> answer.
>
> In theory, yes, an object could obviously pass that test and still not
> be Reiterable, but I'm interested in what's true in *practice*.
I don't think you and I are actually in disagreement here. This is
Python, and one could write an iterator class that is reiterable, or an
iterable object (as determined by isinstance) which cannot be iterated
over, but I think we can dismiss them as pathological cases. Even if
such unusual objects are useful, it is the caller's responsibility, not
the callee's, to use them safely and appropriately with functions that
are expecting them.
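For concreteness, here is a sketch of the test Nick describes, together with one deliberately pathological class that passes it without being reiterable. Both looks_reiterable and OneShot are hypothetical names invented for this example:

```python
from collections.abc import Iterable, Iterator


def looks_reiterable(obj):
    """The heuristic under discussion: an iterable that is not an iterator."""
    return isinstance(obj, Iterable) and not isinstance(obj, Iterator)


class OneShot:
    """Pathological: has __iter__ and no __next__, yet shares one iterator."""

    def __init__(self, data):
        self._it = iter(data)

    def __iter__(self):
        # Every call returns the same underlying iterator, so the
        # "current position" effectively lives on this object.
        return self._it


list_ok = looks_reiterable([1, 2])              # ordinary container
gen_ok = looks_reiterable(x for x in [1, 2])    # generators are iterators
views_ok = looks_reiterable({"a": 1}.keys())    # dict view

once = OneShot([1, 2])
passes_test = looks_reiterable(once)
first_pass = list(once)
second_pass = list(once)   # already exhausted
```

The heuristic gives the expected answer for the list, the generator and the dict view, and is fooled only by the contrived class.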
--
Steven