Iterator / Iteratable confusion

Francis Girard francis.girard at free.fr
Tue Feb 15 16:35:15 EST 2005


On Tuesday 15 February 2005 02:26, Terry Reedy wrote:
> "Francis Girard" <francis.girard at free.fr> wrote in message
> news:200502142131.53265.francis.girard at free.fr...
>
> (Note for oldtimer nitpickers: except where relevant, I intentionally
> ignore the old and now mostly obsolete pseudo-__getitem__-based iteration
> protocol here and in other posts.)
>
> On Sunday 13 February 2005 23:58, Terry Reedy wrote:
> >> Iterators are a subgroup of iterables. Being able to say iter(it)
> >> without
> >> having to worry about whether 'it' is just an iterable or already an
> >> iterator is one of the nice features of the new iteration design.
> >
> >I have difficulty picturing an iterator as a subspecies of an
> >iteratable
>
> You are not the only one.  That is why I say it in plain English.
>
> You are perhaps thinking of 'iterable' as a collection of things.  But in
> Python, an 'iterable' is a broader and more abstract concept: anything with
> an __iter__ method that returns an iterator.
>

Yes, I certainly do define an "iteratable" as something _upon_ which you 
iterate (i.e. a container of elements). The iterator is something whose 
purpose is to iterate _upon_ something else, i.e. the iteratable. For me, it 
makes little sense to iterate _upon_ an iterator. The fact that, in Python, 
both iterators and iteratables must support the __iter__ method is only an 
implementation detail. Concepts must come first.
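
To make the protocol point concrete, here is a minimal sketch. It assumes a 
Python 3 interpreter, where the built-in next() and the __next__ method 
replace the .next method discussed in this thread. A list is an iterable but 
not an iterator, while the object returned by iter() is both, and calling 
iter() on it gives back the very same object.

```python
# Assumption: Python 3 spelling (__next__ / next()), not the 2005 .next method.

# A list is an iterable: it has __iter__ but no __next__.
numbers = [1, 2, 3]
list_is_iterator = hasattr(numbers, "__next__")

# iter() on the list returns a separate iterator object.
it = iter(numbers)

# The iterator also has __iter__, and it simply returns itself.
same_object = iter(it) is it

# Advancing the iterator yields the first element of the list.
first = next(it)
print(list_is_iterator, same_object, first)
```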

> To make iterators a separate, disjoint species then requires that they not
> have an __iter__ method.  Some problems:
> A. This would mean either
>     1) We could not iterate with iterators, such as generators, which are
> *not* derived from iterables, or, less severely

Well, generators are a bit special as they are both (conceptually) iterators 
and iteratables by their very intimate nature -- since the elements are 
_produced_ as needed, i.e. only when you do iterate.
But, as for ordinary iterators, I don't see any good conceptual reason why a 
generator-iterator should support the "__iter__" method. There might be other 
reasons though (for example related to the for ... in ... construct which I 
discuss later in this reply).

>     2) We would, usually, have to 'protect'  iter() calls with either
> hasattr(it, '__iter__') or try: iter(it)...except: pass with probably no
> net average time savings.

Well, I'm not interested in time savings for now. Only want to discuss more 
conceptual issues.

> B. This would prohibit self-reiterable objects, which require .__iter__ to
> (re)set the iteration/cursor variable(s) used by .next().

To sharply distinguish in code what is conceptually different is certainly 
very good and safe design in general. But what I am thinking about would not 
_prohibit_ it. Neither does the C++ STL prohibit it.
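
For reference, the kind of self-reiterable object mentioned in point B can be 
sketched as below. The Countdown class is hypothetical, and the sketch assumes 
Python 3, where the .next method of this thread is spelled __next__. The 
object is its own iterator, and __iter__ resets the cursor so that a second 
pass starts over.

```python
class Countdown:
    """Hypothetical self-reiterable object: it is its own iterator."""

    def __init__(self, start):
        self.start = start
        self.current = start

    def __iter__(self):
        # Reset the cursor each time iteration is (re)started.
        self.current = self.start
        return self

    def __next__(self):  # spelled .next in the Python of 2005
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
first_pass = list(countdown)
second_pass = list(countdown)  # works again because __iter__ reset the cursor
```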

> C. There are compatibility issues not just with classes using the old
> iteration protocol but also with classes with .next methods that do *not*
> raise StopIteration.  The presence of .__iter__ cleanly marks an object as
> one following the new iterable/iterator protocol.  Another language might
> accomplish the same flagging with inheritance from a base object, but that
> is not Python.
>
(That is not C++ templates either. See below.)

Why not "__next__" (or something else) instead of "next" for iterators and, 
yes, __iter__ for iteratables ?

> > [snip]...C++ STL where there is a clear (I resist
> > saying "clean") distinction between iteratable and iterator.
>
> leaves out self-iterating iterables -- collection objects with a .next
> method.  

Nope. See the definition of an iterator in C++ STL below. Anything respecting 
the standard protocol is an iterator. It might be the container itself. The 
point is that with the standard C++ STL protocol, you are not ___obliged___ 
to define an iterator as _also_ being an iteratable. Both concepts are 
clearly separated.

> I am sure that this is a general, standard OO idiom and not a 
> Python-specific construct.  Perhaps, ignoring these, you would prefer the
> following nomenclature:
> iterob   = object with .__iter__
> iterable= iterob without .next
> iterator = iterob with .next
>
> Does STL allow/have iterators that are *not* tied to an iterable?

Yes, of course. A "forward iterator", for example, is _anything_ that supports 
the following:

===================
In what follows, we shall adopt the following convention.

X : A type that is a model of Trivial Iterator 
T : The value type of X 
x, y, z : Object of type X 
t : Object of type T 

Copy constructor : X(x) ------> X
Copy constructor : X x(y); or  X x = y;
Assignment : x = y [1]  ------> X& 
Swap : swap(x,y)  ------> void 

Equality : x == y ------> Convertible to bool
Inequality : x != y ------> Convertible to bool

Default constructor  : X x or X() ------> X
Dereference : *x  ------> Convertible to T 
Dereference assignment : *x = t ------> X is mutable
Member access : x->m [2] ------> T is a type for which x.m is defined 
======================

Anything that respects this convention is a forward iterator. Such iterators 
might produce their own content as we iterate upon them if that's what is 
needed. They don't have to be a class inheriting from some other standard 
class. That's the beauty of C++ templates, forgetting (but not forgiving) 
their very ugly and complex syntax.

>
> >One result of not distinguishing them is that, at some point in
> >your
> >programming, you are not sure anymore if you have an iterator or an
> >iteratable; and you might very well end up calling "iter()" or
> > "__iter__()" everywhere.
>
> If you iterate with a while loop just after creating a new iterable or
> iterator, then you probably do know which it is and can make the iter()
> call only if needed.  

Yes, true. But then you start to factor out some code, and then, oh well, you 
can see the difficulties.

> If you while-iterate with a function argument, then 
> iter() is a simpler way to be generic than the alternatives in A2 above.
>

You need that genericity chiefly because you want to support both iterators 
and iteratables in some argument place. I don't see any good reason for this 
except historical ones (see the for ... in ... construct below). It might be 
preferable for a method to accept only an iterator if the only thing the method 
has to do with the iteratable is to iterate over it, and to let the client of 
the method call "__iter__" or "iter()" if all he has is an iteratable (i.e. a 
container).
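
The two styles can be sketched as follows. The function names total_strict 
and total_generic are hypothetical, and the sketch assumes Python 3 (next() 
and __next__ rather than .next). The first function refuses anything that is 
not already an iterator and leaves the iter() call to the caller. The second 
calls iter() itself, which is the generic approach described in the quoted 
text.

```python
def total_strict(iterator):
    """Accept only an iterator; the caller must call iter() on containers."""
    # Assumption: "is an iterator" is approximated by "has __next__".
    if not hasattr(iterator, "__next__"):
        raise TypeError("expected an iterator")
    total = 0
    while True:
        try:
            total += next(iterator)
        except StopIteration:
            return total


def total_generic(iterable):
    """Accept any iterable; iter() hands an iterator back unchanged."""
    return total_strict(iter(iterable))


print(total_strict(iter([1, 2, 3])))   # the caller converts the container
print(total_generic([1, 2, 3]))        # the function converts it
print(total_generic(iter([1, 2, 3])))  # an iterator is accepted as well
```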


> >I am not concerned with the small performance issue involved here
>
> Good.  I think they are small.  The number and time for iterations is far
> more important.
>
> > (as I very seldom am) but with clarity. After all, why should you have to
> >
> >  > call __iter__ on an iterator you just constructed
>
> As I said above, you don't, and most people wouldn't.  The function
> implementing for loops does because *it*, unlike you, only sees the object
> passed and not the code that created the object!
>

As I said above, why should the function implementing for loops accept 
anything other than an iterator (not an iteratable), except, maybe, for 
historical reasons?

> >I have a strong feeling that the problem arises from the difficulty of
> > marrying the familiar "for ... in ..." construct with iterators. [snip]
>
> What difficulty?  For loops accept an iterable and iterate with the derived
> iterator.
>

Before the iteration protocol, the second argument place of the 
"for ... in ..." construct had to be a container. At that time, it was the 
"for ... in ..." syntax itself that played the conceptual roles of both 
iteration and iterator. Now, the "for ... in ..." construct is the standard 
way to iterate. Therefore, once iterators are introduced to Python, the second 
argument place of "for ... in ..." must also accept an iterator. The question, 
at the time iterators were introduced, must have been: "how should we manage 
to have these two different things in the same argument place?" I think the 
Python solution to this dilemma is very acceptable. It is certainly the Python 
way to be very polymorphic. But, at the same time, I can very well understand 
that a lot of people have difficulty swallowing that an iterator should be a 
subspecies of an iteratable.
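
Roughly, and only as a sketch of the protocol rather than the interpreter's 
actual implementation, a for loop behaves like the hypothetical helper 
for_each below (assuming Python 3, with next() in place of .next). Because it 
always starts with iter(), the same code handles a container, an iterator, 
and a generator.

```python
def for_each(obj, body):
    """Approximate 'for item in obj: body(item)' using the protocol directly."""
    it = iter(obj)  # works for containers and for iterators alike
    while True:
        try:
            item = next(it)
        except StopIteration:
            break  # the loop ends when the iterator is exhausted
        body(item)


seen = []
for_each([1, 2], seen.append)              # a container
for_each(iter((3, 4)), seen.append)        # already an iterator
for_each((n for n in (5,)), seen.append)   # a generator
print(seen)
```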


> Would you really prohibit the use of for loops with generators and other
> non-iterable-derived iterators?  See A1 above.

Of course not. This question reveals the difficulty I just pointed out.

>
> >To have iterators act as iteratables might very well have been a
> >compromise to solve the problem.
>
> I think it elegant.  See below.
>
> >I am not sure at all that it is a "nice feature" to consider an iterator
> >at
> >the same level as an iteratable.
>
> I think you are too stuck in the STL model.

Yes, very true. I just can't help thinking that iterators and iteratables are 
very different concepts.

>
> > It makes it a bit more awkward to have the
> >"mind impulse", so to speak, to build iterators on top of other iterators
> >to slightly modify the way iteration is done.
>
> On the contrary, what could be more elegant than
> def itermodifier(it):
>   for i in it: # where it is often an iterator
>      yield modification-of-i
> See the itertools module and docs and the examples of chaining iterators.
>

Yes, I certainly do think that generators are very, very, very elegant. I came 
back to Python largely because of their beauty, coupled with all the other 
advantages Python has to offer: simple syntax, well-thought-out libraries, 
easy portability, pragmatism, etc. (there's a long list).

On the other hand, I certainly do notice that the itertools module 
documentation says that all the functions defined there return an iterator 
(not an iteratable) but accept an iteratable. There is something awkward about 
it for the beginner. You always have to re-think: oh, yes, they refer to the 
protocol, not the concepts.
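
A small sketch of that asymmetry, using only itertools.islice and 
itertools.chain and assuming Python 3: a list goes in, an iterator comes out, 
and that iterator can in turn be fed to another itertools function because it 
is also accepted wherever an iterable is expected.

```python
import itertools

# islice accepts an iterable (here a plain list) ...
first_two = itertools.islice([10, 20, 30], 2)

# ... and returns an iterator: iter() on it gives back the same object.
returns_iterator = iter(first_two) is first_two

# That iterator is accepted where an iterable is expected.
chained = itertools.chain(first_two, iter([99]))
result = list(chained)
print(returns_iterator, result)
```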

But this is not that bad and I can certainly live with it.

(Thank you if you had the courage to read it all!)

Regards

Francis Girard

> Terry J. Reedy



