[Python-Dev] The iterator story

Neil Schemenauer nas@python.ca
Fri, 19 Jul 2002 12:00:43 -0700


Ka-Ping Yee wrote:
>     I think "for" should be non-destructive because that's the way
>     it has almost always behaved, and that's the way it behaves in
>     any other language i can think of.

I agree that it can be surprising to have "for" destroy the object it's
looping over.  I was bitten by it once myself.  I'm not yet sure whether
this is something that will bite people repeatedly.  I suspect it might. :-(
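A short illustration of the surprise, written in Python 3 syntax (an assumption, so that it runs on current interpreters): a second loop over the same iterator produces nothing, while looping over the list itself starts fresh every time.

```python
numbers = [1, 2, 3]
it = iter(numbers)               # an iterator over the list

# List comprehensions use the same protocol as a "for" statement.
first_pass = [n for n in it]     # consumes the iterator
second_pass = [n for n in it]    # already exhausted: nothing left

print(first_pass)    # [1, 2, 3]
print(second_pass)   # []

# Looping over the list itself asks it for a brand-new iterator each time.
print([n for n in numbers])  # [1, 2, 3]
print([n for n in numbers])  # [1, 2, 3]
```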

>     And as things stand, the presence of __iter__ doesn't even work
>     as a type flag.

__iter__ is not a flag.  When you want to loop over an object, you call
__iter__ to get an iterator.  Since you should be able to loop over any
iterator, every iterator should provide a __iter__ that returns self.
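The division of labor described above can be sketched as a container paired with a separate iterator class. The class names are hypothetical, and the code targets Python 3, where the method this thread calls next() is spelled `__next__`; the `next = __next__` alias only mirrors the 2.2 spelling.

```python
class Countdown:
    """Hypothetical iterable container: each loop gets a fresh iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Hypothetical iterator: holds the loop state, returns itself from __iter__."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        # Returning self is what lets "for" accept an iterator directly.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

    next = __next__  # the Python 2.2 spelling discussed in this thread


c = Countdown(3)
print(list(c), list(c))    # [3, 2, 1] [3, 2, 1]: the container is reusable

it = iter(c)
print(iter(it) is it)      # True
print(list(it), list(it))  # [3, 2, 1] []: the iterator gets used up
```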

>     Now suppose we agree that __iter__ and next are distinct protocols.

I suppose you can call them distinct, but they both pertain to iteration:
one gets the iterator, the other uses it.

>     Then why require iterators to support both?  The only reason we
>     would want __iter__ on iterators is so that we can use "for"
>     with an iterator as the second operand.

Isn't that a good reason?  It's not just "for", though.  Any time you
have an object that you want to loop over, you should call iter() to get
an iterator and then call .next() on the iterator.
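Roughly what a "for" loop does with those two steps, as a simplified Python 3 sketch. The helper name `loop` is hypothetical, and the built-in `next()` used here calls the method that 2.2 spells `.next()`.

```python
def loop(iterable, body):
    """Approximately what `for item in iterable: body(item)` does."""
    iterator = iter(iterable)      # step 1: ask the object for an iterator
    while True:
        try:
            item = next(iterator)  # step 2: pull items until exhausted
        except StopIteration:
            break
        body(item)


seen = []
loop("abc", seen.append)
print(seen)   # ['a', 'b', 'c']
```

Because `iter()` on an iterator returns the iterator itself, the same helper works whether it is handed a sequence or an iterator.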

>     I think the potential for collision, though small, is significant,
>     and this makes "__next__" a better choice than "next".

When this issue originally came up, my position was that double
underscores should be used only if there is a risk of namespace
collision.  The fact that the method was stored in a type slot is
irrelevant.  If objects implemented their iterators as separate,
specialized objects, there wouldn't be any namespace collisions.  Now it
looks like people want to have iterators that also do other things.  In
that case, __next__ would have been a better choice.
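To make the collision concrete, here is a hypothetical linked-list node that is also its own iterator, i.e. an iterator that "also does other things". It is written for Python 3, where the protocol method is spelled `__next__`; under the protocol discussed in this thread, the method would have to be named `next` and would clash with the node's `next` attribute.

```python
class LinkedNode:
    """Hypothetical list node that is also its own iterator."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node   # data attribute: link to the following node
        self._cursor = self     # iteration state, starts at this node

    def __iter__(self):
        return self

    def __next__(self):
        # This method can coexist with the ``next`` attribute above only
        # because the protocol name is spelled with double underscores.
        if self._cursor is None:
            raise StopIteration
        value = self._cursor.value
        self._cursor = self._cursor.next
        return value


head = LinkedNode(1, LinkedNode(2, LinkedNode(3)))
print(head.next.value)   # 2: the data attribute still works
print(list(head))        # [1, 2, 3]
print(list(head))        # []: iterating consumed the cursor
```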

>     The connection between this issue and the __iter__ issue is that,
>     if next() were renamed to __next__(), the argument that __iter__
>     is needed as a flag would also go away.

Sorry, I don't see the connection.  __iter__ is not a flag.  How does
renaming next() help?

> In my ideal world, we would allow a new form of "for", such as
> 
>     for line from file:
>         print line

Nice syntax, but I think it creates other problems.  Basically, you are
saying that iterators should not implement __iter__ and that we should
have some other way of looping over them (in order to make it clear that
they are being mutated).

First, people could still implement __iter__ so that it returns an
iterator that mutates the original object (e.g. a file object whose
__iter__ returns xreadlines), so "for" could be destructive anyway.
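A hypothetical container of that kind, sketched in Python 3. It is not itself an iterator (it has no `__next__`), yet its `__iter__` drains its contents, so a plain "for" over it is destructive even if iterators were stripped of `__iter__`.

```python
from collections import deque


class DrainingQueue:
    """Hypothetical container whose __iter__ consumes its contents."""

    def __init__(self, items):
        self._items = deque(items)

    def __iter__(self):
        # A generator that pops items as it yields them.
        while self._items:
            yield self._items.popleft()

    def __len__(self):
        return len(self._items)


q = DrainingQueue(["a", "b", "c"])
print(list(q))   # ['a', 'b', 'c']
print(list(q))   # []: the first loop emptied the queue
print(len(q))    # 0
```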

Second, it will be confusing to have two different ways of looping over
things.  Imagine a library with this bit of code:

    for item in sequence:
        do something

Now suppose I want to use this library, but I have an iterator, not
something that implements __iter__.  I would need to create a little
wrapper with a __iter__ method that returns my object.  Should people
then prefer to write:

    for item from iterator:
        do something

when they only need to loop over something once?  Doing so makes the
code most generally useful.  What about functions like map() and max()?
Should they accept iterators or sequences as arguments?
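The "little wrapper" mentioned above might look like the following Python 3 sketch. `BareCounter` and `IterWrapper` are hypothetical names; `BareCounter` stands in for an iterator that, under the proposal, would not implement `__iter__`.

```python
class BareCounter:
    """Hypothetical iterator that defines __next__ but no __iter__."""

    def __init__(self, stop):
        self.n = 0
        self.stop = stop

    def __next__(self):
        if self.n >= self.stop:
            raise StopIteration
        self.n += 1
        return self.n


class IterWrapper:
    """Hypothetical wrapper whose __iter__ hands back the wrapped iterator."""

    def __init__(self, iterator):
        self._iterator = iterator

    def __iter__(self):
        return self._iterator


try:
    for item in BareCounter(3):   # fails: BareCounter has no __iter__
        pass
except TypeError as exc:
    print("TypeError:", exc)

print(list(IterWrapper(BareCounter(3))))  # [1, 2, 3]
print(max(IterWrapper(BareCounter(3))))   # 3
```

Once wrapped, the object works with "for", `list()`, and `max()` alike, which illustrates the cost described above: every caller holding a bare iterator would need a wrapper like this.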

It would be confusing if some functions accepted iterators as arguments
but not "container" objects (i.e. things that implement __iter__) and
vice versa.  People will wonder if they should call iter() before
passing their sequence as an argument.
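One convention that sidesteps the question, sketched here as an assumption rather than anything settled in this thread: the function calls iter() on its argument itself. Since iter() on an iterator returns the same object, callers can pass either kind. The helper `first_and_rest` is hypothetical, and the code targets Python 3.

```python
def first_and_rest(iterable):
    """Hypothetical helper: split off the first item of any iterable."""
    it = iter(iterable)   # containers give a fresh iterator; iterators give themselves
    first = next(it)
    return first, list(it)


data = [10, 20, 30]
print(first_and_rest(data))     # (10, [20, 30])
print(first_and_rest(data))     # (10, [20, 30]): the list is untouched

stream = iter(data)
print(iter(stream) is stream)   # True
print(first_and_rest(stream))   # (10, [20, 30])
print(list(stream))             # []: the caller's iterator was consumed
```

The last line shows the catch: the call is non-destructive for a container but destructive for an iterator, which is exactly the ambiguity raised above.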

To summarize, I agree that "for" mutating the object can be surprising.
I don't think that removing the __iter__ from iterators is the right
solution.  Unfortunately I don't have any alternative suggestions.

  Neil