[Python-Dev] The iterator story

Ka-Ping Yee ping@zesty.ca
Sat, 20 Jul 2002 05:32:41 -0700 (PDT)


If you only have ten seconds, read this:
----------------------------------------

Guido, i believe i understand your position.  My interpretation is:

    I'd like "iterate destructively" and "iterate non-destructively"
    to be spelled differently.  You don't.

    I'd like to be able to establish conventions so that "x in y"
    doesn't destroy y.  This isn't so important to you.

We have a difference of opinion.  I don't think we have a failure in
understanding.  If the opinions won't change, we might as well move on.
I did not mean to waste your time, only to achieve understanding.


Actual reply follows:
---------------------

On Fri, 19 Jul 2002, Guido van Rossum wrote:
> But I note that there are hybrids, and I think files (at least
> seekable files) fall in the hybrid category.

Indeed, files are unusual.  In the particular way that i've chosen
my definitions, though, classification of files is clear: files
are not containers (there's no non-mutating read) and files are
iterators (due to the behaviour of the read() method).
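
A minimal sketch of the distinction, assuming an in-memory io.StringIO
stands in for a real file object (the variable names are illustrative
only): looping over the file-like object twice gives different results,
while looping over a list twice does not.

```python
import io

# io.StringIO is used here as a stand-in for a real file (an assumption).
f = io.StringIO("one\ntwo\nthree\n")

first_pass = [line for line in f]     # reads every line; advances f
second_pass = [line for line in f]    # f is exhausted, so this is empty

lines = ["one\n", "two\n", "three\n"]    # a list is a container
list_first = [line for line in lines]
list_second = [line for line in lines]   # same result on every pass
```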

Files aside, i do agree that hybrids exist.  The dbm and tree examples
you gave indeed mix container and iterator behaviour.  I agree with
you that mixing these things isn't usually a good design.

In some cases you do end up providing both container-like and
iterator-like interfaces.  This is fine.  But then when you use the
object, you ought to be able to know which interface you are using.

The argument in the "iterator story" message is that we should have
a way to say "i want to use the non-destructive interface" and a
way to say "i want to use the destructive interface".  Depending on what
makes sense, one can choose to implement either interface, or both.

> For example, while a tape file is a
> container in the sense that reading the data doesn't destroy it, it's
> very heavily geared towards sequential access, and you can't
> realistically have two iterators going over the same tape at once.

Indeed, you can't.  But a tape file object is not a container (if we're
using my definition), because the act of reading changes the tape file
object -- it advances the tape.  It's the same as file.read() -- even
though file.read() doesn't mutate the data on the disk, it does mutate
the file object, and that is what makes the file object not a container.
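
The point about read() can be sketched as follows, assuming a small
temporary file written only for this illustration: reading moves the
file object's position and leaves it exhausted, while the data on disk
is unchanged and can be read again through a fresh file object.

```python
import os
import tempfile

# Create a small throwaway file (the path is temporary and illustrative).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as out:
    out.write("abcdef")

with open(path) as f:
    pos_before = f.tell()   # position before any read
    data = f.read()         # reading changes the state of f
    pos_after = f.tell()    # position has moved
    again = f.read()        # nothing left: f itself was mutated

with open(path) as g:
    reread = g.read()       # the data on disk was never changed

os.remove(path)
```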

It's precisely because tapes are too slow for practical random access
that we would want a tape file object to provide an iterator-style
interface and not provide a container-style interface.

> If you're too young to remember

Hee hee.  I've used tapes.  I've used *cassette* tapes, even. :)

> >     The issue is, should "for" be non-destructive?
>
> I don't see the benefit.  We've done this for years and the only
> conceptual problem was the abuse of __getitem__, not the
> destructiveness of the for-loop.
[...]
> >     The issue is, should "in" be non-destructive?
>
> If it can't be helped otherwise, sure, why not?

Obviously we see these "problems" differently.  Having "x in y"
possibly destroy y is scary to me, but no big deal to you.  All right.
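
A short sketch of the concern, using plain list iterators as the
example (the names are illustrative): a membership test on an iterator
consumes items up to and including the match, or everything if there
is no match, whereas the same test on a list leaves the list alone.

```python
it = iter([1, 2, 3, 4])
found = 2 in it            # True, but the search consumed 1 and 2
leftover = list(it)        # only the items after the match remain

it2 = iter([1, 2, 3, 4])
missing = 9 in it2         # False, and every item was consumed looking
after_miss = list(it2)     # nothing remains

box = [1, 2, 3, 4]
found_in_list = 2 in box   # True, and the list is unchanged
```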

> >     still produces "KeyError: 0"!  This oughta be fixed...)
>
> Check the CVS logs.  At one point before 2.2 was released, UserDict
> had a __iter__ method.  But then SF bug 448153 was filed, presenting
> evidence that this broke previously working code.  So a separate
> class, IterableUserDict, was added that has the __iter__ method.

Oh.  :(   Okay.  Thanks for explaining.
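
For readers wondering where "KeyError: 0" comes from, here is a sketch
using a hypothetical class (GetItemOnlyDict is invented for this
illustration and is not the UserDict source): when an object defines
__getitem__ but no __iter__, the for-loop falls back to calling
__getitem__ with 0, 1, 2, ..., and a dict-backed __getitem__ raises
KeyError on the very first index.

```python
class GetItemOnlyDict:
    """Hypothetical mapping wrapper: has __getitem__ but no __iter__."""

    def __init__(self, data):
        self.data = dict(data)

    def __getitem__(self, key):
        return self.data[key]


d = GetItemOnlyDict({"a": 1})

try:
    for key in d:      # falls back to d[0], d[1], ... via __getitem__
        pass
    error = None
except KeyError as exc:
    error = exc        # the lookup of key 0 failed
```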

> There are a lot of objects that
> have a way to return an iterator (old style using fake __getitem__,
> and new ones using __iter__ and next) that are intended to be looped
> over, once.  I have no desire to deprecate this behavior, since (a) it
> would be a major upheaval for the user community (a lot worse than
> integer division), and (b) I don't see that "fixing" this prevents a
> particular category of programming errors.

As you can tell by now, i think it does prevent a certain category
of errors.  The general description is "mixing up mutating and
non-mutating interfaces".  The closest analogy i can think of is
an alternate world in which "+" and "+=" had the same name, and the
only way you could tell if the left operand would get mutated is
by knowing the implementation of the left-hand object at runtime.
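
For comparison, a small sketch of how the two spellings behave for
lists in real Python (variable names are illustrative): "+" builds a
new list and leaves the left operand alone, while "+=" changes the
left operand in place, which is visible through any other reference
to it.

```python
a = [1, 2]
alias = a                  # a second name for the same list object

b = a + [3]                # "+" returns a new list
a_after_plus = list(a)     # snapshot: a is still [1, 2]

a += [4]                   # "+=" extends the existing list in place
same_object = alias is a   # still the same object
alias_view = list(alias)   # the change is visible through alias
```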

Of course, in real Python you have to trust that the implementation
of "+" does not mutate.  But at least we are able to set a convention,
because "+" and "+=" are distinct operators.  In the weird alternate
world where "+" and "+=" are both written "+", you would have no
hope of telling the difference.  We'd look at "x + y" and say
"Will x change?  I don't know."

And so it is with "for x in y": we'd look at that and say "Will y
change?  I don't know."  We have no way of telling whether y is a
container or an iterator, thus no way to establish a convention
about what this should do.  "for x in y" is polymorphic on y, but
this is not how i think polymorphism is supposed to work.
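
A sketch of the kind of mixup in question, using a hypothetical helper
(mean_and_max is invented for this illustration): the function loops
over its argument twice, which works for a list but silently gives a
wrong answer for an iterator, and nothing in "for x in y" says which
one it was given.

```python
def mean_and_max(y):
    """Make two passes over y: one for the mean, one for the maximum."""
    total = 0
    count = 0
    for x in y:              # first pass
        total += x
        count += 1

    biggest = None
    for x in y:              # second pass: empty if y was an iterator
        if biggest is None or x > biggest:
            biggest = x

    return total / count, biggest


from_list = mean_and_max([3, 1, 2])         # both passes see the data
from_iter = mean_and_max(iter([3, 1, 2]))   # second pass sees nothing
```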

You could say you don't care whether y changes.  (Well, you *are*
saying you don't care.)  Well, okay.  I just want to make sure we both
understand each other and see the issue at hand.  If we do, then it
just comes down to a difference of opinion about how significant a
mixup this is, and so be it.

> >     I believe __iter__ is not a type flag.
[...]
> And I never said it was a type flag.  I'm tired of repeating myself,
> but you keep repeating this broken argument, so I have to keep
> correcting you.

I know you didn't say this.  Please don't be offended.  I apologize
if i seemed to be wilfully ignoring you -- you don't have to repeat
things many times in order to "drive home" your position to me.
I was trying to summarize all the positions (not just yours),
organize them, and explain them all at once.


-- ?!ng