[Python-Dev] Termination of two-arg iter()

Mon, 15 Jul 2002 21:10:31 -0400

[Ping]
> As a general design philosophy question, my vote would be for
> placing the burden on the implementations.  If code reuse is all
> it's cracked up to be, you're going to use the iterator more times
> than you implemented it.  Moreover, the more consistent the
> implementation is, the more widely it can be used.  (Tim just said this.)

OTOH, the less the protocol defines, the more open it is to unforeseen uses.
Tim just said that too <wink>.

> As for the specifics of the iterator protocol, there seem to be
> two separate issues here:
>
> 1.  After StopIteration, should iterators be allowed to keep going?
>
> 2.  Should an empty iterator be distinguishable from an exhausted
> iterator?
>
> For 1, i don't think i've seen anyone come down too strongly on
> the "yes" side.  There have been a couple of examples as to why
> this might be cute, but i don't think they are compelling.

I haven't seen an example of why it might be useful, although I could have made
some up, and have been pleasantly surprised all along that nobody else made
one up either <wink>.  We saw a few examples illustrating that StopIteration
is in fact not sticky today, but nobody claimed such uses "were features".
Jeff Epler made one up to get clarification, and I showed a dict iter
example that demonstrated how unpredictable it can get now.
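
For illustration only, here is a minimal sketch of what "not sticky" means.
It is not the dict iter example mentioned above; the Drain class is
hypothetical and written in present-day Python 3 spelling (__next__ rather
than next), which is an assumption relative to the interpreter under
discussion.

```python
class Drain:
    """Hypothetical iterator that hands out items from a shared list."""

    def __init__(self, items):
        self.items = items

    def __iter__(self):
        return self

    def __next__(self):
        # Raises StopIteration whenever the list is empty *right now*;
        # nothing remembers that it was raised before.
        if not self.items:
            raise StopIteration
        return self.items.pop(0)


items = [1, 2]
it = Drain(items)
first = list(it)    # drains the list
second = list(it)   # probing again: still empty
items.append(3)     # the underlying data grows after exhaustion
third = list(it)    # the "finished" iterator comes back to life
```

Here the third probe yields an element even though StopIteration was
already raised twice, which is the kind of unpredictability at issue.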

> My opinion is that, if you are trying to make an iterator keep going
> after it has stopped, it's just a way of abusing the iterator to
> represent a sequence of sequences.
>
> You can always get the behaviour you want by explicitly describing
> both kinds of sequence.  Tim's example of getting paragraphs out
> of a file demonstrates exactly why we don't want to encourage the
> abuse of one iterator to represent a sequence of sequences: you're
> going to be in trouble if you can't distinguish between the
> termination conditions for the two kinds of sequences.

That example relied on StopIteration being sticky (which it already happens
to be for the specific iter(file.readline, "") case), not on iteration doing
"something useful" after StopIteration had been raised.  A sequence is
either empty, or an element followed by a sequence.  Sticky StopIteration
makes the "empty" case at the end reliably empty, which I think is right for
much the same reason Python has always kept returning "" from file.read()
after it reaches EOF.  There's simply nothing erroneous about reaching the end of a
sequence, or about probing it again to determine emptiness instead of
carrying around fiddly flags in parallel.
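
A rough sketch of the paragraph idea, assuming Python 3 and an in-memory
io.StringIO standing in for a real file.  The paragraphs() helper is a
hypothetical reconstruction, not the code originally posted.  It detects
end-of-input by probing the same iterator once more and finding it empty,
with no separate EOF flag.

```python
import io


def paragraphs(f):
    """Yield blank-line-separated paragraphs from a file-like object."""
    lines = iter(f.readline, "")   # stops when readline() returns ""
    while True:
        para = []
        for line in lines:
            if line.strip():
                para.append(line)
            elif para:
                break              # blank line ends the current paragraph
        if not para:
            return                 # re-probing found nothing: end of input
        yield "".join(para)


text = "a\nb\n\nc\n"
result = list(paragraphs(io.StringIO(text)))

# file.read() behaves the same way at EOF: it keeps returning "".
f = io.StringIO("x")
reads = [f.read(), f.read(), f.read()]
```

After the last paragraph the outer loop runs the inner for loop once more
over the already exhausted iterator; that only works reliably if the
iterator stays stopped.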

> For 2, i believe Andrew and Oren want the answer to be "yes",
> but Guido and Aahz want the answer to be "no".  I think the answer
> should be "yes".  An exhausted iterator is not the same thing as
> a freshly-created iterator on an empty sequence, and allowing one
> to silently pass for the other is going to lead to problems.

I'm on the "no" side there -- an empty sequence is no more error-prone than
range(10, 10) returning an empty list, string[i:i] an empty string, or
file("some_empty_file").read() an empty string.  An
iterator-based algorithm works on some prefix of the elements "from here
until the end":  an exhausted sequence and an empty sequence are indeed
indistinguishable from that view.  Indeed, I'm having a hard time imagining
*wanting* to distinguish the two.
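
A small sketch of that view, assuming Python 3 (where range() needs a
list() call to produce a list).  The total() function is a hypothetical
stand-in for any algorithm that consumes "from here until the end".

```python
def total(it):
    """Sum whatever elements remain in the iterator."""
    result = 0
    for x in it:
        result += x
    return result


fresh_empty = iter([])        # never had any elements
exhausted = iter([1, 2, 3])
for _ in exhausted:           # run it dry
    pass

empty_total = total(fresh_empty)
exhausted_total = total(exhausted)

# The analogous "empty result" cases from the text:
empty_range = list(range(10, 10))
empty_slice = "abc"[1:1]
```

total() cannot tell the two iterators apart, and has no need to: both
have nothing left to offer.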

> I'm not going to insist that IndexError should be the effect, as
> Guido's preference to keep IndexError for randomly-indexable
> sequences seems reasonable; anything distinguishable from
> StopIteration is fine.

OK, if we have to do this, let's call it StopIteration2 and make it a
subclass of StopIteration so my code won't have to know it exists <wink>.
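
Taken literally, that could look like the sketch below.  StopIteration2
and Countdown are hypothetical names used only for illustration, in
Python 3 spelling.  Because the exception is a subclass, for loops and
list() treat it as an ordinary end of iteration, while code that cares can
catch it specifically.

```python
class StopIteration2(StopIteration):
    """Hypothetical: raised when an already exhausted iterator is probed."""


class Countdown:
    def __init__(self, n):
        self.n = n
        self.done = False

    def __iter__(self):
        return self

    def __next__(self):
        if self.done:
            raise StopIteration2   # probed again after exhaustion
        if self.n <= 0:
            self.done = True
            raise StopIteration    # first, ordinary end of iteration
        self.n -= 1
        return self.n + 1


it = Countdown(3)
first = list(it)     # ends on the ordinary StopIteration
second = list(it)    # ends on StopIteration2, caught just the same
try:
    next(it)
    kind = None
except StopIteration2:
    kind = "exhausted"
```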