[Python-Dev] Single- vs. Multi-pass iterability

Alex Martelli aleax@aleax.it
Wed, 17 Jul 2002 18:08:51 +0200


On Tuesday 16 July 2002 03:50 pm, Guido van Rossum wrote:
	...
> I dunno.  The presence of seek() and write() makes the behavior of
> files a rather unique blend of iterator and iterable.

All files have seek and write, but not on all files do they work -- and
the same goes for iteration.  I.e., it IS something of a mess, probably
because the file object's interface is the only example of the "fat
interface" problem in Python -- an interface that exposes a lot of methods, with many
objects claiming they implement that interface but actually lying
(because they only implement a subset of it -- trying to use methods
they can't in fact provide raises exceptions).

The galaxy of COM-based Microsoft interfaces sadly has many fat
interfaces, and they ARE the worst mess in that galaxy.

Anyway, a rewindable iterator is still not an iterable.  You
can't have two nested loops on it -- that's crucial.  Making a file
into an iterable requires wrapping it in a class that caches its contents.
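The caching wrapper described above might look something like the following. This is a minimal Python 3 sketch, not code from this thread: the class name CachingIterable is hypothetical, and io.StringIO stands in for a real file. Every call to iter() yields a fresh generator with its own position, while each line is read from the underlying file only once.

```python
import io

class CachingIterable:
    """Hypothetical wrapper: turns a one-pass iterator (e.g. a file) into a
    multi-pass iterable by caching each item the first time it is read."""

    def __init__(self, iterator):
        self._source = iter(iterator)
        self._cache = []

    def __iter__(self):
        # Each call returns an independent generator with its own index.
        index = 0
        while True:
            if index < len(self._cache):
                yield self._cache[index]
            else:
                try:
                    item = next(self._source)
                except StopIteration:
                    return
                self._cache.append(item)
                yield item
            index += 1

lines = CachingIterable(io.StringIO("a\nb\n"))
# Nested loops now work, because each loop gets its own iterator.
pairs = [(x.strip(), y.strip()) for x in lines for y in lines]
print(pairs)  # [('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]
```

The price is memory: the whole file ends up cached, which is exactly why a plain file cannot offer this for free.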

If and when rewindable iterators are recognized as such by Python,
files whose seek(0) method doesn't raise will make a fine example.

But iterables, they ain't, just like rewindable iterators in general aren't.
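For illustration, here is how a rewindable iterator behaves in present-day Python 3, where io file objects do return themselves from iter(). This reflects later behaviour, not the Python of 2002, and io.StringIO stands in for a real file. The nested loops share a single position, while seek(0) rewinds it.

```python
import io

# io.StringIO stands in for a real text file here.
f = io.StringIO("x\ny\n")

# Python 3 file objects are their own iterators.
print(iter(f) is f)  # True

# Both loops advance the same position, so the inner loop drains the
# file during the first outer pass: no cross product.
pairs = [(a.strip(), b.strip()) for a in f for b in f]
print(pairs)  # [('x', 'y')]

# seek(0) rewinds, which is what makes the file a *rewindable* iterator.
f.seek(0)
rewound = [line.strip() for line in f]
print(rewound)  # ['x', 'y']
```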


> > I don't see any downside to having this micro-wart removed.  In
> > particular, I don't see what's confusing.  Things that respond to
> > iter(x) fall in two categories:
> >     iterators: also have x.next(), and iter(x) is x
> >     iterables: iter(x) is not x, so you can presumably get another
> >         iterator out of x at some later point in time if needed.
> > It's not QUITE as simple as this, but moving file objects from
> > the second category to the first seems to _simplify_ things a bit.
>
> I worry that equating a file with its iterable makes it more likely
> that people mix next() with readline() or seek(), which doesn't work
> (at least not until the I/O system is rewritten).

It's exactly to DISTINGUISH a file from "its iterable" (which it does
not have) that I'd like files to be iterators, NOT fake iterables.

f.seek does cooperate with f.next now, doesn't it, since it
invalidates f's xreadlines object, if any?

> I'd be more comfortable with teaching people that you should *either*
> use a file in a for loop (the common case, probably) *or* use its
> native I/O methods (readline() etc.), but not mix both.

Fine (I think BOTH cases are very common), although mixing them will
probably become more practical one day, if/when the I/O system is indeed
rewritten.  But having "iter(f) is f" isn't really germane to this issue.


> > E.g.:
> >
> > def useIterable(x):
> >     try:
> >         it = iter(x)
> >     except TypeError:
> >         raise TypeError, "Need iterable object, not %s" % type(x)
> >     if it is x:
> >         raise TypeError, "Need iterable object, not iterator"
> >     # keep happily using it and/or x as needed, and in particular
> >     # the code is able to call it1 = iter(x) if it needs to iterate again
> >
> > Not perfect -- but having a file-object argument fail this simplistic
> > test seems better to me, less confusing, than having it pass.
>
> This actually looks like an example of the "look before you leap"
> (LBYL) syndrome, which you disapproved of recently.

Only if you don't look carefully enough.  It uses try/except where
it can (just to change the exception's message -- one might as well
not bother and just do it=iter(x) without a try), and a guarded
raise statement where it must, because the case it can't handle
raises no exception that it could catch.
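A Python 3 rendering of useIterable makes the split visible. This is a sketch, not the quoted original: the name use_iterable and the choice to return two full passes (just to show that a second pass works) are illustrative assumptions. It follows the suggestion of dropping the try and letting iter() raise TypeError on its own.

```python
import io

def use_iterable(x):
    # Hypothetical Python 3 rendering of useIterable.
    # EAFP part: iter() itself raises TypeError for non-iterables.
    it = iter(x)
    # Unavoidable LBYL part: an iterator would silently come up empty on a
    # second pass, so no exception would ever report the mistake.
    if it is x:
        raise TypeError("Need iterable object, not iterator")
    first_pass = list(it)
    second_pass = list(iter(x))  # a true iterable can be iterated again
    return first_pass, second_pass

print(use_iterable([1, 2]))  # ([1, 2], [1, 2])
try:
    use_iterable(io.StringIO("a\n"))  # files are iterators in Python 3
except TypeError as exc:
    print(exc)  # Need iterable object, not iterator
```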

Consider, by analogy:

def loopUntilConvergence(f, x, epsilon):
    y = f(x)
    while abs(x-y) > epsilon:
        x = y
        y = f(x)
    return y

Now what happens if you mistakenly pass epsilon<0?  Oops -- an
infinite loop.  So, one may add:
    if epsilon<0: raise ValueError, "Need epsilon>=0, not %s" % epsilon

Is this an example of erroneous use of LBYL rather than EAFP?  No,
because no exception would be raised by the infinite loop, so there is
no alternative to doing the checks.
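Put together in Python 3 syntax, the guarded version might read as follows. This is a sketch: the snake_case name and the cos example are illustrative choices, not from the original post.

```python
import math

def loop_until_convergence(f, x, epsilon):
    # Guard clause: with epsilon < 0 the while condition can never become
    # false, so the loop would spin forever without raising anything.
    if epsilon < 0:
        raise ValueError("Need epsilon>=0, not %s" % epsilon)
    y = f(x)
    while abs(x - y) > epsilon:
        x = y
        y = f(x)
    return y

# Fixed point of cos, roughly 0.739085.
root = loop_until_convergence(math.cos, 1.0, 1e-12)
print(root)
```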

In exactly the same way, there is no alternative to checking in
useIterable, because there is no exception one could count on --
rather, we'd have a case of an error passing silently.

In other words: that EAFP is preferable to LBYL does NOT mean
that one should NEVER use:
    if whatever: raise something
because certain error conditions do reveal themselves only in
ways testable with an if, NOT by raising exceptions themselves.

And some you can't even test with an if, and then you're in
trouble (e.g., in loopUntilConvergence, nothing assures us that
f and the initial x ARE such as to converge -- so, one would
also add a maximum-iteration-count argument, defaulting to
something suitably big, count iterations, and do a sort of
look-AFTER-you've-leaped to raise on non-convergence:-).
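The maximum-iteration-count idea could be sketched like this in Python 3. The exception class NoConvergence and the default of 10000 iterations are hypothetical choices, standing in for "something suitably big".

```python
import math

class NoConvergence(Exception):
    """Hypothetical exception raised when the iteration budget runs out."""

def loop_until_convergence(f, x, epsilon, max_iterations=10000):
    # Look-before-you-leap for the one error an if CAN detect.
    if epsilon < 0:
        raise ValueError("Need epsilon>=0, not %s" % epsilon)
    y = f(x)
    for _ in range(max_iterations):
        if abs(x - y) <= epsilon:
            return y
        x = y
        y = f(x)
    # Look-AFTER-you've-leaped: non-convergence only shows up once the
    # whole budget has been spent.
    raise NoConvergence("no convergence after %d iterations" % max_iterations)

root = loop_until_convergence(math.cos, 1.0, 1e-12)
try:
    # Each step moves by 1, which never gets within epsilon=0.5.
    loop_until_convergence(lambda v: v + 1, 0.0, 0.5, max_iterations=100)
except NoConvergence as exc:
    print(exc)  # no convergence after 100 iterations
```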


This doesn't have all that much to do with file objects being
or not being iterators, but I love rambling discussions anyway:-).


Alex