[Python-ideas] Introduce collections.Reiterable

Stephen J. Turnbull stephen at xemacs.org
Mon Sep 23 10:04:10 CEST 2013


Executive summary:

The ability to create a quick iterable with just a simple __getitem__
is cool and not a "hack" (ie, no need whatsoever to deprecate it), but
it is clearly a "consenting adults" construction (which includes
"knowing where your children are at 10pm").

Steven D'Aprano writes:

 > I agree, and I disagree with Nick's characterization of the
 > sequence protocol as a "backwards-compatibility hack". It is an
 > elegant protocol

Gotta disagree with you there (except I agree there's no need for a
word like "hack").  Because __getitem__ is polymorphic (at the
abstract level of duck-typing), this protocol is ugly.  The "must
accept 0" clause is a wart.

 > The sequence protocol allows one to write a lazily generated, 
 > potentially infinite sequence that still allows random access to items. 

Sure, but it's not fully general.  One may not *want* to write
__next__ using __getitem__.

A somewhat pathological<wink/> example is the case of Goedel numbering
of syntactically correct programs.  programs.__getitem__ can be
implemented directly by arithmetic, while programs.__next__ is best
implemented by "unrolling" the grammar.
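
As a much smaller stand-in for the Goedel-numbering case, here is a
sketch where indexing and iteration are most naturally written in
different ways.  The class name Words and the two-letter alphabet are
invented for illustration.  __getitem__ computes the n-th string by
arithmetic (bijective base-k numeration), while __iter__ "unrolls" the
strings length by length with itertools.product.  Both yield the same
shortlex order.

```python
from itertools import count, islice, product

class Words(object):
    """All finite strings over an alphabet, in shortlex order
    (hypothetical example class)."""

    def __init__(self, alphabet="ab"):
        self.alphabet = alphabet

    def __getitem__(self, n):
        # Direct arithmetic: bijective base-k numeration of n.
        if n < 0:
            raise IndexError(n)
        k = len(self.alphabet)
        chars = []
        while n > 0:
            n, r = divmod(n - 1, k)
            chars.append(self.alphabet[r])
        return "".join(reversed(chars))

    def __iter__(self):
        # "Unrolling": generate every string of length 0, 1, 2, ...
        for length in count():
            for letters in product(self.alphabet, repeat=length):
                yield "".join(letters)

words = Words()
first = list(islice(words, 8))
```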

Of course it makes sense to use an already written __getitem__ to
implement __next__ when the numerical indices provide a semantically
useful order.  But that's already done by the Sequence ABC:

    from collections.abc import Sequence    # Python 3.3+

    class Squares(Sequence):        # Sequence inherits from Iterable
        def __getitem__(self, n):
            return n*n
        # __iter__ is provided as a mixin method using __getitem__
        # by Sequence

The problem is that Sequence requires a __len__ method.  OK, so

    # put this in your toolbox
    class UndefinedLengthError(TypeError):
        pass

    class InfiniteSequence(Sequence):
        def __len__(self):
            raise UndefinedLengthError

    # in programs
    from toolbox import InfiniteSequence
    class Squares(InfiniteSequence):
        def __getitem__(self, i):
            return i*i
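
A quick check of how this toolbox class behaves, as a sketch.  The
import location (collections.abc) assumes Python 3.3 or later, and
islice is used only to cut the infinite iteration short.

```python
from collections.abc import Sequence
from itertools import islice

class UndefinedLengthError(TypeError):
    pass

class InfiniteSequence(Sequence):
    def __len__(self):
        raise UndefinedLengthError

class Squares(InfiniteSequence):
    def __getitem__(self, i):
        return i*i

squares = Squares()

# Sequence's mixin __iter__ calls __getitem__ with 0, 1, 2, ...
first_five = list(islice(squares, 5))

# len() propagates the exception raised by __len__.
try:
    len(squares)
    length_defined = True
except UndefinedLengthError:
    length_defined = False
```

Anything that tries to consume the whole thing, such as list(squares),
still does not terminate on its own.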

 > Because it's infinite, there's no value that __len__ can return,
 > and no need for a __len__.

Well, it *could* return an infinite value or None, but list() isn't
prepared for that.  list() isn't even prepared for

    class Squares(object):
        def __init__(self, n):
            self.listsize = n
        def __getitem__(self, i):
            return i*i
        def __len__(self):
            return self.listsize

(It doesn't return in a sane amount of time.  I guess it goes ahead
and attempts to construct an infinite list with

    l = []
    for x in squares:
        l.append(x)

Perhaps it's a shame it doesn't detect that there's a __len__ and use
it to truncate the sequence, but most of the time it would just be
overhead, I guess.)  A lot of other functions are also going to be
upset when they get a Squares object.
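
A small sketch of this behaviour, with islice standing in for list()
so that it terminates.  BoundedSquares is a hypothetical variant added
for illustration: the sequence protocol stops only when __getitem__
raises IndexError, so that is what the class has to do if list() is
supposed to finish.

```python
from itertools import islice

class Squares(object):
    def __init__(self, n):
        self.listsize = n
    def __getitem__(self, i):
        return i*i
    def __len__(self):
        return self.listsize

class BoundedSquares(Squares):
    # Hypothetical variant: signal the end the way the sequence
    # protocol expects, by raising IndexError.
    def __getitem__(self, i):
        if not 0 <= i < self.listsize:
            raise IndexError(i)
        return i*i

# Iteration ignores __len__: asking a "length 3" object for 5 items
# produces 5 items.
unbounded = list(islice(Squares(3), 5))

# With IndexError raised past the end, list() terminates.
bounded = list(BoundedSquares(3))
```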

This discussion is relevant because these are the kinds of things that
bothered the OP.

 > Because it supports random access to items, writing this as an
 > iterator with __next__ is inappropriate. Writing *both* is
 > unnecessary,

Incorrect, as written.  In order to iterate over a sequence (small
"s"), "somebody" has to write __next__.  It's just that the function
is generic, already written, and the interpreter automatically binds it
(actually, a closure using it) to the __next__ attribute of the
automatically created iterator.  This makes it unnecessary for the
application programmer to write it.  That is indeed elegant.
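
Roughly, the generic function in question behaves like the class
below.  This is a pure-Python approximation for illustration (the
names SequenceIterator and Countdown are invented), not the actual
implementation, which is written in C.

```python
class SequenceIterator(object):
    """Rough equivalent of the iterator that iter() builds for an
    object defining only __getitem__ (illustrative name)."""

    def __init__(self, seq):
        self._seq = seq
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._seq is None:
            raise StopIteration
        try:
            item = self._seq[self._index]
        except IndexError:
            # IndexError from __getitem__ means "no more items".
            self._seq = None
            raise StopIteration
        self._index += 1
        return item

class Countdown(object):
    # Only __getitem__: no __iter__, no __next__, no __len__.
    def __getitem__(self, i):
        if i > 3:
            raise IndexError(i)
        return 3 - i

by_hand = list(SequenceIterator(Countdown()))
built_in = list(iter(Countdown()))
```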

 > and complicates the class for no benefit. As written, Squares is
 > naturally thread-safe -- two threads can iterate over the same
 > Squares object without interfering.

The obvious way of writing this as a generator would also be naturally
thread-safe:

    class Squares(object):
        def __iter__(self):
            n = 0
            while True:
                yield n*n
                n = n + 1

AFAICS this is faster (less function-call overhead).  In this
application it doesn't matter, but it could.  And anything where a bit
of state is useful (eg, the Fibonacci sequence) would be a lot faster
with a hand-written __iter__.
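
To make the Fibonacci remark concrete, here is a sketch comparing the
two styles.  Both class names are illustrative.  The __getitem__
version keeps no state, so it recomputes from scratch for every index,
while the generator carries the pair (a, b) along.  Each call to
__iter__ starts a fresh generator with its own local state, so
separate iterations over the same object do not interfere.

```python
from itertools import islice

class FibByIndex(object):
    # Sequence-protocol style: no state is kept between calls, so
    # item n costs n additions every time it is requested.
    def __getitem__(self, n):
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a

class FibByIter(object):
    # Generator style: the pair (a, b) is the "bit of state".
    def __iter__(self):
        a, b = 0, 1
        while True:
            yield a
            a, b = b, a + b

by_index = list(islice(FibByIndex(), 10))
by_iter = list(islice(FibByIter(), 10))

# Two iterators over the same object advance independently.
fib = FibByIter()
it1, it2 = iter(fib), iter(fib)
for _ in range(3):
    next(it1)
first_of_it2 = next(it2)
```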


