[Python-ideas] Introduce collections.Reiterable
Stephen J. Turnbull
stephen at xemacs.org
Mon Sep 23 10:04:10 CEST 2013
Executive summary:
The ability to create a quick iterable with just a simple __getitem__
is cool and not a "hack" (ie, no need whatsoever to deprecate it), but
it is clearly a "consenting adults" construction (which includes
"knowing where your children are at 10pm").
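To make the summary concrete, here is a minimal sketch. The class name Countdown is hypothetical. An object that defines only __getitem__ is already iterable, because iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError is raised. The adult supervision is needed partly because the collections.abc.Iterable check only looks for __iter__, so it does not recognize such an object.

```python
from collections.abc import Iterable


class Countdown:
    """Hypothetical example: iterable through __getitem__ alone."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, i):
        # Raising IndexError is what ends the implicit iteration.
        if not 0 <= i <= self.start:
            raise IndexError(i)
        return self.start - i


print(list(Countdown(3)))                 # [3, 2, 1, 0]
print(isinstance(Countdown(3), Iterable))  # False: the ABC checks for __iter__ only
```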
Steven D'Aprano writes:
> I agree, and I disagree with Nick's characterization of the
> sequence protocol as a "backwards-compatibility hack". It is an
> elegant protocol
Gotta disagree with you there (except I agree there's no need for a
word like "hack"). Because __getitem__ is polymorphic at the abstract
level of duck-typing (a mapping's __getitem__ need not accept integers
at all), this protocol is ugly. The "must accept 0" clause is a wart.
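The wart is easy to trip over. The fallback iterator always begins at index 0, so a __getitem__ that uses any other starting index silently looks empty. Here is a small sketch with a hypothetical one-based container, OneBased, which is not from the original post:

```python
class OneBased:
    """Hypothetical 1-indexed container built on __getitem__."""

    def __init__(self, *items):
        self._items = items

    def __getitem__(self, i):
        # Valid indices are 1..len(items); everything else is an IndexError.
        if not 1 <= i <= len(self._items):
            raise IndexError(i)
        return self._items[i - 1]


v = OneBased("a", "b", "c")
print(v[1], v[3])  # a c
print(list(v))     # []: iteration starts at 0, which raises IndexError at once
```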
> The sequence protocol allows one to write a lazily generated,
> potentially infinite sequence that still allows random access to items.
Sure, but it's not fully general. One may not *want* to write
__next__ using __getitem__.
A somewhat pathological<wink/> example is the case of Goedel numbering
of syntactically correct programs. programs.__getitem__ can be
implemented directly by arithmetic, while programs.__next__ is best
implemented by "unrolling" the grammar.
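Goedel numbering is too heavy to show here, but a toy analogue has the same shape. It is hypothetical and not from the original post: the strings over a two-letter alphabet in shortlex order. Indexing is direct arithmetic (bijective base-2 numeration). Iteration is most naturally written by enumerating lengths, in the spirit of "unrolling" a grammar. Neither method is built from the other.

```python
from itertools import count, islice, product


class Words:
    """Toy stand-in: all strings over ALPHABET in shortlex order."""

    ALPHABET = "ab"

    def __getitem__(self, n):
        # Direct arithmetic: bijective base-k numeration of n.
        if n < 0:
            raise IndexError(n)
        k = len(self.ALPHABET)
        chars = []
        while n > 0:
            n, r = divmod(n - 1, k)
            chars.append(self.ALPHABET[r])
        return "".join(reversed(chars))

    def __iter__(self):
        # Enumeration by length, independent of __getitem__.
        for length in count():
            for letters in product(self.ALPHABET, repeat=length):
                yield "".join(letters)


print(list(islice(Words(), 7)))  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
print(Words()[6])                # bb
```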
Of course it makes sense to use an already written __getitem__ to
implement __next__ when the numerical indices provide a semantically
useful order. But that's already done by the Sequence ABC:
    from collections.abc import Sequence

    class Squares(Sequence):  # Sequence implies Iterable
        def __getitem__(self, n):
            return n*n
        # __iter__ is provided as a mixin method by Sequence,
        # implemented in terms of __getitem__
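Whether the Sequence route works on its own is easy to check. In this minimal sketch, SquaresNoLen is a hypothetical name. With only __getitem__ defined, the class inherits Sequence's concrete __iter__. It still cannot be instantiated, because __len__ remains abstract.

```python
from collections.abc import Sequence


class SquaresNoLen(Sequence):
    def __getitem__(self, n):
        return n * n


# The concrete __iter__ is supplied by Sequence itself.
print(SquaresNoLen.__iter__ is Sequence.__iter__)  # True

try:
    SquaresNoLen()
    instantiable = True
except TypeError:
    # __len__ is still abstract, so the ABC machinery refuses to instantiate.
    instantiable = False

print(instantiable)  # False
```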
The problem is that Sequence requires a __len__ method. OK, so:

    # put this in your toolbox (toolbox.py)
    from collections.abc import Sequence

    class UndefinedLengthError(TypeError):
        pass

    class InfiniteSequence(Sequence):
        def __len__(self):
            raise UndefinedLengthError

    # in programs
    from toolbox import InfiniteSequence

    class Squares(InfiniteSequence):
        def __getitem__(self, i):
            return i*i
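The following usage sketch restates the toolbox classes so that it runs on its own. Random access works, and the object passes the Iterable check. Bounded consumption through itertools.islice terminates, and len() raises the custom error.

```python
from collections.abc import Iterable, Sequence
from itertools import islice


class UndefinedLengthError(TypeError):
    pass


class InfiniteSequence(Sequence):
    def __len__(self):
        raise UndefinedLengthError


class Squares(InfiniteSequence):
    def __getitem__(self, i):
        return i * i


sq = Squares()
print(sq[12])                    # 144: random access
print(list(islice(sq, 5)))       # [0, 1, 4, 9, 16]: bounded iteration
print(isinstance(sq, Iterable))  # True: Sequence supplies __iter__

try:
    len(sq)
    has_length = True
except UndefinedLengthError:
    has_length = False
print(has_length)                # False
```

Anything that consumes the whole sequence still runs forever. That includes list(sq), and also a membership test for a value that never appears, such as 2 in sq.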
> Because it's infinite, there's no value that __len__ can return,
> and no need for a __len__.
Well, it *could* return an infinite value or None, but list() isn't
prepared for that. list() isn't even prepared for
    class Squares(object):
        def __init__(self, n):
            self.listsize = n
        def __getitem__(self, i):
            return i*i
        def __len__(self):
            return self.listsize
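(It doesn't return in a sane amount of time. list() goes ahead and
attempts to construct an infinite list, much as

    l = []
    for x in squares:
        l.append(x)

would. Iterating over an object that has __getitem__ but no __iter__
calls __getitem__ with 0, 1, 2, ... until it raises IndexError, which
this one never does. __len__ is consulted at most as a size hint.
Perhaps it's a shame it doesn't detect that there's a __len__ and use
it to truncate the sequence, but most of the time it would just be
overhead, I guess.) A lot of other functions are also going to be
upset when they get a Squares object.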
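Suppose the intent is for the length to bound iteration. The usual remedy is to have __getitem__ raise IndexError past the end, since that exception, not __len__, is what stops the fallback iterator. Here is a minimal sketch, where BoundedSquares is a hypothetical name:

```python
class BoundedSquares(object):
    def __init__(self, n):
        self.listsize = n

    def __getitem__(self, i):
        # Out-of-range indices raise IndexError; that is what ends
        # iteration, so list() now terminates.
        if not 0 <= i < self.listsize:
            raise IndexError(i)
        return i * i

    def __len__(self):
        return self.listsize


print(list(BoundedSquares(5)))  # [0, 1, 4, 9, 16]
```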
This discussion is relevant because these are the kinds of things that
bothered the OP.
> Because it supports random access to items, writing this as an
> iterator with __next__ is inappropriate. Writing *both* is
> unnecessary,
Incorrect, as written. In order to iterate over a sequence (small
"s"), "somebody" has to write __next__. It's just that the function
is generic and already written. For any object with __getitem__ but
no __iter__, the interpreter's iter() automatically wraps the object
in an iterator whose __next__ calls __getitem__ with successive
indices. This makes it unnecessary for the application programmer to
write it. That is indeed elegant.
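Here is a rough pure-Python model of that generic __next__, for illustration only. The class name SequenceIterator is hypothetical, and CPython implements the real iterator in C. It calls __getitem__ with successive indices starting at 0 and stops at the first IndexError or StopIteration.

```python
class SequenceIterator:
    """Rough model of the iterator iter() builds for __getitem__-only objects."""

    def __init__(self, seq):
        self._seq = seq
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._seq is None:        # already exhausted: stay exhausted
            raise StopIteration
        try:
            item = self._seq[self._index]
        except (IndexError, StopIteration):
            self._seq = None         # drop the reference to the sequence
            raise StopIteration from None
        self._index += 1
        return item


class Letters:
    def __getitem__(self, i):
        return "xyz"[i]              # IndexError once i reaches 3


print(list(SequenceIterator(Letters())))  # ['x', 'y', 'z']
print(list(iter(Letters())))              # ['x', 'y', 'z']
```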
> and complicates the class for no benefit. As written, Squares is
> naturally thread-safe -- two threads can iterate over the same
> Squares object without interfering.
The obvious way of writing this as a generator would also be naturally
thread-safe:
    class Squares(object):
        def __iter__(self):
            n = 0
            while True:
                yield n*n
                n = n + 1
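Each call to iter() on such an object runs __iter__ afresh. Each call therefore gets a new generator with its own local n, and no iteration state lives on the Squares instance. This small single-threaded sketch interleaves two iterators over one object. The same independence is what lets two threads each iterate the same Squares object, as long as they do not share a single iterator.

```python
from itertools import islice


class Squares(object):
    def __iter__(self):
        n = 0
        while True:
            yield n * n
            n = n + 1


sq = Squares()
a = iter(sq)
b = iter(sq)
first_a = list(islice(a, 3))  # [0, 1, 4]
first_b = list(islice(b, 2))  # [0, 1]: b is unaffected by a's progress
next_a = next(a)              # 9: a resumes where it left off
print(first_a, first_b, next_a)
```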
AFAICS this is faster (less function-call overhead). In this
application it doesn't matter, but it could. And anything where a bit
of state is useful (eg, the Fibonacci sequence) would be a lot faster
with a hand-written __iter__.
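The Fibonacci remark can be illustrated with two hypothetical classes. FibByIndex exposes only __getitem__ and has to recompute from the start for every index, so producing the first n items costs O(n**2) additions. FibByIter keeps the last two values in generator locals, so each item costs a single addition.

```python
from itertools import islice


class FibByIndex(object):
    def __getitem__(self, i):
        # No state between calls: walk up from the beginning every time.
        a, b = 0, 1
        for _ in range(i):
            a, b = b, a + b
        return a


class FibByIter(object):
    def __iter__(self):
        # State (the last two values) lives in the generator's locals.
        a, b = 0, 1
        while True:
            yield a
            a, b = b, a + b


print(list(islice(FibByIndex(), 10)))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(list(islice(FibByIter(), 10)))   # the same values
```

Both produce the same items. The difference lies only in how much work is repeated, which is the point of writing __iter__ by hand when state helps.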