[Python-ideas] Make len() usable on a generator

Steven D'Aprano steve at pearwood.info
Sat Oct 11 07:11:40 CEST 2014


On Fri, Oct 10, 2014 at 02:06:20PM -0400, random832 at fastmail.us wrote:
> On Fri, Oct 10, 2014, at 11:09, Adam Jorgensen wrote:
> > I don't think it makes much sense for len() to work on generators and the
> > fact that sum() works isn't a good argument.
> > 
> > Summing the contents of a generator can make sense whereas attempting to
> > obtain the length of something which specifically does not define a
> > length seems a little nonsensical to me...
> 
> Why doesn't it define a length? No, hear me out. Is there any reason
> that, for example, generator expressions on lists or ranges shouldn't be
> read-only views instead of generators?

The important question is not "Why *doesn't* it define a length?" but 
"Why *should* it define a length?". What advantage does it give you?

Let's take the case where you are both the producer and consumer of the 
generator. You might like to write something like this:

    it = (func(x) for x in some_list)
    n = len(it) 
    consume(it, n)

But that's no better or easier than:

    it = (func(x) for x in some_list)
    n = len(some_list)
    consume(it, n)


so there is no benefit to the generator having a length. It does no
harm either, but it does require a very specific special case: it
only works when you walk directly over a fixed-length sequence, and
can't be used in cases like these:

    it = (func(x) for x in some_list if condition(x))

    it = (func(x) for x in some_iterator_of_unpredictable_length)


So from the perspective of the producer, generators cannot always be
given a length, and when they can, you can determine the length just as
easily as Python can. There's no advantage to having the generator type
do so that I can see.
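
To make the producer's side concrete, here is a minimal sketch. The bodies of func() and condition() and the contents of some_list are made up for illustration; only the names come from the examples above. It shows that len() on a generator expression raises TypeError, that the producer can simply ask the underlying list instead, and that once a filter is involved the count is only known after the generator has been consumed.

```python
def func(x):
    # Hypothetical transformation standing in for func() above.
    return x * 2

def condition(x):
    # Hypothetical filter standing in for condition() above.
    return x % 2 == 0

some_list = [1, 2, 3, 4, 5]

# Direct walk over a sequence: the generator has no len(),
# but the producer can ask the list it is walking over.
it = (func(x) for x in some_list)
try:
    len(it)
    generator_has_len = True
except TypeError:
    generator_has_len = False
n = len(some_list)

# Filtered walk: the only way to learn the count is to consume
# the generator, after which nothing is left to process.
filtered = (func(x) for x in some_list if condition(x))
filtered_count = sum(1 for _ in filtered)
leftover = list(filtered)
```

Counting the filtered generator leaves it exhausted, which is why a len() that worked this way would be of little use to the producer.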

Now consider from the perspective of a consumer of an iterator. You 
don't know where it comes from or how it is produced, so you don't know 
if it has a predictable length or not. Since you can't rely on it having 
a length, it doesn't actually give you any benefit.

Perhaps you're already writing code that supports sequences and 
iterators, with a separate branch for sequences based on the fact that 
they have a known length:

    def function(it):
        try:
            n = len(it)
        except TypeError:
            # process the cases with no known length
            pass
        else:
            # process cases with a known length
            pass


It might be nice if some generators were processed by the "known
length" branch, but it doesn't save you from having to write the
"unknown length" branch. Again, the benefit is minimal or zero.
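
As an illustration of that two-branch pattern, here is a small sketch. The function average() is hypothetical and not from the thread; it computes a mean, using len() when the argument has one and counting by hand otherwise. Note that the "unknown length" branch has to be written either way.

```python
def average(it):
    """Hypothetical example: arithmetic mean of the items in `it`."""
    try:
        n = len(it)
    except TypeError:
        # No known length: count the items while consuming them.
        total = 0
        n = 0
        for value in it:
            total += value
            n += 1
    else:
        # Known length: no need to count by hand.
        total = sum(it)
    if n == 0:
        raise ValueError("average() of an empty iterable")
    return total / n

from_list = average([1, 2, 3])
from_generator = average(x for x in [1, 2, 3])
```

Both calls give the same result under Python 3 division; the list takes the "known length" branch and the generator takes the other.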

There may be cases where you *require* a predictable length, in which 
case you probably should support only sequences.

So as far as I can see, there is no real benefit to having generators
support len(), even when they could.



-- 
Steven

