dict generator question

Mon Sep 22 19:16:45 EDT 2008

On Mon, 22 Sep 2008 04:21:12 -0700, bearophileHUGS wrote:

> Steven D'Aprano:
> 
>>Extending len() to support iterables sounds like a good idea, except
>>that it's not.<
> 
> The Python language has lately shifted toward more and more use of lazy
> iterables (see range being lazy by default, etc). So they are now quite
> common. So extending len() to make it act like leniter() too is a way to
> adapt a basic Python construct to the changes of the other parts of the
> language.

I'm sorry, I don't recognise leniter(). Did I miss something?

> In languages like Haskell you can count how many items a lazy sequence
> has. But those sequences are generally immutable, so they can be
> accessed many times, so len(iterable) doesn't exhaust them like in
> Python. So in Python it's less useful.

In Python, xrange() is a lazy sequence that isn't exhausted, but that's a 
special case: it actually has a __len__ method, and presumably the length 
is calculated from the xrange arguments, not by generating all the items 
and counting them. How would you count the number of items in a generic 
lazy sequence without actually generating the items first?
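A minimal sketch of that special case, assuming Python 3, where range plays the role that xrange plays in Python 2. The variable names are only illustrative:

```python
# Python 3's range stands in for Python 2's xrange here (an assumption
# about which interpreter the reader is running).
big = range(0, 10**12, 3)

# len() is answered from start/stop/step; no items are generated.
big_len = len(big)

# A range is not exhausted by iteration, so it can be walked repeatedly.
small = range(5)
first = list(small)
second = list(small)
```

Both passes over small yield the same items, which is exactly what a generic generator cannot promise.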

> This is a common situation where I only care about the length of the
> group g:
> [leniter(g) for h,g in groupby(iterable)]
> 
> There are other situations where I may be interested only in how many
> items there are:
> leniter(ifilter(predicate, iterable))
> leniter(el for el in iterable if predicate(el))
> 
> For my usage I have written a version of the itertools module in D (a
> lot of work, but the result is quite useful and flexible, even if I miss
> the generator/iterator syntax a lot), and later I have written a len()
> able to count the length of lazy iterables too (if the given variable
> has a length attribute/property then it returns that value), 

I'm not saying that no iterables can accurately predict how many items 
they will produce. If they can, then len() should support iterables with 
a __len__ attribute. But in general there's no way of predicting how many 
items the iterable will produce without iterating over it, and len() 
shouldn't do that.
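As a rough illustration of the distinction, here is a sketch with a hypothetical Countdown class (not from any library) that can state its length up front, next to a plain generator that cannot:

```python
class Countdown:
    """Hypothetical lazy iterable that knows its length in advance."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Items are produced lazily, one at a time.
        n = self.start
        while n > 0:
            yield n
            n -= 1

    def __len__(self):
        # Computed from the argument, not by iterating.
        return max(self.start, 0)


def gen():
    yield 1


# A plain generator has no __len__, so len() refuses it.
try:
    len(gen())
    generator_has_len = True
except TypeError:
    generator_has_len = False
```

Here len(Countdown(3)) is answered without producing a single item, while len(gen()) raises TypeError rather than silently consuming the generator.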

> and I have
> found that it's useful often enough (almost as the string.xsplit()). But
> in Python there is less need for a len() that counts lazy iterables too
> because you can use the following syntax that isn't bad (and isn't
> available in D):
> 
> [sum(1 for x in g) for h,g in groupby(iterable)]
> sum(1 for x in ifilter(predicate, iterable))
> sum(1 for el in iterable if predicate(el))

I think the idiom sum(1 for item in iterable) is, in general, a mistake. 
For starters, it is only repeatable for sequences (lazy or otherwise): it 
exhausts a use-once iterator as it counts, and your choice of variable 
name may fool people into thinking they can pass such an iterator to your 
code and still use it afterwards.
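A small sketch of the trap with a use-once iterator. The helper name count_items is made up for this example:

```python
def count_items(iterable):
    # The idiom under discussion: counts by consuming every item.
    return sum(1 for _ in iterable)


data = [10, 20, 30]
it = iter(data)

n_list = count_items(data)  # the list is still intact afterwards
n_iter = count_items(it)    # the iterator is consumed by counting

# Nothing is left to iterate over once the count has been taken.
leftover = list(it)
```

The count comes out the same either way, but after counting the iterator has nothing left to give, while the list is untouched.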

Secondly, it's not clear what sum(1 for item in iterable) does without 
reading it carefully. Since you're generating every item anyway, 
len(list(iterable)) is more readable and almost as efficient for 
most practical cases.
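For comparison, a sketch of both spellings applied to the groupby case from the quoted text, using a made-up input string. The only difference is that len(list(g)) builds a temporary list for each group:

```python
from itertools import groupby

words = "aaabbc"  # illustrative input only

# Counting each group with the sum() idiom.
by_sum = [(k, sum(1 for _ in g)) for k, g in groupby(words)]

# Counting each group by materialising it first.
by_len = [(k, len(list(g))) for k, g in groupby(words)]
```

Both give the same counts; which one reads better is the point under dispute.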

As things stand now, list(iterable) is a "dangerous" operation, as it may 
consume arbitrarily huge resources. But len() isn't[1], because len() 
doesn't operate on arbitrary iterables. This is a good thing.

> So you and Python designers may choose to not extend the semantics of
> len() for various good reasons, but you will have a hard time convincing
> me it's a useless capability :-)

I didn't say that knowing the length of iterators up front was useless. 
Sometimes it may be useful, but it is rarely (never?) essential.

[1] len(x) may call x.__len__(), which might do anything. But __len__ is 
expected to return an int, and to do so quickly with minimal effort. 
Methods that do something else are an abuse of __len__ and should be 
treated as a bug.
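Part of that expectation is enforced by len() itself, at least in CPython 3 (assumed here). A sketch with two deliberately broken, hypothetical classes:

```python
class NegativeLen:
    def __len__(self):
        return -1  # not a valid length


class FloatLen:
    def __len__(self):
        return 2.5  # not an integer


def try_len(obj):
    """Return len(obj), or the name of the exception len() raises."""
    try:
        return len(obj)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


negative_result = try_len(NegativeLen())
float_result = try_len(FloatLen())
```

len() rejects both return values, but nothing checks that __len__ is quick; that part remains a convention.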

-- 
Steven