[Python-ideas] Make len() usable on a generator

Steven D'Aprano steve at pearwood.info
Sat Oct 4 15:27:41 CEST 2014


On Fri, Oct 03, 2014 at 05:09:20PM +0200, Thomas Chaumeny wrote:
> Hi!
> 
> I have just come across some code counting a generator comprehension
> expression by doing len([foo for foo in bar if some_condition]) and I
> realized it might be better if we could just use len(foo for foo in bar if
> some_condition) as it would avoid a list allocation in memory.
> 
> Another possibility is to write sum(1 for foo in bar if some_condition),
> but that is not optimal either as it generates a lot of intermediate
> additions which should not be needed.

I don't understand this reasoning. Why do you think that the 
intermediate additions are unnecessary?

I believe that, in the general case of an arbitrary generator 
expression, there are only two ways to tell what the length will be. The 
first is to produce a list (or other sequence), then return the 
length of the list:  len([obj for obj in generator if condition]). The 
second is to count each item as it is produced, but without storing them 
all: sum(1 for obj in generator if condition). The first is optimized 
for readability, the second for memory.
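
A minimal sketch of the two approaches side by side. The names bar and some_condition come from the quoted message; the definitions given to them here are arbitrary placeholders, assumed only for illustration:

```python
# Placeholder data and predicate (assumptions, not from the original post).
bar = range(10)

def some_condition(foo):
    # Hypothetical predicate: keep the even numbers.
    return foo % 2 == 0

# First way: build the whole list, then ask for its length.
count_via_list = len([foo for foo in bar if some_condition(foo)])

# Second way: count each item as it is produced, storing none of them.
count_via_sum = sum(1 for foo in bar if some_condition(foo))
```

Both produce the same count. They differ only in whether the matching items are held in memory at once.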

If I have missed a third way, please tell me.

I don't believe that there is any other general way to work out the 
length of an arbitrary generator. (Apart from trivial, or obfuscated, 
variations on the above two, of course.) How would you tell what the 
length of this generator should be, apart from actually running it to 
exhaustion?

import random

def generator():
    # Yield "spam" until a coin flip says stop, so the length is random.
    while True:
        if random.random() < 0.5:
            return
        yield "spam"

Since there is no general way to know what the length of an arbitrary 
generator will be, it is better to be explicit that it has to be 
calculated by running through the generator and exhausting it.
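
To make that concrete, here is a rough sketch of what explicit counting looks like in practice. The helper count_items is a hypothetical name introduced for this example, not an existing builtin:

```python
import random

def generator():
    # Same coin-flip generator as in the message above.
    while True:
        if random.random() < 0.5:
            return
        yield "spam"

def count_items(iterable):
    # Hypothetical helper: counts by consuming the iterable item by item.
    return sum(1 for _ in iterable)

gen = generator()

# Generator objects do not support len(); the call raises TypeError.
try:
    len(gen)
    has_len = True
except TypeError:
    has_len = False

first_count = count_items(gen)   # runs the generator to exhaustion
second_count = count_items(gen)  # nothing is left to count
```

The second call returns 0 because the first one consumed the generator. That side effect is the reason for spelling the counting out rather than hiding it behind len().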



-- 
Steven

