Dangerous behavior of list(generator)

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Fri Jan 1 09:42:29 EST 2010


On Fri, 01 Jan 2010 05:19:02 -0800, Wolfram Hinderer wrote:

> On 1 Jan., 02:47, Steven D'Aprano <st... at REMOVE-THIS-
> cybersource.com.au> wrote:
>> On Thu, 31 Dec 2009 11:34:39 -0800, Tom Machinski wrote:
[...]
>> > As for what's wrong with the "if not any" solution, Benjamin Kaplan's
>> > post hits the nail on its head. This is a bioinformatics application,
>> > so the iterable "foos" tends to be very large, so saving half the
>> > runtime makes a big difference.
>>
>> Possibly you haven't seen my reply to Benjamin, so I'll paraphrase:
>> that's incorrect, because any() is lazy and will return as soon as it
>> hits a non-false item.
> 
> Tom's point is that
> if not any(foo for foo in foos if foo.bar):
>     foo = (foo for foo in foos if foo.bar).next()
> iterates twice over (the same first few elements of) foos, which should
> take about twice as long as iterating once. The lazyness of "any" does
> not seem to matter here.

That's no different from any "Look Before You Leap" idiom. If you do this:

if key in dict:
    x = dict[key]

you search the dict twice, once to see if the key is there, and the 
second time to fetch the value. Whether that is better or faster than the 
alternative:

try:
    x = dict[key]
except KeyError:
    pass

depends on how often you expect the lookup to fail.

In any case, I would claim that Tom's argument is a classic example of 
premature optimization: by his own admission: 

'the iterable "foos" tends to be very large'

which implies that whatever happens to the foos after this test, it will 
probably be very time consuming. If it takes (for the sake of the 
argument) 10 milliseconds to process the entire iterable, who cares 
whether it takes 0.01 or 0.02 ms to check that the iterable is valid?



> Of course, you're right that the iteration might or might not be the
> bottleneck. On the other hand, foos might not even be reiterable.

If that's the case, then the existing solution potentially throws away 
the first value of foos every time the caller tests to see if it is empty.

Dealing with non-reiterable iterators can be a nuisance. In such a case, 
it may be best to avoid Look Before You Leap altogether:

empty = True
for foo in foos:
    if foo.bar:
        empty = False
        process(foo)
if empty:
    handle_error_condition()




-- 
Steven



More information about the Python-list mailing list