[Python-ideas] while conditional in list comprehension ??

Wolfgang Maier wolfgang.maier at biologie.uni-freiburg.de
Wed Jan 30 10:46:31 CET 2013


Yuriy Taraday <yorik.sar at ...> writes:

> 
> 
> On Tue, Jan 29, 2013 at 7:44 PM, Wolfgang Maier
<wolfgang.maier at biologie.uni-freiburg.de> wrote:
> list(i for i in range(100) if i<50 or stop())
> Really (!) nice (and 2x as fast as using itertools.takewhile())!
> 
> 
> 
> I couldn't believe it so I had to check it:
> 
> 
> from __future__ import print_function
> 
> import functools, itertools, operator, timeit
> 
> def var1():
>     def _gen():
>         for i in range(100):
>             if i > 50: break
>             yield i
> 
>     return list(_gen())
> 
> def var2():
>     def stop():
>         raise StopIteration
>     return list(i for i in range(100) if i <= 50 or stop())
> 
> 
> def var3():
>     return [i for i in itertools.takewhile(lambda n: n <= 50, range(100))]
> 
> def var4():
>     return [i for i in itertools.takewhile(functools.partial(operator.lt, 50),
range(100))]
> 
> 
> if __name__ == '__main__':
>     for f in (var1, var2, var3, var4):
>         print(f.__name__, end=' ')
>         print(timeit.timeit(f))
> 
> 
> 
> Results on my machine:
> 
> 
> var1 20.4974410534
> var2 23.6218020916
> var3 32.1543409824
> var4 4.90913701057
> 
> var1 might have became the fastest of the first 3 because it's a special and
very simple case. Why should explicit loops be slower that generator expressions?
> 
> var3 is the slowest. I guess, because it has lambda in it.
> But switching to Python and back can not be faster than the last option -
sitting in the C code as much as we can.
> 
> 
> -- Kind regards, Yuriy.

Steven D'Aprano <steve at ...> writes:

> 
> On 30/01/13 02:44, Wolfgang Maier wrote:
> 
> > list(i for i in range(100) if i<50 or stop())
> > Really (!) nice (and 2x as fast as using itertools.takewhile())!
> 
> I think you are mistaken about the speed. The itertools iterators are highly
> optimized and do all their work in fast C code. If you are seeing takewhile
> as slow, you are probably doing something wrong: untrustworthy timing code,
> misinterpreting what you are seeing, or some other error.
> 
> Here's a comparison done the naive or obvious way. Copy and paste it into an
> interactive Python session:
> 
> from itertools import takewhile
> from timeit import Timer
> 
> def stop(): raise StopIteration
> 
> setup = 'from __main__ import stop, takewhile'
> 
> t1 = Timer('list(i for i in xrange(1000) if i < 50 or stop())', setup)
> t2 = Timer('[i for i in takewhile(lambda x: x < 50, xrange(1000))]', setup)
> 
> min(t1.repeat(number=100000, repeat=5))
> min(t2.repeat(number=100000, repeat=5))
> 
> On my computer, t1 is about 1.5 times faster than t2. But this is misleading,
> because it's not takewhile that is slow. I am feeding something slow into
> takewhile. If I really need to run as fast as possible, I can optimize the
> function call inside takewhile:
> 
> from operator import lt
> from functools import partial
> 
> small_enough = partial(lt, 50)
> setup2 = 'from __main__ import takewhile, small_enough'
> 
> t3 = Timer('[i for i in takewhile(small_enough, xrange(1000))]', setup2)
> 
> min(t3.repeat(number=100000, repeat=5))
> 
> On my computer, t3 is nearly 13 times faster than t1, and 19 times faster
> than t2. Here are the actual times I get, using Python 2.7:
> 
> py> min(t1.repeat(number=100000, repeat=5))  # using the StopIteration hack
> 1.2609241008758545
> py> min(t2.repeat(number=100000, repeat=5))  # takewhile and lambda
> 1.85182785987854
> py> min(t3.repeat(number=100000, repeat=5))  # optimized version
> 0.09847092628479004
> 

Hi Yuriy and Steven,
a) I had compared the originally proposed 'takewhile with lambda' version to the
'if cond or stop()' solution using 'timeit' just like you did. In principle, you
find the same as I did, although I am a bit surprised that our differences are
different. To be exact 'if cond or stop()' was 1.84 x faster in my hands than
'takewhile with lambda'.

b) I have to say I was very impressed by the speed gains you report through the
use of 'partial', which I had not thought of at all, I have to admit.
However, I tested your suggestions and I think they both suffer from the same
mistake:
your condition is 'partial(lt,50)', but this is not met to begin with and
results in an empty list at least for me. Have you two actually checked the
output of the code or have you just timed it? I found that in order to make it
work the comparison has to be made via 'partial(gt,50)'. With this modification
the resulting list in your example would be [0,..,49] as it should be.

And now the big surprise in terms of runtimes:
partial(lt,50) variant:     1.17  (but incorrect results)
partial(gt,50) variant:    13.95
if cond or stop() variant:  9.86

I guess python is just smart enough to recognize that it compares against a
constant value all the time, and optimizes the code accordingly (after all the
if clause is a pretty standard thing to use in a comprehension).

So the reason for your reported speed-gain is that you actually broke out of the
comprehension at the very first element instead of going through the first 50!
Please comment, if you get different results.
Best,
Wolfgang








More information about the Python-ideas mailing list