Why do generators take a long time?

Jason Swails jason.swails at gmail.com
Tue Jan 19 15:19:37 EST 2016


On Tue, Jan 19, 2016 at 2:27 AM, Arshpreet Singh <arsh840 at gmail.com> wrote:

>
> I was playing with generators and found that a generator expression takes a
> bit more time than a list comprehension. Am I doing something wrong?
>
>
> Function with List comprehensions:
>
> def sum_text(number_range):
>     return sum([i*i for i in xrange(number_range)])
>
> %timeit sum_text(100000000)
> 1 loops, best of 3: 14.8 s per loop
>
> Using generator Expressions:
>
> def sum_text(number_range):
>
>     return sum((i*i for i in xrange(number_range)))
>
> %timeit sum_text(100000000)
>
> 1 loops, best of 3: 16.4 s per loop
>

Steven already pointed out the additional overhead in a generator
expression vs. a list comprehension.  In addition to the memory savings you
get via generator expressions, though, you can also get significant time
savings when generator expressions have the ability to short-circuit.

For instance, have a look at the following:

In [1]: import random

In [2]: %timeit all(random.random() < 0.5 for i in range(1000))
The slowest run took 4.85 times longer than the fastest. This could mean
that an intermediate result is being cached
100000 loops, best of 3: 3.57 µs per loop

In [3]: %timeit all([random.random() < 0.5 for i in range(1000)])
1000 loops, best of 3: 422 µs per loop

In [4]: %timeit any(random.random() < 0.5 for i in range(1000))
100000 loops, best of 3: 3.18 µs per loop

In [5]: %timeit any([random.random() < 0.5 for i in range(1000)])
1000 loops, best of 3: 408 µs per loop

This is using IPython with Python 3.5.  The difference here is that for
functions that short-circuit (like any and all), the generator expression
does not have to exhaust all of its elements (particularly since for each
element there's a 50-50 chance of being True or False in each case).  In
this case, the difference is a couple of orders of magnitude.  The larger the
range argument, the bigger this difference becomes.
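To make the short-circuit visible, here is a small sketch (the `pulled` list
and the `flags` generator are mine, not from the thread) that records how many
elements all() actually pulls from a generator before it stops:

```python
pulled = []  # records every element the generator actually produces

def flags(n):
    """Yield True for every index except 3, noting each element pulled."""
    for i in range(n):
        pulled.append(i)
        yield i != 3  # False at position 3

print(all(flags(1000)))  # False
print(len(pulled))       # 4 -- iteration stopped at the first False
```

With a list comprehension in place of flags(), all 1000 elements would be
built before all() ever saw the first one.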

Also, in Python 2, the generator expression does not leak into the global
namespace, while the list comprehension does:

Python 2.7.10 (default, Jul 14 2015, 19:46:27)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> list(i for i in range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> i
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'i' is not defined
>>> [i for i in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> i
9

Python 3 does not leak the iterator variable in either case.  However, it
would be madness to have code actually relying on this behavior :).
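The Python 3 behavior is just as easy to check (a minimal sketch; the
try/except exists only to demonstrate the NameError):

```python
# Python 3: both list comprehensions and generator expressions run in
# their own scope, so the loop variable never escapes.
[i for i in range(10)]
try:
    i
except NameError:
    print("i is not defined")
```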

At the end of the day, I use list comprehensions in the following
circumstances:

- I *know* I won't blow memory with a too-large list
- I want to iterate over the object multiple times or I want/may want
non-sequential access
- I know I want all the elements I'm creating (i.e., no chance of
short-circuiting)

I use generator expressions when

- the sequence *might* be too large to hold in memory
- I only need a single pass over the elements
- there is a chance of short-circuiting
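One concrete reason for the "multiple passes" distinction above: a generator
is exhausted after a single pass, while a list can be iterated as often as
you like. A quick sketch:

```python
g = (i * i for i in range(5))
print(sum(g))  # 30
print(sum(g))  # 0 -- the generator was exhausted by the first sum()

lst = [i * i for i in range(5)]
print(sum(lst))  # 30
print(sum(lst))  # 30 -- the list can be re-iterated freely
```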

All the best,
Jason

P.S. There is a "cross-over" point where the memory cost of building the
list overtakes the generator's per-element overhead.  For instance:


In [17]: %timeit sum(i for i in range(10000000))
1 loops, best of 3: 2.08 s per loop

In [18]: %timeit sum([i for i in range(10000000)])
1 loops, best of 3: 1.86 s per loop

In [19]: %timeit sum(i for i in range(100000000))
1 loops, best of 3: 21.8 s per loop

In [20]: %timeit sum([i for i in range(100000000)])
1 loops, best of 3: 26.1 s per loop
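The memory side of that trade-off shows up directly with sys.getsizeof
(exact numbers vary by platform and Python version, so treat them as
illustrative): the generator object is a fixed-size frame regardless of the
range, while the list's container alone grows linearly.

```python
import sys

n = 1000000
gen = (i for i in range(n))
lst = [i for i in range(n)]

# The generator object is a small fixed-size frame, independent of n.
print(sys.getsizeof(gen))   # on the order of a hundred bytes

# getsizeof on a list counts only the container (the pointer array),
# not the int objects themselves, yet it already grows linearly with n.
print(sys.getsizeof(lst))   # several megabytes for a million elements
```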

-- 
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher



More information about the Python-list mailing list