Pre-pep discussion material: in-place equivalents to map and filter

Steve D'Aprano steve+python at pearwood.info
Thu Nov 3 13:00:39 EDT 2016


On Fri, 4 Nov 2016 01:05 am, Chris Angelico wrote:

> On Thu, Nov 3, 2016 at 7:29 PM, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
>>> lst = map (lambda x: x*5, lst)
>>> lst = filter (lambda x: x%3 == 1, lst)
>>> And perform especially badly in CPython compared to a comprehension.
>>
>> I doubt that.
>>
> 
> It's entirely possible. A list comp involves one function call (zero
> in Py2), but map/lambda involves a function call per element in the
> list. Function calls have overhead.

I don't know why you think that list comps don't involve function calls.
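To make that concrete, here's a small sketch of my own (the counting wrapper is mine, not part of the timed code below): both the comprehension and map() invoke func once per element, so the per-element call overhead is the same in both.

```python
# Illustrative sketch (the counting wrapper is mine, not from the timings
# below): both the comprehension and map() call func once per element.
data = list(range(100))
calls = 0

def func(x):       # same shape as the timed func below, plus a call counter
    global calls
    calls += 1
    return {x + 1: x ** 2}

comp_result = [func(a) for a in data]  # one call to func per element
comp_calls, calls = calls, 0

map_result = list(map(func, data))     # likewise, one call per element
map_calls = calls

print(comp_calls, map_calls)  # 100 100
```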

Here are some timing results using 3.5 on my computer. For simplicity, so
folks can replicate the test themselves, here's the timing code:


from timeit import Timer
setup = """data = list(range(10000))
def func(x):  # simulate some calculation
    return {x+1: x**2}
"""
t1 = Timer('[func(a) for a in data]', setup=setup)
t2 = Timer('list(map(func, data))', setup=setup)



And here are the timing results on my machine:

py> min(t1.repeat(number=1000, repeat=7))  # list comp
18.2571472954005
py> min(t2.repeat(number=1000, repeat=7))  # map
18.157311914488673

So there's hardly any difference, but map() is about 0.5% faster in this
case.
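The same comparison applies to filter(), the other half of the proposal. Here's a hedged sketch (the predicate and repeat counts are my choices, not measured in the thread) that times filter() against the equivalent conditional comprehension the same way:

```python
from timeit import Timer

# A hedged sketch (predicate and repeat counts chosen by me) applying the
# same measurement to filter() versus the equivalent comprehension.
setup = """data = list(range(10000))
def pred(x):  # simulate some test
    return x % 3 == 1
"""
t_comp = Timer('[a for a in data if pred(a)]', setup=setup)
t_filt = Timer('list(filter(pred, data))', setup=setup)

print(min(t_comp.repeat(number=100, repeat=5)))  # comprehension
print(min(t_filt.repeat(number=100, repeat=5)))  # filter

# Both forms select exactly the same elements:
data = list(range(10000))
def pred(x):
    return x % 3 == 1
selected = list(filter(pred, data))
```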

Now, it is true that *if* you can write the list comprehension as a direct
calculation in an expression instead of a function call:

    [a+1 for a in data]

*and* compare it to map using a function call:

    map(lambda a: a+1, data)


then the overhead of the function call in map() may be significant. But the
extra cost is effectively just a multiplier. It isn't going to change
the "Big Oh" behaviour. Here's an example:

t3 = Timer('[a+1 for a in data]', setup=setup)
t4 = Timer('list(map(lambda a: a+1, data))', setup=setup)
extra = """from functools import partial
from operator import add
"""
t5 = Timer('list(map(partial(add, 1), data))', setup=setup+extra)

And the timing results:

py> min(t3.repeat(number=1000, repeat=7))  # list comp with expression
2.6977453008294106
py> min(t4.repeat(number=1000, repeat=7))  # map with function
4.280411267653108
py> min(t5.repeat(number=1000, repeat=7))  # map with partial
3.446241058409214

So map() here is less than a factor of two slower. I wouldn't call
that "especially bad" -- oftentimes, a factor of two is not important.
What really hurts is O(N**2) performance, or worse.
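For contrast, here's an example of my own (not from the thread) of the kind of O(N**2) pattern that does hurt: building a list by repeated concatenation, which copies the whole list on every step, versus a comprehension that does constant amortised work per element.

```python
# Example mine (not from the thread): repeated concatenation copies the
# whole list on every step, so building N elements costs O(N**2) total.
def build_quadratic(n):
    lst = []
    for x in range(n):
        lst = lst + [x]   # copies all existing elements each time
    return lst

def build_linear(n):
    return [x for x in range(n)]  # O(1) amortised work per element

assert build_quadratic(1000) == build_linear(1000)
```

Both produce the same list; only the growth rate of the work differs, and that is the difference that actually matters, not map() versus a comprehension.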

So at worst, map() is maybe half as fast as a list comprehension, and at
best, it's perhaps a smidgen faster. I would expect around the same
performance, differing only by a small multiplicative factor: I wouldn't
expect one to be thousands or even tens of times slower than the other.
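Coming back to the subject line: if the goal is a genuinely in-place update rather than rebinding the name, slice assignment already gives map- and filter-like behaviour with no new construct. A sketch (the variable name follows the quoted example; the alias is mine, just to show the update happens in place):

```python
# Sketch tying back to the subject line: slice assignment mutates the
# existing list object, unlike the rebinding in the quoted example.
lst = list(range(10))
alias = lst  # second reference, to show the update really is in place

lst[:] = [x * 5 for x in lst]             # in-place "map"
lst[:] = [x for x in lst if x % 3 == 1]   # in-place "filter"

print(lst)           # [10, 25, 40]
print(alias is lst)  # True: alias saw the change too
```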



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



