Pre-pep discussion material: in-place equivalents to map and filter

Fri Nov 4 20:42:48 EDT 2016

On Fri, 4 Nov 2016 08:34 am, Chris Angelico wrote:

[...]
> List comps themselves involve one function call (zero in Py2). What
> you do inside the expression is your business. Do you agree that list
> comps don't have the overhead of opening and closing files?

/tongue firmly in cheek

I'd like to see you run a Python script containing a list comprehension
without opening and closing the .py file.

:-P

Okay, I see what you are getting at now: in CPython 3, list comprehensions
are implemented in such a way that the list comprehension requires a
minimum of one function call, while list(map(func, iterable))) requires a
minimum of O(N)+2 function calls: a call to list, a call to map, and a call
to func for each of N values.

That's *technically* true, but it is an implementation detail. I'm sure that
some day PyPy or Nuitka or maybe even CPython itself will start in-lining
at least some functions (if PyPy doesn't already do so). That would be an
obvious optimization to apply to map() and filter().

As far as the speed of map() versus list comps, my micro-benchmarks show
that at least on my computer, map() can be marginally but consistently
faster than a list comp once you equalise that cost of function calls, that
is, if the list comp explicitly calls a function. I don't think that speed
difference is significant, so let's just call them "equally fast" when
comparing similar cases:

    [func(obj) for obj in iterable]

    map(func, iterable)

But of course you're right that list comprehensions give you the opportunity
to avoid that function call -- at least in CPython. I already agreed with
that, but to emphasise what I've already agreed, I'll say it again :-)

If you can manually in-line func() as a single expression inside the list
comprehension:

    [(spam + len(obj.eggs))*2 for obj in iterable]

then you can expect to save the cost of N function calls, which may be
significant. As I said earlier, that's why we have list comps.

(But on the other hand, if the expression is expensive enough, the cost of
an extra function call may be utterly insignificant.)

The point that I am trying to make is that none of these facts justifies the
claim that map() performs "especially bad" compared to list comprehensions.
According to my tests, *at worst* map() will be a bit better than half as
fast as a list comprehension, and at best just as fast if not slightly
faster.

>> Here's some timing results using 3.5 on my computer. For simplicity, so
>> folks can replicate the test themselves, here's the timing code:
>>
>>
>> from timeit import Timer
>> setup = """data = list(range(10000))
>> def func(x):  # simulate some calculation
>>     return {x+1: x**2}
>> """
>> t1 = Timer('[func(a) for a in data]', setup=setup)
>> t2 = Timer('list(map(func, data))', setup=setup)
> 
> This is very different from the original example, about which the OP
> said that map performs badly, and you doubted it.

I didn't read the OP as making a specific claim about these two *specific*
map and filter examples:

    lst = map (lambda x: x*5, lst)
    lst = filter (lambda x: x%3 == 1, lst)

I read these as mere examples of a general claim that map and
filter "perform especially bad in CPython compared to a comprehension".

But just for the exercise, I repeated my benchmarks with these specific
examples, comparing:

    list(map(lambda x: x*5, data))
    [x*5 for x in data]

and 

    list(filter(lambda x: x%3 == 1, data))
    [x for x in data if x%3 == 1]

and again got a roughly factor of two performance difference, with the list
comp being faster. I don't think that justifies the claim of "especially
bad", which to me implies something much worse. If you're going to describe
a mere factor of two as "especially bad", what words do we have left for
something that is ten thousand times slower?

As the wisest man in the universe once said, hyperbole is the most terrible,
awful crime against humanity.

*wink*

[...]
> Thing is, this is extremely common. How often do you actually use a
> comprehension with something that is absolutely exactly a function
> call on the element in question?

"This" being something that can be in-lined in the body of the list comp.

Sure. I cheerfully acknowledge that list comps where you can write an
in-line expression are very common. That's the beauty of list comps!

[...]
> But this conclusion I agree with. There is a performance difference,
> but it is not overly significant. Compared to the *actual work*
> involved in the task (going through one list and doing some operation
> on each operation), the difference between map and a comprehension is
> generally going to be negligible.

I wouldn't go quite so far as to say "negligible" -- a factor of two speed
up on a large list is not something to be sneezed at. But I think we're
converging on agreement: list comps and map/filter typically have
comparable performance, and as the cost of the work done increases, the
extra overhead of a function call becomes less and less significant.

-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.