Pre-pep discussion material: in-place equivalents to map and filter

Chris Angelico rosuav at gmail.com
Fri Nov 4 21:16:16 EDT 2016


On Sat, Nov 5, 2016 at 11:42 AM, Steve D'Aprano
<steve+python at pearwood.info> wrote:
> On Fri, 4 Nov 2016 08:34 am, Chris Angelico wrote:
>
> [...]
>> List comps themselves involve one function call (zero in Py2). What
>> you do inside the expression is your business. Do you agree that list
>> comps don't have the overhead of opening and closing files?
>
> /tongue firmly in cheek
>
> I'd like to see you run a Python script containing a list comprehension
> without opening and closing the .py file.
>
> :-P

You got me! Let's call it a night.
-- Kristoff (nearly my namesake)

> Okay, I see what you are getting at now: in CPython 3, list comprehensions
> are implemented in such a way that the list comprehension requires a
minimum of one function call, while list(map(func, iterable)) requires a
> minimum of O(N)+2 function calls: a call to list, a call to map, and a call
> to func for each of N values.
>
> That's *technically* true, but it is an implementation detail. I'm sure that
> some day PyPy or Nuitka or maybe even CPython itself will start in-lining
> at least some functions (if PyPy doesn't already do so). That would be an
> obvious optimization to apply to map() and filter().

Mmmm, interesting. The fact still remains that map depends on some
kind of "object representation" of a block of code (since it's being
passed to some other function), where a comprehension can be
implemented as an actual expression. So either Python-the-language
needs a way to pass around lightweight blocks of code, or the
interpreter (for some instance of 'the interpreter') needs to
recognize the function calls and optimize them away, or list comps
will always have an inherent advantage over map.
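That difference is visible in the bytecode. Here's a sketch (the
function names are just illustrative) showing that a comprehension
compiles its body as an inline expression, while map must call its
argument once per element:

```python
import dis

def with_map(lst):
    # map needs a callable object to apply to each element
    return list(map(lambda x: x * 5, lst))

def with_comp(lst):
    # the comprehension's x * 5 compiles as a plain expression
    return [x * 5 for x in lst]

# Both produce the same result:
assert with_map([1, 2, 3]) == with_comp([1, 2, 3]) == [5, 10, 15]

# The disassembly of with_comp shows x * 5 as straight bytecode;
# with_map instead emits a CALL of the lambda for every element.
dis.dis(with_comp)
```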

> As far as the speed of map() versus list comps, my micro-benchmarks show
> that at least on my computer, map() can be marginally but consistently
faster than a list comp once you equalise the cost of function calls, that
> is, if the list comp explicitly calls a function. I don't think that speed
> difference is significant, so let's just call them "equally fast" when
> comparing similar cases:
>
>     [func(obj) for obj in iterable]
>
>     map(func, iterable)

Right, which is why a lot of style guides recommend against the first
form, *in this specific instance*. Using map with a lambda function,
or a comprehension with nothing but a function call, is rightly called
out in code review.
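To make the style-guide point concrete, here's a small sketch (the
names are illustrative) of the forms usually preferred versus the
forms usually flagged:

```python
def double(x):
    return x * 2

nums = [1, 2, 3]

# Idiomatic: map with an existing function, or a comprehension
# whose body is a real expression.
good_map = list(map(double, nums))
good_comp = [x * 2 for x in nums]

# Commonly flagged in review: a redundant lambda wrapper, or a
# comprehension that does nothing but call a function.
flagged_map = list(map(lambda x: double(x), nums))
flagged_comp = [double(x) for x in nums]

assert good_map == good_comp == flagged_map == flagged_comp == [2, 4, 6]
```

All four produce identical results; the objection is to the needless
extra call layer, not to correctness.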

> I didn't read the OP as making a specific claim about these two *specific*
> map and filter examples:
>
>     lst = map (lambda x: x*5, lst)
>     lst = filter (lambda x: x%3 == 1, lst)
>
> I read these as mere examples of a general claim that map and
> filter "perform especially bad in CPython compared to a comprehension".
> ... I don't think that justifies the claim of "especially
> bad", which to me implies something much worse. If you're going to describe
> a mere factor of two as "especially bad", what words do we have left for
> something that is ten thousand times slower?
>
> As the wisest man in the universe once said, hyperbole is the most terrible,
> awful crime against humanity.
>
> *wink*

Ah, now we get to the point where we disagreed. I was responding to a
misunderstanding of your position - I thought you disagreed that the
performance difference could even be significant, but you were arguing
against the "especially bad". Gotcha. In that case, I believe we're in
agreement; even a two-to-one difference isn't "especially bad" here,
and that would be an extreme case.

>> But this conclusion I agree with. There is a performance difference,
>> but it is not overly significant. Compared to the *actual work*
>> involved in the task (going through one list and doing some operation
on each element), the difference between map and a comprehension is
>> generally going to be negligible.
>
> I wouldn't go quite so far as to say "negligible" -- a factor of two speed
> up on a large list is not something to be sneezed at. But I think we're
> converging on agreement: list comps and map/filter typically have
> comparable performance, and as the cost of the work done increases, the
> extra overhead of a function call becomes less and less significant.

I said negligible because the factor of two disappears almost
completely when you add a large constant factor to it. What's the
performance of these?

def lie(n):
    """it's not a fib, honest"""
    if n < 2: return 3
    return lie(n-1) + lie(n-2)

mapped = list(map(lie, range(50)))
comprehended = [lie(x) for x in range(50)]
badly_mapped = list(map(lambda x: lie(x), range(50)))

By any reasonable metric, I would expect all three to have extremely
comparable performance. The difference between map's best case
(passing an existing function), a list comp, and map's worst case
(wrapping the function in a useless lambda function) might be two to
one, but it's negligible compared to the surpassing cost of those
massively-nested lies. Oh, what a tangled web we weave...
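If you want to check that claim yourself, here's a timing sketch
(n is kept small so the runs finish in reasonable time; the exact
numbers will vary by machine):

```python
import timeit

def lie(n):
    """it's not a fib, honest"""
    if n < 2:
        return 3
    return lie(n - 1) + lie(n - 2)

N = 20  # small enough to time repeatedly; the recursion still dominates

mapped = timeit.timeit(lambda: list(map(lie, range(N))), number=10)
comprehended = timeit.timeit(lambda: [lie(x) for x in range(N)], number=10)
badly_mapped = timeit.timeit(
    lambda: list(map(lambda x: lie(x), range(N))), number=10)

# The exponential recursive work dwarfs the per-element iteration
# overhead, so all three timings come out very close together.
print(mapped, comprehended, badly_mapped)
```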

ChrisA
