[Python-ideas] Map-then-filter in comprehensions

Sjoerd Job Postmus sjoerdjob at sjec.nl
Tue Mar 8 17:21:31 EST 2016


On Tue, Mar 08, 2016 at 02:17:23PM +0000, Allan Clark wrote:
> tl;dr What is support like for adding an 'as' clause to comprehension
> syntax? In order to allow map-then-filter, it might look something like
> this:
> 
>     [y for x in numbers if abs(x) as y > 5]
> 
> I wish to propose an extension to Python comprehension syntax in an attempt
> to make it applicable in more areas. I'll first describe the deficiency I
> perceive in the current comprehension syntax and then propose my extension.
> I'll then note some drawbacks. For the purposes of illustration I'll use
> list comprehension syntax but I believe everything I say is equally
> applicable to set and dictionary comprehensions. I'll also talk about lists
> but again (more or less) everything applies to the more general concept of
> iterables.
> 
> Comprehensions are essentially a filter followed by a map operation. So if
> you need to perform a filter operation followed by a map operation over a
> list, this is pretty convenient in Python. Here we are going to take the
> absolute value of all even numbers within the range -10 to 10.
> 
>     numbers = range(-10, 10)
>     [abs(x) for x in numbers if x % 2 == 0]
> 
> However, if you wish to perform a map operation and *then* a filter
> operation, this is not so convenient. Suppose we wish to obtain the
> absolute value of all numbers that have an absolute value larger than 5.
> We can do this by calling the mapped function twice:
> 
>     [abs(x) for x in numbers if abs(x) > 5]
> 
> This is a bit unsatisfying and even impossible in the case that the mapped
> method has some side-effect (although arguably if you find yourself in that
> situation you have taken a mis-step somewhere). An alternative is to apply
> the mapping first:
> 
>     [y for y in (abs(x) for x in numbers) if y > 5]
> 
> I have to say I quite like this, as it is pretty explicit, but it is a bit
> unsatisfying that you require only one comprehension for a filter-then-map
> but two for a map-then-filter. What would be nice is if we could give a
> name to the mapped expression and then use that in the filter.
> 
>     [abs(x) as y for x in numbers if y > 5]
> 
> I don't like this as it means the order of execution is dependent on
> whether there is an 'as' clause, in particular the 'if' clause itself may
> do some computation such as in `if f(y) > 5`.
> 
> An alternative is to allow 'as' expressions in the condition, something
> like:
> 
>     [y for x in numbers if abs(x) as y > 5]
> 
> Note that it would be possible to map to an intermediate result, such as:
> 
>     [y**2 for x in numbers if abs(x) as y > 5]
> 
> Alternatively we could put the 'as' in the pattern:
> 
>     [y**2 for abs(x) as y in numbers if y > 5]
> 
> I did not like this as it obscures the fact that 'x' is being set to
> each element of 'numbers'. Additionally, we might later wish to adopt a
> functional programming idiom in which we use 'as' for deconstructive
> assignment whilst giving a name to the entire matched value, for example:
> 
>     [p for (x,y) as p if x > y]
> 
> Or more generally:
> 
>     (x,y) as a = f(z)
> 
> But that is getting somewhat off-topic. I promised some drawbacks:
>     * I am confident there are some implementation gotchas nestling in here
> somewhere.
>     * I could imagine abuse of such a mechanism leading to pretty
> unreadable code.
>     * I'm still not *that* upset by the explicit map first: `[y for y in
> (abs(x) for x in numbers) if y > 5]`
>     * I could see how it is not immediately obvious what the code does.
>     * It would need to be decided whether to allow multiple 'as'
> expressions in the condition, particularly using 'and' or 'or' as in 'if
> f(a) as x > 5 and f(b) as y > 5'
> 
> To summarise:
>     * It's a touch annoying that comprehensions allow filter-then-map but
> not map-then-filter
>     * Three proposed syntaxes are:
>          * [abs(x) as y for x in numbers if y > 5]
>          * [y for x in numbers if abs(x) as y > 5]
>          * [y**2 for abs(x) as y in numbers if y > 5]
>     * My favourite is the middle one.
> 
> Finally these seem to currently be syntax errors so we should not break any
> existing code.

Seeing the many replies, I'm a bit lost as to where best to comment. After
thinking about it for a while, I think that it is already possible to do
what you want with comprehensions, just a bit convoluted.

    [y for x in numbers for y in [abs(x)] if y > 5]

The tricky part being the `for y in [abs(x)]` basically doing what you
want: bind the value `abs(x)` to a name (`y`).
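
As a rough sketch of why this works: the one-element list is iterated
exactly once per `x`, so the mapped function runs once per element and its
result is available to both the filter and the output expression. The
`traced_abs` wrapper and the `calls` list below are only there for
illustration and are not part of the proposal.

```python
calls = []

def traced_abs(x):
    """abs() that records each call, to show it runs once per element."""
    calls.append(x)
    return abs(x)

numbers = range(-10, 10)

# The single-element list [traced_abs(x)] is iterated exactly once,
# which binds y to the mapped value before the filter runs.
result = [y for x in numbers for y in [traced_abs(x)] if y > 5]

# Equivalent explicit loop, for comparison.
expected = []
for x in numbers:
    y = abs(x)
    if y > 5:
        expected.append(y)

print(result == expected)   # the two forms agree
print(len(calls))           # one call per element of numbers
```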

Benefits:
- It requires no new syntax, no new semantics.
- You can easily define multiple items at once.
      [y + z for x in numbers for y, z in [(abs(x), sgn(x))] if y > 5]
- You can still use the original value, in contrast to
      [y for y in (abs(x) for x in numbers) if y > 5]
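
A small runnable sketch of the last two points, under the assumption that
`sgn` is a sign function (it is not a builtin, so a hypothetical helper is
defined here):

```python
def sgn(x):
    # Hypothetical helper; sgn is not defined anywhere in this thread.
    return (x > 0) - (x < 0)

numbers = range(-10, 10)

# Bind two names at once by unpacking a one-element list of tuples.
sums = [y + z for x in numbers for y, z in [(abs(x), sgn(x))] if y > 5]

# The original value x stays in scope next to the mapped value y.
pairs = [(x, y) for x in numbers for y in [abs(x)] if y > 5]

print(sums)
print(pairs)
```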

Downside:
- Very unreadable, and probably a long way off from being idiomatic
  Python.

Here is yet another syntax suggestion (if we want to introduce
something):

    [y for x in numbers with abs(x) as y if y > 5]

The benefit is that it reads quite naturally: "with expr as name".
Another benefit is that existing keywords get reused.
The drawback is that `with ... as` already has different semantics outside
a comprehension, where it is used for context managers.



TL;DR: It's possible with
    [y for x in numbers for y in [abs(x)] if y > 5]
But the syntax is ugly enough that it does warrant some extra syntactic
sugar (or something with the same semantics but better performance
characteristics).
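
For anyone who wants to check the performance side, here is a minimal
timing sketch comparing the three spellings that work today. The function
names are made up for this example, and the numbers will vary by machine
and interpreter version (as far as I know, CPython 3.9 and later optimise
`for y in [expr]` inside comprehensions), so no timings are quoted here.

```python
import timeit

numbers = range(-1000, 1000)

def call_twice():
    # Calls abs() twice for every element that passes the filter.
    return [abs(x) for x in numbers if abs(x) > 5]

def bind_with_for():
    # Binds abs(x) to y via a one-element list.
    return [y for x in numbers for y in [abs(x)] if y > 5]

def nested_generator():
    # Maps first in an inner generator, then filters.
    return [y for y in (abs(x) for x in numbers) if y > 5]

for fn in (call_twice, bind_with_for, nested_generator):
    seconds = timeit.timeit(fn, number=200)
    print(f"{fn.__name__}: {seconds:.4f}s")
```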

