[Python-ideas] map, filter, reduce methods for generators

Andrew Barnert abarnert at yahoo.com
Fri Apr 11 05:02:09 CEST 2014


From: Jacek Pliszka <jacek.pliszka at gmail.com>
Sent: Thursday, April 10, 2014 2:12 PM


>What do you think about adding map, flatmap, filter and reduce methods to generator type ?

That wouldn't help your intended use case, because range is not a generator. In fact, most iterables are not generators. Non-iterators like list and dict, iterators defined as classes, iterators returned by builtins and C extension modules, etc. are not generators either. So, do you want to somehow add this to all possible iterable types? Or do you want to force people to wrap an iterable inside an unnecessary generator (x for x in spam) just so they can call these methods on the wrapper? Or… ?
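To make that concrete, here is a small check, assuming Python 3. The helper name is_generator and the sample objects are only illustrative; the only library piece relied on is types.GeneratorType.

```python
import types


def is_generator(obj):
    """Return True only for real generator objects (hypothetical helper)."""
    return isinstance(obj, types.GeneratorType)


# A handful of common iterables and iterators, chosen for illustration.
candidates = {
    "range": range(3),
    "list": [0, 1, 2],
    "dict": {0: "a"},
    "list iterator": iter([0, 1, 2]),
    "map object": map(str, range(3)),
    "generator expression": (x for x in range(3)),
}

results = {name: is_generator(obj) for name, obj in candidates.items()}

for name, flag in results.items():
    print(f"{name:22} generator? {flag}")
```

Only the generator expression passes the check, so methods added to the generator type would be missing from every other object in that list.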

And this isn't just a side issue; this gets to the heart of the difference between Python and Java. Java requires everything to be hammered into its OO paradigm. These are methods in Java because everything has to be a method in Java. In C++, Haskell, OCaml, or just about anything besides Java (and its cousins, like Ruby and various .NET languages), they're generic or polymorphic or duck-typed free functions that are defined once and work on any type that makes sense, instead of methods that have to be defined on every possible type that might have a use for them.
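As a rough sketch of the free-function approach: flatmap is not a builtin, so the version below is a hypothetical helper built on itertools.chain.from_iterable, and Countdown and pair are made up purely for the demonstration. It is written once and works on anything iterable.

```python
from itertools import chain


def flatmap(func, iterable):
    """Hypothetical helper: map func over iterable, then flatten one level."""
    return chain.from_iterable(map(func, iterable))


class Countdown:
    """A class-based iterator, i.e. an iterator that is not a generator."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


def pair(x):
    """Return each value twice, so flattening has something to do."""
    return (x, x)


# The same function handles a list, a range, a dict (its keys), and a
# user-defined iterator, with no methods added to any of those types.
from_list = list(flatmap(pair, [1, 2]))
from_range = list(flatmap(pair, range(1, 3)))
from_dict = list(flatmap(pair, {1: "a", 2: "b"}))
from_class = list(flatmap(pair, Countdown(2)))
```

None of the four types had to know anything about flatmap for this to work.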

>I must admit I've seen and I like Java 8 notation and I think it might be more readable than Python way in a few occasions.

>
>I would like to be able to write:
>
>range(100).\
>  filter( f1 ).\
>  map( f2 ).\
>  filter( f3 ).\
>  map( f4 ).\
>  reduce(operator.add)
>
>in addition to current Pythonic way of
>
>reduce( operator.add, f4(x) for x in 
>  ( f2(y) for y in range(100) if f1(y))
> if f3(x) )


This is not at all the Pythonic way to write it. And the fact that you think it is implies that maybe you're trying to solve a problem that doesn't exist.

First, you're using reduce(add) instead of sum. I think this creates a false problem—if you think in terms of "map, filter, reduce", then there's no way to get rid of some of the function calls piling up on the left. But if you really think about it, map and filter are different from reduce: they transform an iterable into an iterable, so you can call them any number of times in your sequence of transformations, but reduce transforms an iterable into a single value, so you only call it once. Which means there aren't function calls piling up on the left, there's exactly one function call on the left.
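A quick illustration of the point, with made-up values: any number of lazy transformations feed exactly one reduction, and for addition that reduction is just sum.

```python
import operator
from functools import reduce

# Transformations: iterable in, iterable out. Chain as many as you like.
odds = (x for x in range(10) if x % 2)
squares = [x * x for x in odds]  # materialized so it can be summed twice

# Reduction: iterable in, single value out. It happens once, at the end.
total_reduce = reduce(operator.add, squares)
total_sum = sum(squares)

# One practical difference: on empty input sum() returns 0, while
# reduce() without an initial value raises TypeError.
empty_total = sum([])
try:
    reduce(operator.add, [])
    reduce_failed = False
except TypeError:
    reduce_failed = True
```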

Also, you're trying to cram everything into one expression for no good reason, which forces you to come up with some idiosyncratic way to wrap it to 80 columns. In Java, creating unnecessary temporary variables is often considered an anti-pattern, probably because of its C heritage (where it can be a performance issue). In Python, this is instead a very common idiom.


Let's start with the simplest possible way to write this:


    r = range(100)
    r = filter(f1, r)
    r = map(f2, r)
    r = filter(f3, r)
    r = map(f4, r)
    total = sum(r)

Now, taking advantage of comprehensions:

    r = range(100)
    r = (f2(x) for x in r if f1(x))
    r = (f4(x) for x in r if f3(x))
    total = sum(r)

This may look like the syntax is hiding the real functionality, but that's only because the real functionality is invisible in your example, because you've named the functions f1, f2, f3, and f4. Try an example with realistic function names and it will look a lot different. And of course half the time, you don't actually have a function lying around, you just want to map or filter with some expression, in which case the sequence of comprehensions wins even bigger. Compare:

    found_squares = (x**2 for x in range(100) if x in found)
    weighted_sum = sum(x / dups[x] for x in found_squares if x in found_dups)

    # (use found.__contains__ in place of the first lambda if you insist)
    weighted_sum = range(100). \
        filter(lambda x: x in found). \
        map(lambda x: x**2). \
        filter(lambda x: x in found_dups). \
        map(lambda x: x / dups[x]). \
        reduce(operator.add)

You really think the second one is more readable?

Notice that in the first one, everything happens in the order of the data flow, you're not piling up function calls on the left, etc.; all the advantages you're looking for. If you haven't read David Beazley's "Generator Tricks for Systems Programmers", google it for some great realistic examples (and some great background discussion, too).
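For anyone who wants to run the comparison, here is a sketch with sample data. The contents of found, found_dups, and dups are invented for the example (the names come from the snippet above, the values do not), and the second half uses the existing builtin filter and map as free functions, since range has no such methods.

```python
# Hypothetical sample data, chosen only so the pipeline has work to do.
found = {1, 4, 9, 16}
found_dups = {1, 16, 81}
dups = {1: 2, 16: 4, 81: 3}

# Chained generator expressions: each line reads in data-flow order.
found_squares = (x**2 for x in range(100) if x in found)
weighted_sum = sum(x / dups[x] for x in found_squares if x in found_dups)

# The same pipeline with the builtin free functions, one step per line.
r = range(100)
r = filter(found.__contains__, r)
r = map(lambda x: x**2, r)
r = filter(found_dups.__contains__, r)
r = map(lambda x: x / dups[x], r)
weighted_sum_functional = sum(r)

print(weighted_sum, weighted_sum_functional)
```

Both versions are lazy until sum consumes them, and both give the same result; the difference is only in how the steps read.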


