reduce()--what is it good for? (was: Re: reduce() anomaly?)

Douglas Alan nessus at mit.edu
Mon Nov 10 18:50:48 EST 2003


Alex Martelli <aleax at aleax.it> writes:

> Anybody who can claim (without a smilie) that "sum is redundant with
> reduce" has clearly lost all touch with reality.

*All* touch?  My girlfriend still feels real to me when I touch
her....

> "sum" accounts for about 80-90% of the uses of reduce found on large
> existing codebases,

I have *never* used sum() (or a reduce() that amounts to sum()).  And
I use reduce() fairly often, in more or less the way that I described.

> and even more when only _good_ uses of reduce are considered.  "sum"
> is way faster than "reduce(operator.add, ..." -- and the advantage
> in speed *AND* simplicity

I don't care about speed all that much when using Python.  If I did, I
probably wouldn't be using Python.  And to me, "simple" means
providing a small number of orthogonal features that combine easily,
rather than a lot of special-purpose features that all have to be
learned, remembered, and documented separately.  sum() is a
special-purpose feature, while reduce() is a very general feature that
is easily adapted to a myriad of situations.  So what if 80-90% of the
uses of reduce in your experience are to support sum()?  That's not my
experience, and even if it were, if sum() is used a lot, then 10-20%
of that is still a lot.  max(), like sum(), also has a built-in
reduce.  So the current approach for the future of Python seems to be
to build reduce() into everything that might need it?  I'd prefer to
see reduce() taken out of sum() and out of max(); then no one needs to
document or learn that sum() and max() duplicate the functionality of
reduce().  Just learn, remember, and document one thing, and you're
all set.
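
To make the point concrete: both builtins are just reduce() with a
particular binary function baked in.  Here's a minimal Python 2
sketch (my_sum and my_max are my names, for illustration only):

    import operator

    def my_sum(seq):
        # sum(seq) is reduce() specialized to addition
        return reduce(operator.add, seq)

    def my_max(seq):
        # max(seq) is reduce() specialized to two-argument max()
        return reduce(max, seq)

    print my_sum([1, 2, 3, 4])       # 10
    print my_max([3, 1, 4, 1, 5])    # 5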

> grows even more when one considers how
> often that typical reduce is miscoded as "reduce(lambda x, y: x+y,
> ..." or "reduce(int.__add__, ..." and the like.

People who do this, no doubt, are more interested in coding than in
wading through a big manual to figure out how to make use of Python's
idiosyncrasies to get their code to run as fast as possible in this
one particular implementation of Python.

Such people are surely not going to want to wade through the manual to
find sum().
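
For reference, here are the spellings in question side by side -- a
Python 2.3 sketch; all four compute the same thing:

    import operator

    nums = range(1, 6)

    print sum(nums)                           # 15 -- the 2.3 builtin Alex favors
    print reduce(operator.add, nums)          # 15 -- the canonical reduce spelling
    print reduce(lambda x, y: x + y, nums)    # 15 -- one "miscoded" variant
    print reduce(int.__add__, nums)           # 15 -- another "miscoded" variant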

> The complications of going to a higher-order function and learning about
> module operator "pay off" in a SLOWDOWN compared to elementary code...!

All of 3.4%, according to your calculations.  I'm not losing any
sleep over it.

Learning about module operator is not a chore, since I'd want to know
how to access these operators anyway.

Higher order functions are not complicated -- they are beautiful and
elegant.

> The "fancy" ways reduce is often coded slow things down further by about
> two times.  This is clearly unacceptable.

There's nothing fancy about reduce() -- it's taught in every CS 101
class!
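
And what CS 101 teaches fits on one line: reduce() is just a left
fold.  The unrolling below is mine, not from the thread:

    import operator

    # reduce(f, [a, b, c, d]) evaluates as f(f(f(a, b), c), d)
    assert reduce(operator.sub, [10, 1, 2, 3]) == ((10 - 1) - 2) - 3  # 4

    # an optional third argument seeds the fold (and covers empty
    # sequences)
    assert reduce(operator.add, [], 0) == 0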

> And the right solution is, quite obviously:

> [alex at lancelot test]$ timeit.py -c 'sum(xrange(999))'
> 10000 loops, best of 3: 129 usec per loop

Why?  Because it saves having to type "operator.add,"?  Because it's
a bit faster?  Who cares about either of these things?  Certainly not
I.

> giving speed, simplicity, and code that is clean, high-level, concise, and
> immediately obvious to any reader or maintainer.

reduce() is immediately obvious to anyone who has ever taken CS
101. sum() is going in the direction of Perl, where there's a separate
idiomatic feature for everything that anyone might ever want to do.

I like Python because it is a very simple, generic, easy-to-learn
programming language, where I didn't have to learn much of anything
new.  It was just like the myriad of other programming languages I
have programmed in, only less cluttered than most, and "light-weight"
in its implementation and accessibility.

I have nothing against learning new stuff, by the way, but when I
want to learn new stuff, I'm not particularly interested in the
idiosyncrasies of Python -- I'd rather spend my effort learning
something a bit more exotic, like OCaml or Haskell, that might
stretch my brain a bit.  Learning a bunch of idiosyncratic detail is
just tedious.  This is one of the most important reasons that Perl
sucks so badly.

> The paladins of reduce don't help their case by posting reduce "use
> cases" that are complicated and rarely optimal:

>>>      def longer(x, y):
>>>         if len(y) > len(x): return y
>>>         else: return x

>>>      def longest(seq):
>>>         return reduce(longer, seq)

>>>      print longest(("abc", "yumyum!", "hello", "goodbye", "?"))

There's nothing complicated about my example.  It's the epitome of
clarity and elegance.

> In 2.4, the much simpler expression:
>     list.sorted(seq, key=len)[-1] 
> is about 3 times faster on a typical long seq (seq=map(str, xrange(2000))).

There's nothing simpler about that (in fact, conceptually, it's much
more complex) and it's only faster because of Python implementation
details.  In most languages, or an alternate implementation of Python,
sorting a list would typically be significantly slower than a linear
search on it.
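
To spell out the comparison -- my sketch, using the sorted() builtin
as it eventually shipped in 2.4:

    def longer(x, y):
        if len(y) > len(x):
            return y
        return x

    seq = map(str, xrange(2000))

    # one O(n) pass; ties go to the earliest element, so this prints
    # '1000', the first of the four-character strings
    print reduce(longer, seq)

    # sorts all n elements -- O(n log n) -- just to pick out one; the
    # stable sort leaves the *last* longest element at the end: '1999'
    print sorted(seq, key=len)[-1]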

> We still don't have (in the current pre-alpha 2.4) the "one obvious way",
> max(seq, key=len), because max & other functions haven't yet been upgraded
> to take the optional parameter key= as lists' sort and sorted methods, but
> I trust they will be.

More annoying bloat.  Instead of pumping up max() with bloat, tell
people to use reduce().  It's already there and simple and elegant.
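
In fact, nothing stops a 2.3 programmer from writing the proposed
max(seq, key=...) today in a couple of lines of reduce() (max_by is
my name for it, just an illustration):

    def max_by(key, seq):
        def pick(x, y):
            if key(y) > key(x):
                return y
            return x
        return reduce(pick, seq)

    print max_by(len, ("abc", "yumyum!", "hello", "goodbye", "?"))
    # prints: yumyum!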

> And if one doesn't care for the concision of these new idioms, the most
> elementary programming is still best:

> [alex at lancelot test]$ timeit.py -c -s'xs=map(str,xrange(2000))' -s'''
>> def longest(seq):
>>     seq=iter(seq)
>>     best = seq.next()
>>     blen = len(best)
>>     for item in seq:
>>         ilen = len(item)
>>         if ilen>blen:
>>             best = item
>>             blen = ilen
>>     return best
>> ''' 'longest(xs)'
> 1000 loops, best of 3: 980 usec per loop

That's supposed to be best?  It would take me 100 times as long to
figure out that code.

> Once again, reduce proves to be an "attractive nuisance": 

With the emphasis on "attractive".  It results in easy-to-read,
easy-to-remember coding techniques.

> Higher order programming, such as the "max(seq, key=len)" which I
> hope will make it into 2.4, is of course way simpler, faster, and
> more concise -- but the "neither fish nor fowl" level at which most
> uses of reduce fall just doesn't cut it, IMHO.

Except that I'd never use it, because I'd never even know it was
there -- I prefer to spend my time writing software rather than
wading through manuals.

> And btw, if one wants concision rather than speed, even in 2.3, the
> typical DSU-like idiom:
> def longest(seq):
>     aux = [ (len(x), -i, x) for i, x in enumerate(seq) ]
>     return max(aux)[-1]
> still handily beats the reduce-coded alternative (even though
> having the len= parameter instead of explicit decoration will
> of course make it even smoother, simpler, and faster, in 2.4).

The above is particularly ugly.
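
Ugly, but for the record the decoration does do real work: the -i
term makes max() prefer the earliest element among equal lengths,
which is exactly what reduce(longer, ...) returns.  My annotation:

    seq = ("abc", "yumyum!", "goodbye", "?")   # two strings of length 7

    aux = [(len(x), -i, x) for i, x in enumerate(seq)]
    # tuples compare lexicographically: greatest length wins first,
    # then the greatest -i (i.e. the smallest index) breaks the tie
    print max(aux)[-1]                          # yumyum!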

|>oug



