reduce()--what is it good for? (was: Re: reduce() anomaly?)

Mon Nov 10 05:08:33 EST 2003

Francis Avila wrote:

> "Douglas Alan" <nessus at mit.edu> wrote in message
> news:lcu15f72bq.fsf at gaffa.mit.edu...
>> Alex Martelli <aleaxit at yahoo.com> writes:
>>
>> I agree: Down with bloat!  Get rid of sum() -- it's redundant with
>> reduce(), which I use all the time, like so:

Anybody who can claim (without a smilie) that "sum is redundant with
reduce" has clearly lost all touch with reality.  "sum" accounts for
about 80-90% of the uses of reduce found on large existing codebases,
and even more when only _good_ uses of reduce are considered.  "sum"
is way faster than "reduce(operator.add, ..." -- and the advantage in
speed *AND* simplicity grows even more when one considers how often
that typical reduce is miscoded as "reduce(lambda x, y: x+y, ..." or
"reduce(int.__add__, ..." and the like.

"Add up this bunch of numbers" is a VERY frequent task in many kinds of
programs.  reduce LOOKS like a decent solution to that task, but it just
*ISN'T*.  Consider the simple, obvious, elementary solution:

[alex at lancelot test]$ timeit.py -c -s'import operator' 'x=0' 'for i in
xrange(999): x+=i'
1000 loops, best of 3: 290 usec per loop

versus progressively-fancier reduce-based ones:

[alex at lancelot test]$ timeit.py -c -s'import operator' 'reduce(operator.add,
xrange(999))'
1000 loops, best of 3: 300 usec per loop

[alex at lancelot test]$ timeit.py -c -s'import operator' 'reduce(int.__add__,
xrange(999))'
1000 loops, best of 3: 540 usec per loop

[alex at lancelot test]$ timeit.py -c -s'import operator' 'reduce(lambda x,y:
x+y, xrange(999))'
1000 loops, best of 3: 660 usec per loop

The complications of going to a higher-order function and learning about
module operator "pay off" in a SLOWDOWN compared to elementary code...!

The "fancy" ways reduce is often coded slow things down further by about
two times.  This is clearly unacceptable.  And the right solution is,
quite obviously:

[alex at lancelot test]$ timeit.py -c 'sum(xrange(999))'
10000 loops, best of 3: 129 usec per loop

giving speed, simplicity, and code that is clean, high-level, concise, and
immediately obvious to any reader or maintainer.

The paladins of reduce don't help their case by posting reduce "use
cases" that are complicated and rarely optimal:

>>      def longer(x, y):
>>         if len(y) > len(x): return y
>>         else: return x
>>
>>      def longest(seq):
>>         return reduce(longer, seq)
>>
>>      print longest(("abc", "yumyum!", "hello", "goodbye", "?"))

In 2.4, the much simpler expression:
    list.sorted(seq, key=len)[-1] 
is about 3 times faster on a typical long seq (seq=map(str, xrange(2000)).
We still don't have (in the current pre-alpha 2.4) the "one obvious way",
max(seq, key=len), because max & other functions haven't yet been upgraded
to take the optional parameter key= as lists' sort and sorted methods, but
I trust they will be.

And if one doesn't care for the concision of these new idioms, the most
elementary programming is still best:

[alex at lancelot test]$ timeit.py -c -s'xs=map(str,xrange(2000))' -s'''
> def longest(seq):
>     seq=iter(seq)
>     best = seq.next()
>     blen = len(best)
>     for item in seq:
>         ilen = len(item)
>         if ilen>blen:
>             best = item
>             blen = ilen
>     return best
> ''' 'longest(xs)'
1000 loops, best of 3: 980 usec per loop

versus:

alex at lancelot test]$ timeit.py -c -s'xs=map(str,xrange(2000))' -s'''
> def longer(x, y):
>     if len(y) > len(x): return y
>     else: return x
> def longest(seq):
>     return reduce(longer, seq)
> ''' 'longest(xs)'
100 loops, best of 3: 2.4e+03 usec per loop

Once again, reduce proves to be an "attractive nuisance": it "looks" kind
of cool, but in fact it "repays" attention and study by _slowing things
down_ even in comparison to simple, elementary programming.  Higher order
programming, such as the "max(seq, key=len)" which I hope will make it
into 2.4, is of course way simpler, faster, and more concise -- but the
"neither fish nor fowl" level at which most uses of reduce fall just
doesn't cut it, IMHO.

And btw, if one wants concision rather than speed, even in 2.3, the
typical DSU-like idiom:
def longest(seq):
    aux = [ (len(x), -i, x) for i, x in enumerate(seq) ]
    return max(aux)[-1]
still handily beats the reduce-coded alternative (even though
having the len= parameter instead of explicit decoration will
of course make it even smoother, simpler, and faster, in 2.4).

> Oh, guys...
> 
> def ireduce(func, seq):
>     seq = iter(seq)
>     try:
>         i = seq.next()
>             while True:
>                 i = func(i, seq.next())
>     except StopIteration:
>         return i

I have some trouble with this snippet's indentation (have you used
tabs...?), but, apart from that, this is surely a close equivalent
to reduce (it differs from reduce when seq is empty, raising an
UnboundLocalError rather than a TypeError, etc, but such trifles
could of course be easily adjusted).  It IS, however, marginally
slower than reduce.  I clock 'reduce(lambda x, y: y, xs)' at 1110
millisecs, but 'ireduce(lambda x, y: y, xs)' at 1270 millisecs,
when "xs=map(str, xrange(2000))".  [[ Now, of course, reduce's fans
will claim that this 15%-or-so slowdown destroys ireduce's usefulness,
while at the same time claiming that the many-times speedups versus
reduce that I have repeatedly shown for many other idioms, both lower
and higher level, don't matter -- just watch it, it'll be fun:-) ]]

> def cond(boolean, iftrue, iffalse):
>     if boolean: return iftrue
>     else: return iffalse

heh -- trying to sneak in a ternary, are we?-)

>>>> ireduce(lambda x, y: cond(len(x)==5, x,y), "All we are saying, is give
> peace a chance!".split(' '))
> 'peace'

Unfortunately, what this suggests to me is that this ireduce (not
surprisingly, as it has basically reduce's interface:-) and cond (ditto)
encourage obscure and best-avoided idioms.

For the task "get the first item in seq that has a length of 5", both
the simplest, lowest-level coding:
def firstlen5(seq):
    for item in seq:
        if len(item) == 5: return item
    else: raise ValueError, "No item has length 5"
or the concise, higher-level coding
    [ x for x in seq if len(x)==5 ] [0]
or better in 2.4
    ( x for x in seq if len(x)==5 ).next()
appear far preferable to me.  I suspect they'd also be way faster, but I'm
getting a bit tired of posting oodles of measurements that are just going
to get ignored by the frantic fans of reduce, since if they got off their
high horse and just looked at reality they'd be left with no case;-).

> Can't we all just....get along?

Sure we can -- unless you count 'reduce' itself as a part of 'us':-).

Alex