reduce()--what is it good for? (was: Re: reduce() anomaly?)

Alex Martelli aleax at aleax.it
Fri Nov 7 05:08:24 EST 2003


Bob Gailer wrote:
   ...
> One's prior programming experience can affect one's view of reduce. My
> favorite language, prior to Python, was APL. APL's native data container
> is the array, and reduce is a native operator in APL. So we used reduce a
> lot ...

I've done a _lot_ of APL and APL2 (albeit well before I met Python --
basically in the 1978-1988 time frame) and I don't think that "affects
my view of reduce" _in Python_ -- any more than (e.g.) having done a
lot of Scheme would affect my view of lambda and tail recursion _in
Python_, a lot of Icon that of generators, a lot of Perl that of RE's, ...

Basically, the old quip about "coding Fortran in any language" does not
apply to Fortran _only_: you can, up to a point, code (APL, Icon, Scheme,
Perl, ...) in any language... but it's _still_ basically a bad idea.

Mastering a language means (in part) mastering the idioms thereof --
what works well/smoothly/elegantly in that language, what most of that
language's best practitioners _do_.  Knowledge of other languages --
the more the merrier -- can help, giving you perspective, but in the
end it shouldn't really affect your view of the target language.

The reduce method of Numeric's ufuncs _is_ a good, close analogy
for APL's reduce (more generally, APL concepts -- though not syntax --
do help a lot with Numeric, far more than they do with base Python).
Numeric.add.reduce(xs) IS a perfect equivalent to APL's +/xs -- if
xs is a Numeric.array (else the conversion overhead's gonna kill you:-):

$ timeit.py -c -s'import Numeric' -s'import operator' -s'xs=range(9999)'
'reduce(operator.add, xs)'
100 loops, best of 3: 2.9e+03 usec per loop
$ timeit.py -c -s'import Numeric' -s'import operator' -s'xs=range(9999)'
'Numeric.add.reduce(xs)'
10 loops, best of 3: 2.2e+04 usec per loop
$ timeit.py -c -s'import Numeric' -s'import operator' -s'xs=range(9999)'
'sum(xs)'
1000 loops, best of 3: 1.15e+03 usec per loop
$ timeit.py -c -s'import Numeric' -s'import operator' -s'xs=range(9999)'
-s'xa=Numeric.array(xs)' 'Numeric.add.reduce(xa)'
10000 loops, best of 3: 54 usec per loop
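
For the record, a tiny sketch of the equivalence these timings are
comparing (assuming Numeric is installed) -- all three spellings
compute the same sum:

import Numeric, operator

xs = range(9999)
xa = Numeric.array(xs)

# each of these computes 0+1+...+9998 == 49985001
print reduce(operator.add, xs)
print sum(xs)
print Numeric.add.reduce(xa)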

Yeah, I know, speed isn't everything.  But when the running time of a
perfectly analogous operation drops by a factor of more than two (from
reduce to sum), or by a factor of 50 (from reduce on a list to
Numeric.add.reduce on a Numeric.array), one _does_ get a sense of when
one's using a language "reasonably idiomatically" -- within the design
space of that language, within the pathways most often taken by the
language's best practitioners and therefore most likely to be solid
and optimized.


> companion of reduce in APL is scan, which did reduce but preserved all the
> intermediate values (cumulative sum for example).

Yep, that's e.g. Numeric.add.accumulate in Numeric + Python.
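
E.g., a quick sketch of the scan analogy (again assuming Numeric):

import Numeric

xs = Numeric.array([1, 3, 4, 7])
# like APL's +\xs: keep every intermediate (running) sum
print Numeric.add.accumulate(xs)    # the cumulative sums 1, 4, 8, 15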


> To expand your reduce horizons in Python, consider reduce as a way to
> examine the relationship between successive pairs of a sequence. For
> example one might wants the difference between successive elements: given
> seq = [1,3,4,7] we'd like a differences == [2,1,3].
> 
> differences = []
> def neighborDifference(left, right):
>    differences.append(right - left)
>    return right
> reduce(neighborDifference, seq)

def difs_reduce(seq):
    # abuse reduce for its side effects: append each difference,
    # then return the right-hand item so it becomes the next 'left'
    differences = []
    def neighborDifference(left, right):
        differences.append(right - left)
        return right
    reduce(neighborDifference, seq)
    return differences

def difs_zip(seq):
    # pair each item with its successor, then subtract
    return [ b-a for a, b in zip(seq, seq[1:]) ]

import itertools
def difs_izip(seq):
    # as difs_zip, but izip avoids materializing the list of pairs
    return [ b-a for a, b in itertools.izip(seq, seq[1:]) ]

def difs_loop(seq):
    # the humble explicit loop
    differences = []
    it = iter(seq)
    a = it.next()
    for b in it:
        differences.append(b-a)
        a = b
    return differences

$ timeit.py -c -s'import a' -s'x=range(9999)' 'a.difs_reduce(x)'
100 loops, best of 3: 1.8e+04 usec per loop
$ timeit.py -c -s'import a' -s'x=range(9999)' 'a.difs_zip(x)'
100 loops, best of 3: 1.8e+04 usec per loop

So, the list comprehension with zip has just the same performance
as the clever use of reduce -- and is far more concise.  But if
you're looking for speed, then the humble loop...:

$ timeit.py -c -s'import a' -s'x=range(9999)' 'a.difs_loop(x)'
100 loops, best of 3: 9e+03 usec per loop

...is twice as fast!  Simplicity isn't to be sneered at.  And
if you're keen on higher-level reasoning and expression, itertools
is quite good for that:

$ timeit.py -c -s'import a' -s'x=range(9999)' 'a.difs_izip(x)'
100 loops, best of 3: 7.6e+03 usec per loop

a one-liner, and another 15% speed boost wrt the humble loop.
[I can shave another 10% performance improvement with an
experimental 'w2' type I've coded to do specifically the
window-by-two job, but that small level of acceleration is
probably not enough to justify an extra type...;-)]
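
(If you're curious what such a window-by-two helper might look like
in plain Python -- just an illustrative sketch, NOT the experimental
'w2' type mentioned above:)

def pairwise(seq):
    # yield (previous, current) for each adjacent couple in seq
    it = iter(seq)
    a = it.next()
    for b in it:
        yield a, b
        a = b

def difs_pairwise(seq):
    return [ b-a for a, b in pairwise(seq) ]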


Yeah, I know, sigh, I come across as a micro-optimization freak,
which most definitely _ISN'T_ the point.  I'm using speed and
concision as proxies for *idiomatic use* -- simplicity and
elegance and maintainability... "elegance" &c we could argue about
forever, so I go for numbers that aren't quite as arguable:-).


I think reduce (like apply, filter, and probably map too) has had
its day: in this day and age of list comprehensions and iterator-tools
it just doesn't have enough good use cases left to justify its
existence as a Python built-in, except for legacy reasons.
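
To make that concrete, here's a small sketch (Python 2.3-style; f,
predicate and seq are just made-up placeholder names) of the more
idiomatic spellings for the typical uses of those built-ins:

import operator

def f(x): return x * x
def predicate(x): return x % 2 == 0
seq = range(10)
args = (3,)

# apply: extended call syntax does the same job
assert apply(f, args) == f(*args)

# filter and map: list comprehensions read more directly
assert filter(predicate, seq) == [ x for x in seq if predicate(x) ]
assert map(f, seq) == [ f(x) for x in seq ]

# reduce: by far the most common case is just sum()
assert reduce(operator.add, seq) == sum(seq)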


Alex




