[Python-Dev] Fwd: summing a bunch of numbers (or "whatevers")

Alex Martelli aleax@aleax.it
Sat, 19 Apr 2003 23:43:48 +0200


Sorry to distract python-dev's august collective attention from its usual
exalted concerns down to a mundane issue;-), but... we may be able to
strike a tiny blow for simplicity, clarity, power, AND performance at once...

For the Nth time, today somebody asked in c.l.py about how best to sum
a list of numbers.  As usual, many suggested reduce(lambda x,y:x+y, L),
others reduce(int.__add__,L), others reduce(operator.add,L), etc, and some
(me included) a simple
    total = 0
    for x in L:
        total = total + x

The usual performance measurements were unleashed (easier than ever
thanks to timeit.py of course;-), and the paladins of reduce were once again
dismayed by the fact that the best reduce can do (that best is obtained with
operator.add) is mediocre (e.g. on my box with L=range(999), reduce takes
330 usec, and the simple for loop takes 247 usec).
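
A rough way to repeat that comparison is sketched below. It assumes a
current Python 3, where reduce has to be imported from functools; the
helper names and the repetition count are made up for illustration, and
the timings will differ from the figures quoted above on any other
machine or interpreter version.

```python
import operator
import timeit
from functools import reduce  # assumption: Python 3, where reduce is not a builtin


def sum_with_reduce(items):
    """Sum items with reduce(operator.add, ...), the fastest reduce variant cited."""
    return reduce(operator.add, items)


def sum_with_loop(items):
    """Sum items with the plain for loop."""
    total = 0
    for x in items:
        total = total + x
    return total


L = list(range(999))
REPEAT = 1000  # hypothetical repetition count, chosen only to keep the run short

for func in (sum_with_reduce, sum_with_loop):
    seconds = timeit.timeit(lambda: func(L), number=REPEAT)
    print(f"{func.__name__}: {seconds / REPEAT * 1e6:.1f} usec per call")
```

Both helpers return the same total; only the time per call is being
compared.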

Discussion proceeded on whether "reduce(operator.add, L)" was abstruse
for most people, or not, and on whether the loop was or wasn't "too low
level", as the Pythonic approach to such a common task.

It then struck me that Python doesn't HAVE "one single obvious way" to
do what IS after all a rather common task in everyday programming,
namely, "sum up this bunch of things" (typically numbers, occasionally
strings -- and when they're strings the "obvious" loop above is terribly
slow, a typical newbie trap...).  Somebody proposed having operator.add
take any number of arguments -- not quite satisfactory, AND it turned out
to be dog-slow (when I tried a quick experimental mod to operator.c),
due to the need to turn a sequence (typically a list) into a tuple with *.
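
The string trap mentioned above can be sketched as follows. The function
names are hypothetical; the point is only that the loop may build a new,
longer string on every +, while ''.join assembles the result in one go.
How large the gap is depends on the interpreter, so no timings are
claimed here.

```python
def concat_with_loop(strings):
    """The 'obvious' loop: each + may copy everything accumulated so far."""
    total = ""
    for s in strings:
        total = total + s
    return total


def concat_with_join(strings):
    """The idiomatic alternative: hand the whole sequence to ''.join."""
    return "".join(strings)


words = [str(n) for n in range(999)]

# Both approaches produce the same string; they differ only in cost.
print(concat_with_loop(words) == concat_with_join(words))
```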

Now, I think the obvious approach would be to have a function sum,
callable with any non-empty homogeneous sequence (sequence of
items such that + can apply between them), returning the sequence's
summation -- now THAT might help for simplicity, clarity AND power.
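
In pure Python, the function described above might look roughly like the
sketch below. This is not the C code under discussion: the name
sum_sketch is made up to avoid clashing with anything real, and raising
ValueError for an empty sequence is an assumption, since the description
only covers non-empty input.

```python
def sum_sketch(sequence):
    """Return the summation of a non-empty sequence whose items support +."""
    iterator = iter(sequence)
    try:
        total = next(iterator)  # start from the first item, not from 0
    except StopIteration:
        # Assumption: behaviour for empty input is not specified in the text.
        raise ValueError("sum_sketch() arg is an empty sequence") from None
    for item in iterator:
        total = total + item
    return total


print(sum_sketch(range(999)))
print(sum_sketch([1.5, 2.5]))
print(sum_sketch([[1], [2, 3]]))
```

Starting from the first item rather than from 0 is what lets the same
function work for anything that supports +, not just numbers.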

So, I tried coding that up -- just 40 lines of C... it runs twice as fast 
as the plain loop, for summing up range(999), and just as fast as ''.join 
for summing up map(str, range(999)) [for the simple reason that I
special-case this -- when the first item is an instance of
PyBaseString_Type, I delegate to ''.join].
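
The special case can be illustrated in pure Python along these lines.
Again this is only a sketch under assumptions: it checks for str (a
current Python 3 has no basestring), the function name is hypothetical,
and the empty-input error is a guess rather than something stated above.

```python
def sum_with_join_shortcut(sequence):
    """Sum a non-empty sequence, delegating to ''.join when items are strings."""
    items = list(sequence)
    if not items:
        # Assumption: behaviour for empty input is not specified in the text.
        raise ValueError("sum_with_join_shortcut() arg is an empty sequence")
    first = items[0]
    if isinstance(first, str):
        # Special case: only the first item is inspected, then join does the work.
        return "".join(items)
    total = first
    for item in items[1:]:
        total = total + item
    return total


strings = map(str, range(999))
print(len(sum_with_join_shortcut(strings)))
print(sum_with_join_shortcut(range(999)))
```

Because only the first item is checked, a sequence that mixes strings
with other types fails inside join, which fits the requirement that the
sequence be homogeneous.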

Discussing this with newbie-to-moderately-experienced Pythonistas,
the uniform reaction was on the order of "you mean Python doesn't
HAVE a sum function already?!" -- most everybody seemed to feel
that such a function WOULD be "the obvious way to do it" and that
it should definitely be there.

So -- by this time I'm biased, having invested a bit of time in this --
what do y'all think... any interest in this?  Should I submit it?  I'm not
quite sure where it should go -- a builtin seems most natural (to keep
company with min and max, for example), but maybe that would be
too ambitious, and it should be in math or operator instead...


Alex