[Python-Dev] Fwd: summing a bunch of numbers (or "whatevers")

Alex Martelli aleax@aleax.it
Sun, 20 Apr 2003 09:16:18 +0200


On Sunday 20 April 2003 01:03 am, Jack Jansen wrote:
   ...
> > For the Nth time, today somebody asked in c.l.py about how best to sum
> > a list of numbers.  As usual, many suggested reduce(lambda x,y:x+y, L),
> > others reduce(int.__add__,L), others reduce(operator.add,L), etc, and
   ...
> > Now, I think the obvious approach would be to have a function sum,
> > callable with any non-empty homogeneous sequence (sequence of
   ...
> Do you have any idea why your sum function is, uhm, three times faster
> than the reduce(operator.add) version? Is the implementation of reduce 
> doing something silly, or are there shortcuts you can take that reduce()
> can't?

I see this has already been answered.  The only important shortcut in the
sum I coded is to delegate to ''.join if it turns out the first item is an
instance of PyBaseString_Type -- this way we get excellent performance
for "concatenate up this bunch of strings" in a way that would surely be
rather problematic for a function as general as reduce (the latter would
need to specialcase on its function argument, singling out the special
case in which it's operator.add for optimization).


> I'm asking because I think I would prefer reduce to give the speed you
> want.
> That way, we won't have people come asking for a prod() function to
> match sum(), etc.

I see I must have expressed myself badly, and I apologize.  Raw speed
in summing up many numbers is NOT the #1 motivation for my proposal.
Whether it takes 100+ microseconds, or 300+ microseconds, to sum up a
thousand integers (with O(N) scaling in either case), is not all that crucial.

I think the importance of speed here is mainly *psychological* -- an issue
of "marketing" at some level, if you will.  What I'm mostly after is to have
"one obvious way" to sum up a bunch of numbers.  I think a HOF such as
reduce would not be "the one obvious way" to most people, and having
to code an explicit loop maintaining the total has its own problems -- some
people object that it's too low-level (so not obvious enough for THEM), and it
also leads the beginner right into a performance trap when what's being 
summed is strings rather than numbers.  If the one obvious way to sum
(concatenate) a bunch of strings is ''.join(bunch), how can we say that when
the task is summing  a bunch of numbers instead, then the one obvious way
becomes a HOF or an explicit loop?

Having sum(bunch) would give the "one obvious way", and a speedup of
2 or 3 times wrt looping or using reduce would psychologically help make
it "THE obvious way", I think.

I must also have been unclear on why I think sum is important in a way
that prod and other reduce operations aren't: summing a bunch of numbers
is *quite a common task* in many kinds of everyday programming (and in
fact everyday life) in a way that multiplying a bunch of numbers (much less
any other such bulk operation) just isn't.  "prod" isn't even an English word
(well it IS, but not in the meaning of "product":-) and when people talk
about "the product" they're hardly ever talking about multiplication, while
"the sum" or also commonly "the total" are words that do indicate the
result of repeatedly applying addition (even when you say "the sum" to
indicate an amount of money, the history of that word does come from
addition -- addition of the values coins and notes of varying denominations
making up "the sum" -- while "the product" as normally used in English
has nothing to do with multiplying).

I think I understand the worry that introducing 'sum' would be the start
of a slippery slope leading to requests for 'prod' (I can't think of other
bulk operations that would be at all popular -- perhaps bulk and/or, but
I think that's stretching it).  But I think it's a misplaced worry in this 
case.  "Adding up a bunch of numbers" is just SO much more common
than "Multiplying them up" (indeed the latter's hardly idiomatic English,
while "adding up" sure is), that I believe normal users (as opposed to
advanced programmers with a keenness on generalization) wouldn't
have any problem at all with 'sum' being there and 'prod' missing...


Alex