[Python-ideas] All ideas together: Fast sum() for non-numbers

Sergey sergemp at mail.ru
Fri Jul 5 09:41:03 CEST 2013


On Jul 4, 2013 Joshua Landau wrote:

> One thing I didn't understand from the original post was that this
> was a "special case".

That's because it wasn't.

There were too many different ideas during this thread, I'll try to
collect them all together. So...

Sum() is a great function. Why? Because it makes the code simple!
You can always live without sum(), you can use itertools, reduce,
lambdas, etc. But using sum() always makes the code easier to read.

Simple Code is the goal! Faster sum() is just a tool to reach it!

Sum() was designed to support different types from the beginning.
For example, Tim Peters suggested to use it for the list of
datetime.timedelta [1]. Unfortunately for some common types sum()
is too slow. This encourages people to fallback to other solutions,
write a code that is hard to read and maintain.

That's why I suggest making sum a little faster. Still not the
fastest thing ever, but fast enough to have a choice between
"a little faster" and "easier to read".

Faster sum() at least for something (e.g. lists) is already a good
thing, but even the great "fast sum() for everything" goal is easy
to reach.

I already wrote some patches. So now there 4 suggestions/patches
pending:

1. Fast sum() for lists and tuples. [2]
   This patch adds a special case optimisation. I wasn't going to do
   it that way, but sum() already have 2 special cases for ints and
   floats (yes, sum() already has special cases, and it has them for
   the last 6 years), one more special case to reach the "fast for
   everything" goal is not that bad. Practicality beats purity.
   This patch introduces no behavior changes, and may go even for
   python2 bugfix release.

2. Fast sum() for strings. [3]
   This patch is a small enhancement of #1, that makes sum() O(N) for
   strings. Obviously it introduces a behavior change and I do NOT
   suggest it for inclusion. I believe that ''.join() alternative is
   simple enough and don't have to be replaces with sum() for normal
   use cases. It is just to shows that sum() can be O(N) for strings.

3. Fast sum() for everything else. [4]
   This is the original patch that I suggested. It rearranges existing
   code so that sum() uses __iadd__ if available, making sum() fast
   for all built-in and custom types with fast __iadd__.
   Even with #1 this patch also introduces no behavior changes, and
   may go even for python2 bugfix release.

4. Explicit documentation of sum() complexity.
   If patches #1 or #3 are accepted it may be a good idea to
   explicitly state what sum() needs to be O(N) for user types.
   E.g.:
   """
   function:: sum(iterable[, start])

   Sums *start* and the items of an *iterable* from left to right and
   returns the total. *start* defaults to ``0``.

   To make advantage of faster __iadd__ implementation sum() uses it
   if possible. sum() has linear time complexity for built-in types
   and all types providing constant-time `__iadd__` or `__add__`.

   For some use cases, there are good alternatives to sum(). The
   preferred, fast way to concatenate a sequence of string is by
   [...]
   """

Patches #1 and #2 were written as an answer to "you can't make sum()
fast for everything, e.g. strings and tuples". Yes, I can. :)

Together these patches achieve the "fast sum() for everything" goal
and make sure, that people know what is required to keep sum() fast.

That's all the suggestions I have for now.
Well, there's one more, but let's first decide about these. :)

-- 

[1] http://mail.python.org/pipermail/python-dev/2003-April/034837.html

[2] http://bugs.python.org/file30769/fastsum-special.patch
    without the string part

[3] the string part of
    http://bugs.python.org/file30769/fastsum-special.patch

[4] http://bugs.python.org/file30705/fastsum.patch


More information about the Python-ideas mailing list