[Python-ideas] Intermediate Summary: Fast sum() for non-numbers

Sergey sergemp at mail.ru
Tue Jul 16 05:58:13 CEST 2013


On Jul 15, 2013 Ron Adam wrote:

> You could copy the code from sum() and make a function with a different 
> name that specialises in non-number addition.  That would not have any 
> backwards compatibility issues.

That would just add one more workaround, but would not solve the
problem. Sum is already O(N*N) for many containers, and adding
more workarounds would not make it any faster.

As for me using "+" e.g. to add strings looks rather obvious. Adding
lists looks similar to adding strings. And sum() looks like a lot of
"+"es. I mean I see nothing strange in using sum for list of lists.
It as natural as using max() to find the "largest" word in a list of
strings — a nice and expected feature. But *if* in some distant
python version we would have separate operations for numerical
addition and sequence concatenation, then we might split our current
sum() into two functions: sum() and concat().

But to do that we must first solve the problem for sum().
Or we'll have exactly same problem with concat() anyway.

> Do you think you could use what you learned with sum to make chain, or a 
> new fold function faster?

I'm not sure that chain or reduce/fold could benefit from them, since
all suggestions are about containers and "+" operator, except, maybe,
suggestion #2, since everybody are expected to benefit from it.

> The advantages are..
> 
> Chain works with most iterables and uses much less memory in some cases.

That's the main difference: sum returns a container, which is
complete and ready to use, while chain returns some weird type. :)
It does not use a container, so you can't have any container-specific
optimization in it. I.e. imagine:
  x = [[1,2], [3,4], 5, 6]
now try:
  for i in sum(x, []): print(i)
and:
  for i in chain.from_iterable(x): print(i)
in first case you'll get an error instantly, while in second case
you'll have print called several times before you get an error.

> A new fold function would do quite a lot more depending on the operator 
> passed to it.  It may be possible to speed up some common cases that use 
> methods on builtin types.

I'm not sure what would be a common case for it.
It looks like a renamed reduce. What's a common case for reduce?


More information about the Python-ideas mailing list