[Python-ideas] Fast sum() for non-numbers

Andrew Barnert abarnert at yahoo.com
Tue Jul 9 10:51:41 CEST 2013


From: Stephen J. Turnbull <stephen at xemacs.org>

Sent: Monday, July 8, 2013 11:23 PM


> Andrew Barnert writes:
> 
>>  Meanwhile sum is the obvious way to sum things that are obviously
>>  summable (numbers, matrices, etc.), and nothing else.
> 
> My intuition matches yours, but I find this argument (and the rest of
> the arguments that say that "generic sum() is unobvious and wrong")
> logically unsatisfactory.  It would be nice if you could provide a
> plausible definition of "summable" other than "__add__() is
> implemented".  I don't have one. :-(


As I see it, there are three possibilities.

1. sum is not appropriate when __add__ means concatenation rather than adding. If you'd use PySequence_Concat/sq_concat rather than PyNumber_Add/nb_add in porting to the C API, or if you'd use a different operator in Python 4 if concatenation stopped being written as __add__, then you shouldn't use sum. The problem is usually, you're not writing C API code or Python 4, you're writing Python 3, so it's not always obvious what the facts are. But I don't think it's ever that hard to figure out. If we used & for concatenation, list.__add__ and str.__add__ would no longer exist, but np.matrix.__add__, datetime.__add__, and quaternion.__add__ would.

2. sum is not appropriate iff chain.from_iterable makes sense.* Needing a list doesn't make chain unusable here any more than it does with map, zip, etc.; just pass the result to list. But "You can't iterate ints" or "I'm treating these np.matrix 3-vectors as atomic objects, not collections" does mean chain is unusable, so sum is the answer.

3. sum is not appropriate when 0 doesn't make sense as a start value. Summing things means, by default, starting with 0 and adding repeatedly. You can provide a non-default start value, but it should be "similar to 0" or "compatible with 0" in some way. Note that you can add 0 to an int, a float, a quaternion, a numpy.matrix, and all kinds of other types that "act like numbers". And that means that testing 0+start or start+0 is a pretty good test for summable. This is admittedly imperfect,** but it's pretty close, and very concrete and simple.

Personally, I'm leaning toward 2. If you come up with a type that is addable, and isn't iterable (or does the wrong thing when iterated), why not, it's summable. Better to let some corner-case false positives slip in than the reject some corner-case false negatives (as with 3), or to make them just impossible to decide (as with 1).


* This is a little too loose. Surely there are types where __add__ is not addition, but also not iterable concatenation, right? But I don't think sum is an attractive nuisance there, unlike in the case with concatenation. Meanwhile, what about strings? Actually, chain makes perfect sense with strings; it's just that usually, you're just going to want to pass the iterable to ''.join, and if you're doing that, you can just pass the original strings to ''.join. So, no problem there.

** Most notably, I think adding a sequence of timedelta objects with a datetime start makes sense, and you can't add 0 to a datetime. Really, what you need here is a way to say that start + 0 * peek(iterable) is also acceptable, not just start + 0—and you can justify that more rigorously in terms of fields—but that's nearly impossible to implement, and way too complicated to explain. So, option 3 will reject some valid types.


More information about the Python-ideas mailing list