[Python-ideas] Fast sum() for non-numbers

Steven D'Aprano steve at pearwood.info
Thu Jul 4 05:35:15 CEST 2013


On 04/07/13 08:02, Joshua Landau wrote:
> On 3 July 2013 18:58, Steven D'Aprano <steve at pearwood.info> wrote:
>> On 04/07/13 02:58, Joshua Landau wrote:
>>
>>> What's wrong with sum, if sum is fast?
>>
>> sum simply cannot *always* be fast. E.g. summing tuples will still be slow
>> even with your suggestion.
>>
>> Using sum() on anything other than numbers is conceptually dubious; yes,
>> sum() is intended for addition, but it's intended for *numeric* addition.
>
> Sum: A quantity obtained by addition or aggregation.
> [http://en.wiktionary.org/wiki/sum]
>
> I don't see why it should be constrained.

I didn't say is should be constrained. But there is a big difference between "don't prevent people calling sum() on lists, if they insist" and "encourage people to use sum() on lists, in preference to list.extend".


>> It's unfortunate that Python uses + for concatenation,
>
> Is it? I'm very happy with it myself.

Yes. If Python used & for concatenation, we wouldn't have to worry about sum(lists or strings or tuples) being quadratic, because people wouldn't call sum on lists, strings or tuples.


>> we're stuck with it
>> until Python 4000, but if you're using sum() to add lists, tuples, or other
>> non-numbers, you're living in a state of sin.
>
> Why? There are a lot of things it makes sense to take the sum of - a
> lot of which have constant-ish performance on addition.

A "lot" of things?

There are numbers (int, float, Decimal, Fraction, complex), and numpy arrays of numbers. And you probably shouldn't call sum on floats if you care about precision. (Use math.fsum instead.)

Strings? Quadratic.
Tuples? Quadratic.
Lists? Currently quadratic, could be optimized, if we care.

I can't think of anything else that supports + and has near-constant time repeated addition. What are you thinking of?


>> (A venal sin rather than a
>> mortal sin.) It's allowed, but we shouldn't encourage it, and treating sum()
>> as if it were the One Obvious Way to concatenate data is, in my strong
>> opinion, the wrong thing to do.
>
> No, no. A fast sum() is not the one obvious way to concatenate data,

The Original Poster seems to think it is.


> much as sum() is not the one obvious way to add data. It just means
> that if conceptually you're looking for the sum, you'd be able to do
> it without shooting yourself in the foot.
>
> I'd also point out there are a *lot* of times when iadd is faster than
> add, not just contancation.

Examples please.


>> In my opinion, the One Obvious Way to concatenate a lot of arbitrary data is
>> list.extend, not sum.
>
> Is it? Maybe for lists. Often itertools.chain is better. It really
> depends on your data. I tend to use itertools.chain a lot more than
> list.extend, because I sum iterators more than I sum lists. Maybe I'm
> just weird.

I didn't say *the Best Way*, I said *the One Obvious Way*. The obvious way to have a lot of data of some arbitrary type is a list; the obvious way to concatenate a whole lot of lists into one list is using the extend method. If you have some special type or some special need, you may be able to do better.


> I can't remember the last time I had to sum tuples.

And I can't remember the last time I had to sum lists. Optimizing this case will solve exactly none of my problems.


> Also, if you're summing linked lists, why would it be O(n**2)
> performance? Surely it's still O(n), given the new implementation?

Actually, more like O(n*m). You have n linked lists each with an average of m nodes, you have to copy each node, so that's O(n*m). Sorry, simplifying n*m to n**2 is sloppy.



-- 
Steven


More information about the Python-ideas mailing list