sum for sequences?

Steven D'Aprano steven at REMOVE.THIS.cybersource.com.au
Mon Mar 29 23:01:03 EDT 2010


On Mon, 29 Mar 2010 19:05:30 -0700, Steve Howell wrote:

>> > If nothing else, I think it's reasonably for users to expect
>> > symmetry.
>>
>> Why? What is symmetry in programming?
> 
> "Symmetry" is best shown by example.
> 
>  >>> 3 - 2
>  1
>  >>> set([1,2,3]) - set([2,3])
>  set([1])


That's a useless example.


>>> 42 - 10
32
>>> set([42]) - set([10])
set([42])


You don't define symmetry. You don't even give a sensible example of 
symmetry. Consequently I reject your argument that because sum is the 
obvious way to sum a lot of integers, "symmetry" implies that it should 
be the obvious way to concatenate a lot of lists.


>>> 1+2
3
>>> [1] + [2]
[1, 2]
>>> set([1]) + set([2])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'set' and 'set'


>> > If you can use "+" to concatentate lists, then it seems reasonable
>> > that something spelled "sum" would concatenate lists as well, and in
>> > reasonable time.
>>
>> Where do you get the "reasonable time" from?
>>
>>
> From common sense.  Concatenation of lists should be in O(M*N) time, not
> O(M*N*N), because there is no need to build intermediate lists.

You are correct that building intermediate lists isn't *compulsory*, 
there are alternatives, but the alternatives themselves have costs. 
Complexity itself is a cost. sum currently has nice simple semantics, 
which means you can reason about it: sum(sequence, start) is the same as 

total = start
for item in sequence:
    total = total + start
return total

You don't have to care what the items in sequence are, you don't have to 
make assumptions about what methods sequence and start have (beyond 
supporting iteration and addition). Adding special cases to sum means it 
becomes more complex and harder to reason about. If you pass some other 
sequence type in the middle of a bunch of lists, what will happen? Will 
sum suddenly break, or perhaps continue to work but inefficiently?

You still need to ask these questions with existing sum, but it is 
comparatively easy to answer them: you only need to consider how the 
alternative behaves when added to a list. You don't have to think about 
the technicalities of the sum algorithm itself -- sometimes it calls +, 
sometimes extend, sometimes +=, sometimes something else... which of the 
various different optimized branches will I fall into this time? Who 
knows? sum already has two branches. In my opinion, three branches is one 
too many.


[...]
>> > It only takes a handful of sublists, about ten on my box, to expose
>> > the limitation of the Shlemeil-the-Painter O(M*N*N) algorithm that's
>> > under the hood.  It only takes 200 sublists to start getting a 10x
>> > degradation in performance.
>>
>> And how many times have you needed to add 200 sublists?
>>
>> How many times have you needed to add 10 sublists?
>>
>>
> Aggregating lists is a very common task in programming.  

"Aggregating" lists? Not summing them? I think you've just undercut your 
argument that sum is the "obvious" way of concatenating lists.

In natural language, we don't talk about "summing" lists, we talk about 
joining, concatenating or aggregating them. You have just done it 
yourself, and made my point for me. And this very thread started because 
somebody wanted to know what the equivalent to sum for sequences.

If sum was the obvious way to concatenate sequences, this thread wouldn't 
even exist.


> An example of a
> list of lists would be a list of departments, where each department is a
> list of employees.  If you want to sort the employees, you will want to
> aggregate to a list.

I grant you that 10 departments is likely to be a borderline case, but if 
you have 200 departments, then you will most likely be querying a 
database and getting back a single list of employees. And if you're not, 
you probably should be.


>> More special cases for implementers means more bugs in the language,
>> which means I end up writing my own code and ignoring the built-in
>> version anyway.
>>
>>
> Special cases do not have to introduce bugs in the language.

They don't *have* to, but since even Donald Knuth has written code with 
bugs in it, it is a safe bet that the number of bugs in code written by 
mere mortals will be roughly proportional to the complexity: more complex 
means more bugs. You can go to the bank with that.


>> More special cases means I have to pay the cost of high accuracy float
>> summation even when I don't need, or want, it.
>>
>>
> Nothing about making sum() work for the general cases precludes more
> specific, optimized functions.

You have missed my point. If I call your version of sum which all the 
special cases, I pay the overhead of the complexity regardless of whether 
I need it or not. The existence of other functions which duplicate the 
special cases in sum is irrelevant.



[...]
>> It's MORE costly than teaching users to use list.extend or
>> itertools.chain.
>>
>>
> Which the docs for sum() don't do.

Patches are always welcome.




-- 
Steven



More information about the Python-list mailing list