[Python-ideas] Why does `sum` use a default for the `start` parameter?

Vitor Bosshard algorias at gmail.com
Sat Dec 5 20:19:06 CET 2009


2009/12/5 Adam Olsen <rhamph at gmail.com>:
> On Sat, Dec 5, 2009 at 10:23, Vitor Bosshard <algorias at gmail.com> wrote:
>> And in that case the special string handling could also be dropped?
>>
>>>>> sum(["a","b"], "start")
>> Traceback (most recent call last):
>>  File "<pyshell#0>", line 1, in <module>
>>    sum(["a","b"], "start")
>> TypeError: sum() can't sum strings [use ''.join(seq) instead]
>>
>>
>> This behaviour is quite bothersome. Sum can handle arbitrary objects
>> in theory (as long as they define the correct special methods, etc.),
>> but it gratuitously raises an exception on strings. This behaviour is
>> also inconsistent with the following:
>>
>>>>> sum(["a","b"])
>> Traceback (most recent call last):
>>  File "<pyshell#1>", line 1, in <module>
>>    sum(["a","b"])
>> TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>
>>
>> Where sum actually tries to add "a" to the default value of 0.
>
> sum is defined by repeatedly adding each number in a sequence.  As
> each number is usually constant, and the size of total grows
> logarithmically, this is O(n log n) (but due to implementation
> coarseness it usually isn't distinguished from O(n)).
>
> Concatenation however grows the total's size very quickly.  You
> instead get a performance of O(n**2).  Same result, wrong algorithm.
>
> It would be possible to special case strings, but why?  The programmer
> should know what algorithm they're using and what complexity class it
> has, so they can pick the right one (''.join(seq) in this case).  IOW,
> handling arbitrary objects is an illusion.


I think you misunderstood my point. Sorry if I wasn't clear enough in
my original message. I understand the performance characteristics of
repeated concatenation vs str.join. I just wonder why the language
goes out of its way to catch this particular occurrence of bad code,
given there are plenty of ways to misuse sum or any other builtin for
that matter. A newbie is more likely to get n**2 performance by using
a for loop than sum:

final = ""
for s in strings:
    final += s

Should python refuse to compile the above snippet? The answer is an
emphatic "no".



More information about the Python-ideas mailing list