restriction on sum: intentional bug?

Ethan Furman ethan at stoneleaf.us
Sun Oct 18 22:52:41 EDT 2009


Carl Banks wrote:
> On Oct 18, 4:07 pm, Ethan Furman <et... at stoneleaf.us> wrote:
> 
>>Dave Angel wrote:
>>
>>>Earlier, I would have agreed with you.  I assumed that this could be
>>>done invisibly, with the only difference being performance.  But you
>>>can't know whether join will do the trick without error till you know
>>>that all the items are strings or Unicode strings.  And you can't check
>>>that without going through the entire iterator.  At that point it's too
>>>late to change your mind, as you can't back up an iterator.  So the user
>>>who supplies a list with mixed strings and other stuff will get an
>>>unexpected error, one that join generates.
>>
>>>To put it simply, I'd say that sum() should not dispatch to join()
>>>unless it could be sure that no errors might result.
>>
>>How is this different than passing a list to sum with other incompatible
>>types?
>>
>>Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit
>>(Intel)] on win32
>>Type "help", "copyright", "credits" or "license" for more information.
>> >>> class Dummy(object):
>>...     pass
>>...
>> >>> test1 = [1, 2, 3.4, Dummy()]
>> >>> sum(test1)
>>Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>TypeError: unsupported operand type(s) for +: 'float' and 'Dummy'
>> >>> test2 = ['a', 'string', 'and', 'a', Dummy()]
>> >>> ''.join(test2)
>>Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>TypeError: sequence item 4: expected string, Dummy found
>>
>>Looks like a TypeError either way; only the verbiage changes.
> 
> 
> 
> This test doesn't mean very much since you didn't pass the same
> list to both calls.  The claim is that "".join() might do something
> different than a non-special-cased sum() would have when called on the
> same list, and indeed that is true.
> 
> Consider this thought experiment:
> 
> 
> class Something(object):
>     def __radd__(self, other):
>         return other + "q"
> 
> x = ["a","b","c",Something()]
> 
> 
> If x were passed to "".join(), it would throw an exception; but if
> passed to a sum() without any special casing, it would successfully
> return "abcq".
> 
> Thus the two behaviors diverge, so transparently calling "".join()
> to perform the summation is a Bad Thing Indeed, a much worse
> special-case behavior than throwing an exception.
> 
> 
> Carl Banks
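
Carl's thought experiment can be checked directly. The sketch below uses modern Python 3. Because the built-in sum() refuses a string start value outright (that is the restriction this thread is about), it uses a hypothetical `naive_sum`, a plain left fold built from functools.reduce and operator.add, as a stand-in for a sum() with no special casing. It is an illustration only, not how sum() is actually implemented.

```python
import functools
import operator


class Something(object):
    """Carl's example: knows how to be added to a string from the right."""

    def __radd__(self, other):
        return other + "q"


def naive_sum(items, start):
    """Hypothetical sum() with no special casing: a left fold using +."""
    return functools.reduce(operator.add, items, start)


x = ["a", "b", "c", Something()]

# The plain fold succeeds: str cannot add a Something, so Python falls
# back to Something.__radd__("abc"), which returns "abcq".
print(naive_sum(x, ""))

# str.join accepts only strings, so the very same list is rejected.
try:
    "".join(x)
except TypeError as exc:
    print("join failed:", exc)
```

Same list, two different outcomes: that divergence is why silently routing sum() through join() would change behavior and not just performance.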

Unfortunately, I don't know enough about how join works to know that, 
but I'll take your word for it.  Perhaps the better solution then is to 
not worry about optimization, and just call __add__ on the objects. 
Then it either works, or throws the appropriate error.
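
A minimal sketch of the "just call __add__" idea, assuming a hypothetical `add_only_sum` helper (not a real builtin). It makes a single pass, so it also copes with Dave's concern about one-shot iterators: it never needs to inspect every item in advance, and any failure is the TypeError that + itself raises.

```python
class Dummy(object):
    """Ethan's example: supports no arithmetic at all."""
    pass


def add_only_sum(iterable, start=0):
    """Hypothetical sum() that never special-cases anything.

    It applies + item by item in a single pass, so it works on
    generators and other one-shot iterators without look-ahead.
    """
    total = start
    for item in iterable:
        total = total + item  # any TypeError comes straight from +
    return total


# Works on a generator: no look-ahead, no second pass.
print(add_only_sum(n * n for n in range(4)))  # 14

# Strings work too, given a string start value (slowly, for long inputs).
print(add_only_sum(iter(["a", "b", "c"]), ""))  # abc

# Incompatible items raise the same TypeError that + raises on its own.
try:
    add_only_sum([1, 2, 3.4, Dummy()])
except TypeError as exc:
    print("TypeError:", exc)
```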

This is obviously slow on strings, but mention of that is already in the 
docs, and profiling will also turn up such bottlenecks.  Get the code 
working first, then optimize, yes?  We've all seen questions on this 
list with folk using the accumulator method for joining strings, and 
then wondering why it's so slow -- the answer given is the same as we 
would give for sum()ing a list of strings: use join instead.  Then
Python would be following the same advice we give out: don't break
duck-typing, and any ensuing errors are the caller's responsibility.
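
For reference, the two string-building styles mentioned above, as a small sketch with hypothetical helper names. Both produce the same result. The accumulator version creates a new string on each step because str is immutable, which is why it can slow down on long inputs (CPython sometimes optimizes `s += t` in place, so actual timings vary); join builds the result in one go.

```python
def accumulate_strings(parts):
    """The 'accumulator' pattern: builds a new string on every +=."""
    result = ""
    for part in parts:
        result += part
    return result


def join_strings(parts):
    """The usual advice: let str.join assemble the result."""
    return "".join(parts)


words = ["duck", "-", "typing"]
print(accumulate_strings(words))  # duck-typing
print(join_strings(words))        # duck-typing
```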

~Ethan~



More information about the Python-list mailing list