sum and strings

Tim Chase python.list at tim.thechases.com
Thu Aug 24 12:12:14 EDT 2006


>> Just because something is slow or sub-optimal doesn't mean it 
>> should be an error.
> 
> that's not an error because it would be "slow or sub-optimal" to add 
> custom objects, that's an error because you don't understand how "sum" 
> works.
> 
> (hint: sum != reduce)

No, clearly sum != reduce; no dispute there.

So we go ahead and get sum([q1,q2]) working by specifying a 
starting value, sum([q1,q2], Q()):

 >>> class Q(object):
...     def __init__(self, n=0, i=0,j=0,k=0):
...         self.n = n
...         self.i = i
...         self.j = j
...         self.k = k
...     def __add__(self, other):
...         return Q(self.n+other.n,
...             self.i+other.i,
...             self.j+other.j,
...             self.k+other.k)
...     def __repr__(self):
...         return "<Q(%i,%i,%i,%i)>" % (
...             self.n,
...             self.i,
...             self.j,
...             self.k)
...
 >>> q1 = Q(1,2,3,5)
 >>> q2 = Q(7,11,13,17)
 >>> q1+q2
<Q(8,13,16,22)>
 >>> sum([q1,q2])
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for +: 'int' and 'Q'
 >>> sum([q1,q2], Q())
<Q(8,13,16,22)>


Thus, sum() seems to work just fine for objects that have an 
__add__ method.  Strings have an __add__ method too:

 >>> hasattr("", "__add__")
True

yet, using the same pattern...

 >>> sum(["hello", "world"], "")
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
TypeError: sum() can't sum strings [use ''.join(seq) instead]


This seems like an arbitrary prejudice against strings, flying 
in the face of Python's duck-typing.  If it has an __add__ 
method, duck-typing says you should be able to provide a starting 
place and a sequence of things to add to it, and get the sum.
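
To make the duck-typing point concrete, here is a minimal sketch 
using reduce() with operator.add, which applies the same "start plus 
each item" pattern to anything that has an __add__ method.  The name 
fold_add is made up for illustration, and the functools import is an 
assumption for newer Pythons (reduce is a plain builtin in the 
Python used for the sessions in this post).

```python
import operator
from functools import reduce  # a builtin in older Python 2; in functools from 2.6 on

def fold_add(seq, start):
    # Hypothetical helper: left-to-right addition, start + seq[0] + seq[1] + ...
    return reduce(operator.add, seq, start)

numbers = fold_add([1, 2, 3], 0)           # ints, which sum() accepts
lists = fold_add([[1], [2, 3]], [])        # lists, which sum() also accepts
words = fold_add(["hello", "world"], "")   # strings, which sum() refuses
```

All three calls go through the same code path; only sum() singles 
out the string case.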

However, a new sum2() function can be created...

 >>> def sum2(seq, start=0):
...     for item in seq:
...         start += item
...     return start
...

which does what one would expect sum() to be doing behind the 
scenes.

 >>> # generate the expected error, same as above
 >>> sum2([q1,q2])
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
   File "<stdin>", line 3, in sum2
TypeError: unsupported operand type(s) for +=: 'int' and 'Q'
 >>> # employ the same solution of a proper starting point
 >>> sum2([q1,q2], Q())
<Q(8,13,16,22)>
 >>> # do the same thing for strings
 >>> sum2(["hello", "world"], "")
'helloworld'

and sum2() works just like sum(), only it happily takes strings 
without prejudice.
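
One caveat on "just like sum()": sum2() uses +=, which modifies a 
mutable starting value such as a list in place, whereas the builtin 
uses plain + and leaves the starting value alone.  A rough sketch of 
a variant that avoids the difference, under the made-up name 
sum_plus (sum2 is repeated only so the comparison is self-contained):

```python
def sum2(seq, start=0):
    # sum2() as defined in the post: in-place addition.
    for item in seq:
        start += item
    return start

def sum_plus(seq, start=0):
    # Hypothetical variant: rebinds with + instead of using +=,
    # so a mutable start value (e.g. a list) is never modified in place.
    total = start
    for item in seq:
        total = total + item
    return total

base_a = []
sum2([[1], [2]], base_a)       # += extends base_a itself

base_b = []
sum_plus([[1], [2]], base_b)   # base_b is left untouched

base_c = []
sum([[1], [2]], base_c)        # the builtin also leaves base_c untouched

joined = sum_plus(["hello", "world"], "")   # strings still work
```

For immutable values such as numbers, strings, or the Q objects 
above, the two versions behave the same.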

From help(sum):
"Returns the sum of a sequence of numbers (NOT strings) plus the 
value of parameter 'start'.  When the sequence is empty, returns 
start."

It would be as strange as if enumerate() refused strings and 
instead forced you to use some other method for enumerating them, 
as in this made-up session:

 >>> for i,c in enumerate("hello"): print i,c
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
TypeError: enumerate() can't enumerate strings [use "hello".enumerator() instead]

Why the arbitrary breaking of duck-typing for strings in sum()? 
Why make them second-class citizens?

sum() is clearly smart enough to recognize the condition and 
raise the error.  So why not add a few more smarts and have it 
simply translate the call into "start+''.join(sequence)" to 
maintain predictable behavior according to duck-typing?
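
As a rough sketch of what that could look like in pure Python (the 
name sum3 is made up, and checking isinstance(start, str) is an 
assumption about how the string case would be detected; under 
Python 2 one would likely check basestring instead):

```python
def sum3(seq, start=0):
    # Hypothetical sum(): a string start is handed off to join()
    # instead of being rejected with a TypeError.
    if isinstance(start, str):
        return start + "".join(seq)
    # Everything else gets the ordinary left-to-right addition.
    total = start
    for item in seq:
        total = total + item
    return total

greeting = sum3(["hello", "world"], "")
prefixed = sum3(["b", "c"], "a")
number = sum3([1, 2, 3])
```

The caller never has to know that strings take a different, faster 
path internally.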

-tkc