sum works in sequences (Python 3)

Wed Sep 19 12:14:26 EDT 2012

On Wed, 19 Sep 2012 09:03:03 -0600, Ian Kelly wrote:

> I think this restriction is mainly for efficiency.  sum(['a', 'b', 'c',
> 'd', 'e']) would be the equivalent of 'a' + 'b' + 'c' + 'd' + 'e', which
> is an inefficient way to add together strings.
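For reference, here is a minimal sketch of the restriction Ian describes, as it behaves in CPython 3. With the default start value of 0, sum() fails on 0 + 'a'. Passing '' as the start value is rejected on purpose, and the error message points you at ''.join(). The variable names are only illustrative.

```python
# sum() refuses to add strings; str.join() is the supported way to combine them.
words = ['a', 'b', 'c', 'd', 'e']

try:
    sum(words)          # default start is 0, and 0 + 'a' raises TypeError
except TypeError as exc:
    print("sum(words) failed:", exc)

try:
    sum(words, '')      # an explicit '' start is rejected deliberately
except TypeError as exc:
    print("sum(words, '') failed:", exc)

joined = ''.join(words)
print(joined)           # abcde
```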

It might not be obvious to some people why repeated addition is so 
inefficient, and in fact if people try it with modern Python (version 2.4 
or better), they may not notice any inefficiency.

But the example given, 'a' + 'b' + 'c' + 'd' + 'e', potentially ends up 
creating four strings, only to immediately throw away three of them:

* first it concatenates 'a' and 'b', creating the new string 'ab'
* then 'ab' + 'c', creating a new string 'abc'
* then 'abc' + 'd', creating a new string 'abcd'
* then 'abcd' + 'e', creating a new string 'abcde'
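The four steps above can be traced with a small helper. The name `concat_steps` is hypothetical and exists only for illustration. It performs the same left-to-right additions and records every intermediate result. Because the list keeps a reference to each one, every + here really does build a separate string object.

```python
def concat_steps(pieces):
    """Add pieces left to right with +, returning each intermediate string."""
    steps = []
    result = pieces[0]
    for piece in pieces[1:]:
        result = result + piece   # builds a new string; the old one stays in steps
        steps.append(result)
    return steps

steps = concat_steps(['a', 'b', 'c', 'd', 'e'])
print(steps)            # ['ab', 'abc', 'abcd', 'abcde']
print(len(steps) - 1)   # 3 intermediates are thrown away; only 'abcde' is kept
```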

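Each new string requires a fresh block of memory to be allocated (which, 
for large blocks, can be expensive in itself), and every character of 
both operands to be copied into it. Because each step re-copies 
everything built so far, the total amount of copying grows roughly with 
the square of the number of strings.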

With only five characters in total, you won't really notice any slowdown, 
but with large enough numbers of strings, Python could potentially spend 
a lot of time building, and throwing away, intermediate strings. Pure 
wasted effort.
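To put rough numbers on "large enough", here is a back-of-the-envelope model, not a measurement. It assumes every + copies both operands into a brand-new string, which ignores CPython's in-place optimization. It also assumes ''.join() sizes its result once and copies each piece exactly once. Both function names are hypothetical.

```python
def chars_copied_repeated_add(n, k=1):
    """Characters written when n strings of length k are combined with +,
    assuming every + copies both operands into a brand-new string."""
    copied = 0
    length = k                 # the first piece is used as-is
    for _ in range(n - 1):
        length += k            # each new string is one piece longer...
        copied += length       # ...and every character of it is written
    return copied

def chars_copied_join(n, k=1):
    """''.join() allocates the result once and copies each piece once."""
    return n * k

for n in (5, 1000, 100000):
    print(n, chars_copied_repeated_add(n), chars_copied_join(n))
```

For five one-character strings the difference is trivial (14 characters versus 5). For a hundred thousand it is billions versus a hundred thousand, which is where the time goes.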

For another look at this, see:
http://www.joelonsoftware.com/articles/fog0000000319.html

I say "could" because starting in Python 2.4, there is a nifty 
optimization in Python (CPython only, not Jython or IronPython) that can 
*sometimes* recognise repeated string concatenation of the form s = s + t 
or s += t and make it less inefficient. It depends on details such as 
whether anything else holds a reference to the string being extended, 
and on the operating system's memory management. When it works, it can 
make string concatenation almost as efficient as ''.join(). When it 
doesn't work, repeated concatenation is PAINFULLY slow, hundreds or 
thousands of times slower than join.
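If you want to see this on your own machine, here is a rough timing sketch. As far as I understand it, CPython can only extend a string in place when nothing else refers to it. So `add_all_aliased` keeps a second, otherwise unused reference, `alias`, purely to defeat the optimization. That trick is an assumption about CPython internals, not a documented API. The function names are hypothetical, and the timings will vary with Python version, implementation and platform, so no particular numbers are claimed.

```python
import timeit

def join_all(pieces):
    return ''.join(pieces)

def add_all(pieces):
    s = ''
    for p in pieces:
        s += p          # CPython may be able to grow s in place here
    return s

def add_all_aliased(pieces):
    s = ''
    for p in pieces:
        alias = s       # a second reference to s, kept only for this demo...
        s += p          # ...so CPython has to build a new string every time
    return s

pieces = ['x'] * 20000

for func in (join_all, add_all, add_all_aliased):
    seconds = timeit.timeit(lambda: func(pieces), number=3)
    print(f"{func.__name__:16} {seconds:.4f} s")
```

All three return the same string; only the amount of copying behind the scenes differs.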

-- 
Steven