Speed of string += string

Mon Apr 14 06:53:34 EDT 2003

Marcus Alanen wrote:

> On Sat, 12 Apr 2003 20:14:51 -0400, Roy Smith <roy at panix.com> wrote:
>>One of the very few things I like about C++ over Python is that the STL
>>is very careful to document the complexity of the various operations.
>>AFAICT, no such documentation exists for Python.
> 
> Indeed, that was the first thing I wondered about when I started
> learning Python.

Interesting!  What other languages (besides C++ for a subset of its
standard libraries) give O() specifications for their operations?
In other words, what other languages did you learn, before Python,
that didn't leave you wondering about such performance issues?

>>As another example, I can guess that Python dictionaries work like STL
>>maps on the inside, but that would be just a guess.  Is inserting an
>>element logN?  Is keys() linear?  Is del on an element logN?  Is
>>has_key() logN?  Those would be my guesses.  But they're just guesses.
> 
> Others answered this very thoroughly, I'd just like to emphasize the
> linearity of keys() --- most often you can manage with .iterkeys() or
> .itervalues() just fine, and it's a lot faster.

Both keys and iterkeys are O(N) [as, of course, is itervalues].

As for the performance issues, I wouldn't be so sure.  E.g.:

[alex at lancelot python2.3]$ python2.3 -O timeit.py -s'''
> d=dict.fromkeys(range(200))''' 'for x in d.keys(): pass'
10000 loops, best of 3: 24.3 usec per loop
[alex at lancelot python2.3]$ python2.3 -O timeit.py -s'''
d=dict.fromkeys(range(200))''' 'for x in d.iterkeys(): pass'
10000 loops, best of 3: 28.8 usec per loop
[alex at lancelot python2.3]$ python2.3 -O timeit.py -s'''
d=dict.fromkeys(range(200))''' 'for x in d: pass'
10000 loops, best of 3: 28.1 usec per loop

[alex at lancelot python2.3]$ python2.3 -O timeit.py -s'''
d=dict.fromkeys(range(200))''' 'for x in d.keys(): pass'
10000 loops, best of 3: 24.7 usec per loop
[alex at lancelot python2.3]$ python2.3 -O timeit.py -s'''
d=dict.fromkeys(range(200))''' 'for x in d: pass'
10000 loops, best of 3: 28.9 usec per loop
[alex at lancelot python2.3]$ python2.3 -O timeit.py -s'''
d=dict.fromkeys(range(200))''' 'for x in d.iterkeys(): pass'
10000 loops, best of 3: 28.2 usec per loop
[alex at lancelot python2.3]$

In this case, and of course specifically on my machine (1.2 GHz
Athlon running Mandrake 9.0 and Python 2.3a2), we see that the
performance advantage is rather for looping on d.keys(), which
ends up being around 15% faster than looping on d.iterkeys()
[or equivalently looping directly on d itself].  I observe
similar results for Python 2.2.2 on the same machine, btw
(but performance advantage for keys reduced to just 3-5% as both
loops are substantially slower -- 2.3 is quite a bit faster
than 2.2 just about across the whole spectrum of operations).

That's most definitely NOT a significant performance difference, 
of course, but calling iteration on iterkeys "a lot faster" than 
iteration on keys would appear to be substantially misleading, at 
least without a lot of qualifications and hedging!-)

Alex