Speed of string += string

Alex Martelli aleax at aleax.it
Sun Apr 13 04:38:14 EDT 2003


Roy Smith wrote:

> Alex Martelli <aleax at aleax.it> wrote:
>> Yes -- it's THE one major performance trap in naively coded Python,
>> since, when "a lot" is a sufficiently large N, a lot of += on short
>> strings ends up being O(N squared) where O(N) approaches would be
>> available.
> 
> One of the very few things I like about C++ over Python is that the STL
> is very careful to document the complexity of the various operations.

Yes, that's pretty unique -- C, Fortran, and all other languages I used,
never came with such useful specs for libraries and primitives.

> AFAICT, no such documentation exists for Python.

Well, there's the chapter on (testing, debugging, and) optimization in the
Nutshell -- Tim Peters kindly double-checked it, so any errors remaning
are HIS fault;-).  But of course there's a key difference -- the Nutshell's
info is *descriptive*, and that's quite another thing from the info in
the C++ Standard, which is *prescriptive* of how standard library
implementations MUST behave.

> As another example, I can guess that Python dictionaries work like STL
> maps on the inside, but that would be just a guess.  Is inserting an
> element logN?  Is keys() linear?  Is del on an element logN?  Is
> has_key() logN?  Those would be my guesses.  But they're just guesses.

If hash(x) is taken to be constant-time (actually it depends on x, of
course), then inserting, deleting and has_key are all roughly O(1) too
(amortized -- of course SOME insertions will cause the dict to grow,
but the idea is that they're rare enough not to matter in an amortized
sense).  keys is indeed O(N).

The crucial difference is that Python dicts are hashtables, while the
std::map in the standard library of C++ can't be, because it's based
on an order relationship, NOT on hashing.  So, std::map has an order,
but its O(...) performance just can't be as good as a dict's.


Alex





More information about the Python-list mailing list