Optimisation Hints (dict processing and strings)

Peter Hansen peter at engcorp.com
Tue Mar 29 10:01:12 EST 2005


OPQ wrote:
> I'd be happy to have you share some thoughts about ultimate optimisations
> on those 2 topics:
> 
> (1)- adding one character at the end of a string (may be long)
> >>> longone = longone + char  # where len(char) == 1
> 
> I known that string concatenation is time consuming, but a small test
> on timeit seems to show that packing and creating an array for those 2
> elements is equally time consuming

You've misunderstood the comments about this area.
String concatenation is *not* "time consuming".
*Repeated* concatenations *will become very time
consuming*, along a quadratic curve.  That's
what the discussions about O(n^2) are referring
to.  A single string concatenation is definitely
going to be faster than any amount of converting
to other data structures and such.

Basically, to concatenate two strings, Python
simply adds their sizes together and allocates
a new memory area of sufficient size, then copies
the bytes from each string to the new area.  Not
much overhead there, for a single concatenation.
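In rough Python terms, a single concatenation does
something like the following (a sketch of the idea
only -- the real work happens in CPython's C code):

```python
def concat(a: bytes, b: bytes) -> bytes:
    # Allocate one buffer of the combined size...
    buf = bytearray(len(a) + len(b))
    # ...then copy each operand into place exactly once.
    buf[:len(a)] = a
    buf[len(a):] = b
    return bytes(buf)

print(concat(b"long string so far", b"x"))
```

One allocation, two copies -- cheap, as long as you
only do it once.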

The problem comes when you try to scale this.
At some point, the overhead of allocating and
deallocating all the intermediate strings becomes
much larger than other approaches would take.
For example, by building a list with references
to all the strings, then using join(), you save
on the intermediate steps.  Join performs the
length-summation and memory allocation steps
only once, and copies each individual string into
the final area only once.  Much faster, provided
the overhead of growing a list as you append()
string references is small enough.  And it is:
list.append() is amortized O(1), because Python
over-allocates the list's storage as it grows.
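The two approaches side by side (a sketch; the
function names are mine):

```python
def build_by_concat(pieces):
    # Each + allocates a fresh string and re-copies everything
    # accumulated so far: O(n^2) in the total length.
    result = ""
    for piece in pieces:
        result = result + piece
    return result

def build_by_join(pieces):
    # append() is cheap; join() sums the lengths, allocates the
    # final string once, and copies each piece exactly once.
    collected = []
    for piece in pieces:
        collected.append(piece)
    return "".join(collected)

pieces = ["spam"] * 1000
assert build_by_concat(pieces) == build_by_join(pieces)
```

Both produce the same string; only the amount of
intermediate allocation and copying differs.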

The key with this stuff is to realize that neither
approach is inherently faster: it depends on how
much concatenation you are doing.  It could be
that concatenating up to four strings is faster
than using the append/join technique, or it could
be that it's faster up to twenty strings.
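Where the crossover falls is easy to check with the
timeit module; for example (a sketch -- the numbers
you get depend on your Python version and machine):

```python
import timeit

setup = "pieces = ['x'] * %d"
concat_stmt = "s = ''\nfor p in pieces:\n    s = s + p"
join_stmt = "s = ''.join(pieces)"

# Time both techniques at a few sizes and compare.
for n in (4, 20, 1000):
    t_concat = timeit.timeit(concat_stmt, setup % n, number=1000)
    t_join = timeit.timeit(join_stmt, setup % n, number=1000)
    print("n=%4d  concat: %.4fs  join: %.4fs" % (n, t_concat, t_join))
```

Run it yourself before deciding which way to write
any particular piece of code.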

When you're not talking hundreds of operations,
and when you're not talking about actual *measured*
performance problems (in other words, when we would
call it "premature optimization"), focus on the
readability of the code and on keeping it simple
and clean.  Worry about the correctness of the code,
and leave optimization worries till later.  (But
understand this "big O" stuff well enough to
be able to avoid the real problem areas, when
possible.)

-Peter



More information about the Python-list mailing list