Confounded by Python objects

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sun Jul 27 06:15:31 EDT 2008


On Sat, 26 Jul 2008 18:54:22 +0000, Robert Latest wrote:

> Here's an interesting side note: After fixing my "Channel" thingy the
> whole project behaved as expected. But there was an interesting hitch.
> The main part revolves around another class, "Sequence", which has a
> list of Channels as attribute. I was curious about the performance of my
> script, because eventually this construct is supposed to handle
> megabytes of data. So I wrote a simple loop that creates a new Sequence,
> fills all the Channels with data, and repeats.
> 
> Interestingly, the first couple of dozen iterations went satisfactorily
> quickly (took about 1 second total), but after a hundred or so times it
> got really slow -- like a couple of seconds per iteration.
> 
> Playing around with the code, not really knowing what to do, I found
> that in the "Sequence" class I had again erroneously declared a
> class-level attribute -- a seemingly harmless one, just a string that
> got assigned to once in each iteration on object creation.
> 
> After I had deleted that, the loop went blindingly fast without slowing
> down.
> 
> What's the mechanics behind this behavior?

Without actually seeing the code, it's difficult to be sure, but my guess 
is that you were accidentally doing repeated string concatenation. This 
can be very slow.
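
This is pure guesswork, since I haven't seen the code, but a pattern 
like the sketch below -- where the *class* attribute gets rebound to a 
longer and longer string each time an object is created -- would give 
that sort of progressive slowdown. The attribute name and the sizes are 
my invention:

class Sequence(object):
    # class-level attribute, shared by every Sequence ever created
    log = ''

    def __init__(self, data):
        # Rebinding the class attribute means the string keeps growing
        # across iterations, and each concatenation copies all of it.
        Sequence.log = Sequence.log + data

for i in range(1000):
    seq = Sequence('x' * 10000)   # each iteration slower than the last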

In general, anything that looks like this:

s = ''
for i in range(10000):  # or any big number
    s = s + 'another string'

can be slow. Very slow. Each concatenation builds a brand-new string and 
copies everything accumulated so far, so the total work grows roughly 
with the square of the number of pieces. The preferred way is to build a 
list of substrings, then put them together in one go with join(), which 
copies each piece only once.

L = []
for i in range(10000):
    L.append('another string')
s = ''.join(L)
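
If the substrings are being generated on the fly, you don't even need to 
build the list by hand: join() accepts any iterable, so you can feed it 
a generator expression directly (that's the form the timing code further 
down uses).

s = ''.join('another string' for i in range(10000))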


It's harder to stumble across the slow behaviour these days, as Python 
2.4 introduced an optimization that, under some circumstances, makes 
string concatenation almost as fast as using join(). But be warned: join()
is still the recommended approach. Don't count on this optimization to 
save you from slow code.
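
One more caveat, and it's relevant to your class attribute: as far as I 
can tell the optimization is a CPython implementation detail, and it 
only kicks in when you concatenate onto a plain name. Concatenating onto 
an attribute (instance or class) doesn't get the fast path, so a loop 
like the sketch below stays slow even on 2.4 and up. The Holder class is 
just something I made up for the test -- time it yourself rather than 
taking my word for it.

import timeit

class Holder(object):
    pass

# the same sort of concatenation loop as below, but concatenating onto
# an attribute rather than a plain local name
t3 = timeit.Timer(
    'for i in xrange(1000): h.s = h.s + str(i) + "a"',
    'from __main__ import Holder; h = Holder(); h.s = ""')

print t3.repeat(number=30)   # compare with the timings below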

If you want to see just how slow repeated concatenation is compared to 
joining, try this:


>>> import timeit
>>> t1 = timeit.Timer('for i in xrange(1000): x=x+str(i)+"a"', 'x=""')
>>> t2 = timeit.Timer('"".join(str(i)+"a" for i in xrange(1000))', '')
>>> 
>>> t1.repeat(number=30)
[0.8506159782409668, 0.80239105224609375, 0.73254203796386719]
>>> t2.repeat(number=30)
[0.052678108215332031, 0.052067995071411133, 0.052803993225097656]

Concatenation is more than ten times slower in the example above, but it 
gets worse:

>>> t1.repeat(number=40)
[1.5138671398162842, 1.5060651302337646, 1.5035550594329834]
>>> t2.repeat(number=40)
[0.072292804718017578, 0.070636987686157227, 0.070624113082885742]

And even worse:

>>> t1.repeat(number=50)
[2.7190279960632324, 2.6910948753356934, 2.7089321613311768]
>>> t2.repeat(number=50)
[0.087616920471191406, 0.088094949722290039, 0.087819099426269531]



-- 
Steven


