Speed quirk: redundant line gives six-fold speedup

rafi rafi at free.fr
Thu Aug 25 15:55:32 EDT 2005


Stelios Xanthakis wrote:
> Mark Dickinson wrote:
> 
>> I have a simple 192-line Python script that begins with the line:
>>
>> dummy0 = 47
>>
>> The script runs in less than 2.5 seconds.  The variable dummy0 is never
>> referenced again, directly or indirectly, by the rest of the script.
>>
>> Here's the surprise: if I remove or comment out this first line, the
>> script takes more than 15 seconds to run.  So it appears that adding a
>> redundant line produces a spectacular six-fold increase in speed!
>>
>> (Actually, I had to add 29 dummy lines at the beginning of the code to
>> get the speed increase; if any one of these lines is removed the
>> running time reverts to around 15 seconds again.)
>>
>> Questions:
>>
>> (1) Can anyone else reproduce this behaviour, or is it just some quirk
>>     of my setup?
>> (2) Any possible explanations?  Is there some optimization that kicks
>>     in at a certain number of lines, or at a certain length of
>>     bytecode?
>> (3) If (2), is there some way to force the optimization, so that I can
>>     get the speed increase without having to add the extra lines?
>>
> 
> Hi.
> 
> I haven't been able to reproduce this but I had a similar case
> before (a program that sometimes crashed and sometimes worked
> perfectly and the difference was whitespace in the comments!!!).
> 
> After lots of wondering and thinking that I must be dreaming
> (luckily I had pyvm which also crashed but for different amounts
> of whitespace), it was solved.  The explanation is this: hash
> and comparison of objects depend on the state of the memory
> allocator.  A sample case is this:
> 
>     class A: pass
>     dummy0=47  # comment this to get a different result for min
>     a=A()
>     b=A()
>     print min (a, b)
> 
> the result of 'min' is not only non-deterministic but also depends
> on whether other things have been allocated before.  The same
> thing can happen for 'dictionary.keys()' if the keys are objects
> and 'iterate-over-set' when the set contains objects.

I do not get the point here: isn't min comparing the addresses in
memory, as there is nothing else to compare?

[python 2.4.1 on ubuntu linux]
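
If I read this right (my assumption, from how CPython 2.x compares
instances that define no comparison methods: it falls back to their
memory addresses), then a small check makes the dependence visible:

    class A: pass

    a = A()
    b = A()
    print id(a), id(b)       # the two addresses
    print min(a, b) is a     # True exactly when id(a) < id(b)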

On 10 runs from within emacs, min(a, b) returned `a' about 50% of the
time and `b' about 50% of the time. It was the same without the dummy0=47.

On 10 runs from the command line, `a' was always returned (with and
without the dummy). The only difference between the two cases was the
address of b's object. (Within each case, the two addresses were exactly
the same across all runs.)

For your first post:

Average times over 10 test runs for each case (individual runs never
differed from one another by more than .02s):

0.554s when the 28 dummies are in place
6.679s when commented out, removed, or put in a single multiline string
2.576s when putting the 28 dummies in a list or a dict using a for loop
7.195s when creating the 28 dummies using an exec in a for loop
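
For the record, here is roughly what I mean by the last two variants (a
sketch of my additions at the top of the script, not Mark's actual code;
'the_script.py' is just a placeholder name):

    # variant 3: the 28 dummies kept in a list built by a for loop
    dummies = []
    for i in range(28):
        dummies.append(47)

    # variant 4: 28 module-level names created with exec in a for loop
    for i in range(28):
        exec "dummy%d = 47" % i

    # each variant was then timed from the shell, e.g.:
    #   time python the_script.py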

What about memory allocation being performed chunk by chunk by the 
interpreter? Being at the end or at the beginning of a chunk may then 
not be the same for processing. All the instructions of the program 
would, for example, sit in the same block in one case and therefore in 
the CPU cache, while in the other case they would be spread over two 
distinct blocks -> thus less effective caching for the CPU... [this 
particular problem arose when I was working several years ago on a 
persistent object manager for a distributed operating system]
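
Just to probe that idea (a quick sketch of mine, nothing conclusive),
one can look at how far apart consecutive allocations land:

    # print the distance between addresses of consecutively allocated
    # instances; a large jump would suggest crossing an allocator block
    class A: pass
    objs = [A() for i in range(10)]      # keep them alive
    addrs = [id(o) for o in objs]
    print [addrs[i+1] - addrs[i] for i in range(len(addrs) - 1)]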

-- 
rafi

	"Imagination is more important than knowledge."
	                            (Albert Einstein)


