Speed quirk: redundant line gives six-fold speedup

Fri Aug 26 16:29:17 EDT 2005

Raymond Hettinger wrote:

> 
> Mark Dickinson wrote:
>> I have a simple 192-line Python script that begins with the line:
>>
>> dummy0 = 47
>>
>> The script runs in less than 2.5 seconds.  The variable dummy0 is never
>> referenced again, directly or indirectly, by the rest of the script.
>>
>> Here's the surprise: if I remove or comment out this first line, the
>> script takes more than 15 seconds to run.  So it appears that adding a
>> redundant line produces a spectacular six-fold increase in speed!
> 
> Thanks for your post.  It is cute, brilliant, and interesting.
> 
> I haven't had time to investigate but can point at the likely cause.
> All of the global names are stored in a single hash table.  Search time
> is dictated by two factors, the sparseness of the hash table and the
> distribution of hash values.  With respect to sparseness, whenever that
> table becomes 2/3 full, it grows by a factor of four and becomes only
> 1/6 full (leading to many fewer collisions).  With respect to
> distribution, it should be noted that string hash values are decidely
> non-random and your variable names likely congested consecutive spaces
> in a nearly full table (resulting in seven times as many search probes
> to find a global value).
> 
> When the extra value was added, it likely resized the table four-fold
> and redistributed the hash values into fewer consecutive positions.

If that's the cause, then here's another reason to use long, descriptive
names instead of C64-BASIC style a, b, c, i, j... - with long names the
chances of hash collisions are pretty low.
Or everyone will start optimizing their programs by using long, *random*
names ;)

-- 
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://www.odahoda.de/