Speed quirk: redundant line gives six-fold speedup

Thu Aug 25 15:40:48 EDT 2005

Mark Dickinson wrote:
> I have a simple 192-line Python script that begins with the line:
>
> dummy0 = 47
>
> The script runs in less than 2.5 seconds.  The variable dummy0 is never
> referenced again, directly or indirectly, by the rest of the script.
>
> Here's the surprise: if I remove or comment out this first line, the
> script takes more than 15 seconds to run.  So it appears that adding a
> redundant line produces a spectacular six-fold increase in speed!

Thanks for your post.  It is cute, brilliant, and interesting.

I haven't had time to investigate but can point at the likely cause.
All of the global names are stored in a single hash table.  Search time
is dictated by two factors, the sparseness of the hash table and the
distribution of hash values.  With respect to sparseness, whenever that
table becomes 2/3 full, it grows by a factor of four and becomes only
1/6 full (leading to many fewer collisions).  With respect to
distribution, it should be noted that string hash values are decidely
non-random and your variable names likely congested consecutive spaces
in a nearly full table (resulting in seven times as many search probes
to find a global value).

When the extra value was added, it likely resized the table four-fold
and redistributed the hash values into fewer consecutive positions.

Raymond

P.S.  To analyze it further, start with something like this:

>>> len(set(hash('dummy%d' %i) & 31 for i in xrange(29)))
26
>>> len(set(hash('dummy%d' %i) & 127 for i in xrange(29)))
29