Speed quirk: redundant line gives six-fold speedup
Benjamin Niemann
pink at odahoda.de
Fri Aug 26 16:29:17 EDT 2005
Raymond Hettinger wrote:
>
> Mark Dickinson wrote:
>> I have a simple 192-line Python script that begins with the line:
>>
>> dummy0 = 47
>>
>> The script runs in less than 2.5 seconds. The variable dummy0 is never
>> referenced again, directly or indirectly, by the rest of the script.
>>
>> Here's the surprise: if I remove or comment out this first line, the
>> script takes more than 15 seconds to run. So it appears that adding a
>> redundant line produces a spectacular six-fold increase in speed!
>
> Thanks for your post. It is cute, brilliant, and interesting.
>
> I haven't had time to investigate but can point at the likely cause.
> All of the global names are stored in a single hash table. Search time
> is dictated by two factors, the sparseness of the hash table and the
> distribution of hash values. With respect to sparseness, whenever that
> table becomes 2/3 full, it grows by a factor of four and becomes only
> 1/6 full (leading to many fewer collisions). With respect to
> distribution, it should be noted that string hash values are decidely
> non-random and your variable names likely congested consecutive spaces
> in a nearly full table (resulting in seven times as many search probes
> to find a global value).
>
> When the extra value was added, it likely resized the table four-fold
> and redistributed the hash values into fewer consecutive positions.
If that's the cause, then here's another reason to use long, descriptive
names instead of C64-BASIC style a, b, c, i, j... - with long names the
chances of hash collisions are pretty low.
Or everyone will start optimizing their programs by using long, *random*
names ;)
--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://www.odahoda.de/
More information about the Python-list
mailing list