[Python-Dev] Rethinking intern() and its data structure

Collin Winter collinw at gmail.com
Fri Apr 10 06:07:54 CEST 2009


On Thu, Apr 9, 2009 at 6:24 PM, John Arbash Meinel
<john.arbash.meinel at gmail.com> wrote:
> Greg Ewing wrote:
>> John Arbash Meinel wrote:
>>> And the way intern is currently
>>> written, there is a third cost when the item doesn't exist yet, which is
>>> another lookup to insert the object.
>>
>> That's even rarer still, since it only happens the first
>> time you load a piece of code that uses a given variable
>> name anywhere in any module.
>>
>
> Somewhat true, though I know it happens 25k times during startup of
> bzr... And I would be a *lot* happier if startup time was 100ms instead
> of 400ms.

Quite so. We have a number of internal tools for which just starting
up Python frequently takes several times as long as the actual unit of
work itself. I'd be very interested to review any
patches you come up with to improve start-up time; so far on this
thread, there's been a lot of theory and not much practice. I'd
approach this iteratively: first replace the dict with a set, then if
that bears fruit, consider a customized data structure; if that bears
fruit, etc.
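
To make the "extra lookup on a miss" cost from the quoted text concrete,
here is a rough pure-Python model. It is only a sketch: the real intern
table lives in C inside the interpreter, and the class name, method names
and the `probes` counter are all made up for illustration. It assumes the
table is a dict mapping each string to itself, as the thread describes.

```python
class InternTable:
    """Hypothetical model of an intern table backed by a dict (string -> itself)."""

    def __init__(self):
        self._table = {}
        self.probes = 0  # illustrative count of hash-table operations

    def intern_two_step(self, s):
        """Look up first, then insert on a miss: two table operations on a miss."""
        self.probes += 1
        existing = self._table.get(s)
        if existing is not None:
            return existing
        self.probes += 1  # the additional operation paid only on a miss
        self._table[s] = s
        return s

    def intern_one_step(self, s):
        """Combine lookup and insert in a single table operation."""
        self.probes += 1
        return self._table.setdefault(s, s)


if __name__ == "__main__":
    # Two equal strings built at runtime, so they start as separate objects.
    a = "".join(["var", "name"])
    b = "".join(["var", "name"])

    two_step = InternTable()
    two_step.intern_two_step(a)
    two_step.intern_two_step(b)

    one_step = InternTable()
    one_step.intern_one_step(a)
    one_step.intern_one_step(b)

    print("two-step operations:", two_step.probes)
    print("one-step operations:", one_step.probes)
```

The counter only models table operations; it says nothing about real
start-up time, which would still need to be measured against an actual
patch.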

Good luck, and be sure to let us know what you find,
Collin Winter

