interning strings

Peter Otten __peter__ at web.de
Mon Nov 8 03:25:43 EST 2004


Mike Thompson <none.by.e-mail> wrote:

> '==' won't help me, I'm afraid.
> 
> I need to improve the speed and memory footprint of an application which
> reads in a very large XML document.

Yes, I should have read your post more carefully, but I was preoccupied
with speed...

> From your explanation there seem to be no language rules, just
> implementation accidents.  And none of those will be particularly
> helpful in my case.

With arbitrary strings the likelihood of a cache hit decreases fast. Using
your own dictionary and checking the refcounts could give you interesting
insights. Unfortunately the weakref module only offers WeakKeyDictionary
and WeakValueDictionary; there is no dictionary that holds both keys and
values as weakrefs, so you have to do some work, or you will actually
_increase_ the memory footprint.
 
> However, I still think I'm going to try using the builtin 'intern'
> rather than my own dict cache. That may provide an advantage, even if it
> doesn't work with unicode.
 
You might at least choose an alias

my_intern = intern 

then, lest you later regret that limitation.

Peter



