interning strings

Mike Thompson none.by.e-mail
Sun Nov 7 18:09:02 EST 2004


[snip very useful explanation]

> 
> By the way, why would you want to mess with these implementation details?
> Use the == operator to compare strings and be happy ever after :-)
> 

'==' won't help me, I'm afraid.

I need to improve the speed and memory footprint of an application which 
reads in a very large XML document.

Some elements in the incoming documents can be filtered out, so I've 
written my own SAX handler to extract just what I want. All the same, 
the content being read in is substantial.

So, to further reduce memory footprint, my SAX handler tries to manually 
intern (using dicts of strings) a lot of the duplicated content and 
attributes coming from the XML documents. Also, I use the SAX feature 
'feature_string_interning' to hopefully intern the strings used for 
attribute names etc.
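
For illustration, the dict-based caching described above might look 
roughly like the following. This is only a minimal sketch written for 
present-day Python 3, not my actual handler: the names InterningHandler, 
_intern, values and parse are hypothetical, and it caches attribute 
values only.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_string_interning


class InterningHandler(ContentHandler):
    """Hypothetical handler that keeps one shared copy of each attribute value."""

    def __init__(self):
        super().__init__()
        self._cache = {}   # maps a string to its canonical instance
        self.values = []   # illustrative store for the extracted data

    def _intern(self, s):
        # setdefault returns the copy already stored, or stores s and returns it.
        return self._cache.setdefault(s, s)

    def startElement(self, name, attrs):
        for key in attrs.getNames():
            self.values.append(self._intern(attrs.getValue(key)))


def parse(xml_bytes):
    parser = xml.sax.make_parser()
    # Ask the parser to intern element and attribute names where it can.
    parser.setFeature(feature_string_interning, True)
    handler = InterningHandler()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler


handler = parse(b'<r><a k="hello world"/><a k="hello world"/></r>')
# Both stored values refer to the same string object.
print(handler.values[0] is handler.values[1])
```

The cache lives only as long as the handler does, so the shared strings 
can be freed once the extracted data is dropped.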

That is all working fine, except that now, as a final step, I'd like 
to understand interning a bit more.

From your explanation there seem to be no language rules, just 
implementation accidents.  And none of those will be particularly 
helpful in my case.

However, I still think I'm going to try using the builtin 'intern' 
rather than my own dict cache. That may provide an advantage, even if it 
doesn't work with unicode.
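
A small sketch of what the swap would look like, with one loud 
assumption: it is written for Python 3, where the builtin has moved to 
sys.intern and accepts ordinary (unicode) str objects. In the Python 2 
of this thread the function is the builtin intern(), which rejects 
unicode objects. The helper name canonical is hypothetical.

```python
import sys


def canonical(s):
    # Return the interpreter-wide interned copy of s.
    return sys.intern(s)


# Build the strings at run time so they start out as two distinct objects.
a = canonical("".join(["feature", "_", "name"]))
b = canonical("".join(["feature", "_", "name"]))
print(a is b)  # True: both names refer to the single interned string
```

Unlike a private dict, the interned table is shared by the whole 
interpreter, so there is no cache object of my own to manage.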

--
Mike


