Interning own classes like strings for speed and size?

Stefan Behnel stefan_ml at behnel.de
Tue Dec 28 09:39:32 EST 2010


Steven D'Aprano, 28.12.2010 15:11:
> On Tue, 28 Dec 2010 13:42:39 +0100, Ulrich Eckhardt wrote:
>
>> Steven D'Aprano wrote:
>>>>>> class InternedTuple(tuple):
>>> ...     _cache = {}
>>> ...     def __new__(cls, *args):
>>> ...             t = super().__new__(cls, *args)
>>> ...             return cls._cache.setdefault(t, t)
>>
>> That looks good. The only thing that first bothered me is that it
>> creates an object and then possibly discards it again. However, there is
>> no way around that, since at least the key to the dict must be created
>> for lookup. Since key and value are the same here, this is even for
>> free.
>>
>> What I also found was that with the above, I can't provide __eq__ and
>> __ne__ that just check for identity. If I do, the lookup in setdefault()
>> will never find an existing tuple and I will never save memory for a
>> single object.
>
> If all you want is to save memory, you don't need to change the __eq__
> method. But if you still want to, try this:
>
> # Untested

Yep, that's the problem. ;)


> class InternedTuple(tuple):
>      _cache = {}
>      def __new__(cls, *args):
>          t = super().__new__(cls, *args)
>          return cls._cache.setdefault(args, t)
>      def __eq__(self, other):
>          return self is other
>      def __ne__(self, other):
>          return self is not other

What Ulrich meant was: doing this will actually kill the caching, because 
the first time the comparison is called is when looking up the tuple while 
adding it to the interning dict. Since the new tuple is, well, new, it will 
not be equal (read: identical) to any cached tuple, thus resulting in a new 
entry regardless of its content.
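
To make the effect concrete, here is a minimal sketch, assuming Python 3 and
assuming the cache is keyed on the new object itself, as in the first version
quoted above. The class names PlainInternedTuple and IdentityInternedTuple are
made up for this illustration. The explicit __hash__ line is also an addition:
Python 3 sets __hash__ to None on a class that defines __eq__, so without it
the instances could not be used as dict keys at all.

```python
class PlainInternedTuple(tuple):
    """Interned tuple that keeps the inherited, value-based equality."""
    _cache = {}

    def __new__(cls, *args):
        t = super().__new__(cls, *args)
        # setdefault() hashes t and compares it with stored keys via __eq__;
        # an equal key already in the cache wins and t is discarded.
        return cls._cache.setdefault(t, t)


class IdentityInternedTuple(tuple):
    """Same idea, but with identity-based equality (hypothetical variant)."""
    _cache = {}

    def __new__(cls, *args):
        t = super().__new__(cls, *args)
        # t is brand new, so "t is stored_key" is false for every stored key
        # and the lookup always misses.
        return cls._cache.setdefault(t, t)

    def __eq__(self, other):
        return self is other

    def __ne__(self, other):
        return self is not other

    # Assumption for this sketch: restore hashing, which defining __eq__
    # would otherwise disable in Python 3.
    __hash__ = tuple.__hash__


a = PlainInternedTuple((1, 2, 3))
b = PlainInternedTuple((1, 2, 3))
print(a is b, len(PlainInternedTuple._cache))

c = IdentityInternedTuple((1, 2, 3))
d = IdentityInternedTuple((1, 2, 3))
print(c is d, len(IdentityInternedTuple._cache))
```

With value-based equality the second call returns the cached object, so the
cache holds one entry. With identity-based equality every call adds another
entry, and the cache ends up costing memory instead of saving it.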

Stefan



