[Python-Dev] Pre-PEP: Mutable keys in dictionaries

Guido van Rossum guido@python.org
Thu, 06 Feb 2003 13:57:33 -0500


> Here's a quick pre-PEP.
> 
> 
> Allow mutable objects as dictionary keys in certain cases by adding a
> new protocol. dict.__setitem__() and __contains__() would work like this
> bit of pseudo code:
> 
>   def __setitem__(self, key, value):
>       if not hashable(key):
>           if hasattr(key, "__immutable_copy__"):
>               key = key.__immutable_copy__()
>           else:
>               raise TypeError, "unhashable key"
>       ...insert value with key into dict...
> 
>   def __contains__(self, key):
>       if not hashable(key):
>           if hasattr(key, "__temporary_immutable__"):
>               key = key.__temporary_immutable__()
>           else:
>               raise TypeError, "unhashable key"
>       ...test for key membership...
> 
>   (other methods would be similar)
> 
> When implemented in C in dictobject.c, this would add no overhead for
> the "normal" case (when keys _are_ directly hashable).

I have a small doubt about this claim.  Inserting code into
PyDict_SetItem(), even if it is normally never executed, may reduce
code locality and hence cause the code to miss the cache more often
than before.  This *may* be addressed by rearranging the code, but
there's a limit to that.

I have a countersuggestion.

Could you do this as a subclass of dict?  A subclass in Python would
be inefficient, but might still be good enough for your second use
case (the ObjC bridge).  If not, a subclass in Python might be
feasible -- it would be a little bit more work than integrating it
into dictobject.c, but you have a lot less convincing to do, and you
can even get it to work with Python 2.2.

> The __temporary_immutable__ part of the protocol is needed to avoid
> unneccesary copying in those cases where the key isn't actually inserted
> into the dict, but is only used for membership testing. It would return
> a wrapper around the mutable object, defining __eq__ and __hash__. (This
> idea is stolen from the sets.py module.) If an implementer doesn't care
> about this efficiency, __temporary_immutable__ can be an alias to
> __immutable_copy__.
> 
> 
> Use Cases:
> 
> - sets.py could be written more efficiently and/or could be more easily
>   be rewritten in C.
> 
> - Again, the troubles of bridging Python to Objective-C have triggered
>   this idea. Due to the nature of Objective-C in general, and Cocoa in
>   particular, we can never be sure beforehand whether a string is
>   mutable or not. We can detect mutability at runtime, but depending on
>   this is dangerous as it exposes an implementation detail that may
>   change between releases of _any_ API that returns strings. Therefore
>   code that works in one setup may break in another, or even in the the
>   same setup after an arbitrary library upgrade, simply because an API
>   suddenly returns a mutable string where it didn't before. Yes this
>   sucks, but it's the way it is. <Add link to post from bbum explaining
>   this situation in detail>
> 
> 
> Backwards compatibilty:
> 
> 100%.
> 
> Overhead added
> 
> None
> 
> 
> Implementation
> 
> There's quite a bit of code duplication in those areas of dictobject.c
> that matter for this proposal, so implementing it will take longer than
> the 10 minutes I originally estimated <wink>. Perhaps this is a good
> opportunity to refactor these bits into inline functions? Not being a C
> guru I can't judge whether this is possible or desirable with the set of
> compilers that Python must be built with.

You can't rely on inline.  OTOH, GCC -O3 often inlines static
functions without needing a hint.

--Guido van Rossum (home page: http://www.python.org/~guido/)