Why are tuples immutable?

Fri Dec 17 21:54:20 EST 2004

On 17 Dec 2004 08:20:10 GMT, Antoon Pardon <apardon at forel.vub.ac.be> wrote:

>Op 2004-12-17, Jeff Shannon schreef <jeff at ccvcorp.com>:
>> Adam DePrince wrote:
>>
>>>And how exactly do you propose to mutate an object without changing its
>>>hash value?
>>>
>>>
>>>* Create this mutate-able object type.  
>>>* Create two objects that are different with different hash values.
>>>* Mutate one so that it is the same as the other.
>>>* Compare their hash values.  
>>>
>>>If the hash values are the same, you lose because you didn't keep your
>>>original promise.
>>>
>>>If the hash values are different then you just broke your object because
>>>two objects with the same value must have the same hash value, but yours
>>>are different.
>>>  
>>>
>>
>> Correct me if I'm wrong, here, but I believe that what you're saying 
>> here (and my vague memories of the little bit that I've read about 
>> hashes) is that, for a well-behaved hash function, then A == B should 
>> imply that hash(A) == hash(B).  (The reverse is *not* true, however -- 
>> hash(A) == hash(B) does not necessarily say anything about whether A == B.)
>>
>> If that is a correct condition for a well-behaved hash function, then it 
>> is indeed impossible to have a well-behaved hash function that can be 
>> useful in the face of object mutation.  For something like a list, one 
>> can only define a poorly-behaved hash-like function.
>
>You are looking at it in a too limited way. You are not forced to
>have an == operator that behaves in standard ways. For instance
>you can have objects that are only equal if they are the same
>and thus can be unequal even if all components are equal.
>
>Even if 
>
>> To take another approach -- given some function that allows lists to 
>> (pretend to be) hashable:
>>
>> .>>> key = [1,2]
>> .>>> d[key] = 'foo'
>> .>>> d[[1,2]]
>> ????
>> .>>> key.append(3)
>> .>>> d[key]
>> ???
>> .>>>
>>
>> As I understand it, it is impossible for there to be a hash function for 
>> which both of these cases can return the same object.
>
>How about:
>
>  hash = id
Literally that would just create a local binding for hash, so I assume you
mean that to indicate substituting the id function wherever hash was done.

You might want to play with lambda x:hash(repr(x)) or
maybe better lambda x:hash(pickle.dumps(x)) instead of id(x)
>
>
>You also make the fault that because people ask for the possibility of
>keys being mutable objects, that they want to mutate those object while 
>being a key.
>
>> For most uses of 
>> dicts, it is necessary that the first case be valid -- a dict isn't 
>> going to be very useful to me if I have to pass around all of the 
>> objects used as keys in order to access it. Equality-based hashes are 
>> necessary, so the second case must then be disallowed.
>
>But you should be carefull not to limit people to what you think is
>usefull.
>
>> But isn't it a bit nonsensical that, without ever rebinding key, one can 
>> do something like
>>
>> .>>> d[key] = 'foo'
>> .>>> frombulate(key)
>> .>>> d[key]
>> Traceback (most recent call last):
>>   File "<interactive input>", line 1, in ?
>> KeyError: key
>> .>>>
>
>It is just as nonsensical that, without ever rebinding an element, one
>can unsort a list, or violate the heap invariant of a list.
>
>> In order to maintain the logical consistency that if an object is used 
>> as a dict key, that same object should reasonably be expected to 
>> retrieve the same value, identity-based hashes are necessary.  As a 
>> result, the first option must be disallowed.
>
>Then why don't we disallow lists of mutable objects to be sorted or
>to be made into a heap. In fact each time we have a structure that
>relies on some invariant among its members we should disallow mutable
>members because with mutable members the invariant can be violated
>without ever rebinding one of the elements.
>
>> In either case, if it's ever possible for equality to change while 
>> identity doesn't, then somewhere along the lines one of the core 
>> requirements, the contract if you will, of a dictionary *must* be 
>> broken.  And while it may be possible to get away with breaking that 
>> contract some of the time (by postulating, for example, that if one 
>> mutates a dict key then things will break), this will result in more 
>> bugs and more confusion over time.  There is no way for Python to be 
>> able to behave consistently in the face of mutable dict keys, therefore 
>> ("In the face of ambiguity, refuse the temptation to guess.") Python 
>> declines the temptation of making it trivial to use mutable objects as 
>> dict keys.
>
>As it turns out, python makes no difference in difficulty for making
>either mutable or immutable objects usable as dictionary keys. The
>only difference is that python only made its standard immutable
>types hashable and not its standard mutable objects.
>
In one sense a mutable could looked upon as immutable until it is mutated,
so why not allow its use in the same way as immutables?
Is that basically the point of view you are trying to explain?

Ok, suppose you did allow that. What kind of error symptoms would you like
to have after a dict key is mutated and an attempt is made to look up the value
using a mutable equal to the original key value? Presumably a KeyError exception?
Or did you want to use a "frozen" copy of the original key, and not get a KeyError?
Or should a new value equal to the mutated current value of the original key succeed?

Assuming the latter, what does this imply implementation-wise? For one thing, it would mean
that every time you used a reference to a mutable as a key, the hash calculation
would have to be repeated, unless you provided a way to avoid that by caching the
hash value and noticing when a mutation invalidates the hash. Since e.g. a tree of nested
lists may have elements and sub-trees that are shared by reference from arbitrarily many
other composite objects (lists, tuples, function default args, class variables,
etc etc., ISTM it becomes difficult to make a worthwhile hash-invalidation mechanism,
that ripples along to every affected node of the graph that might be being used as a key
in some dict. So you would wind up having to recalculate the hash of your mutable composite
every time it's used as a key.

So how to avoid that? Well, if you provide a standard immutable composite object,
you can say the hash value by definition can't change after it's computed, so you
could e.g., provide a hash-value slot in the object's representation, where the
hash value would be cached for re-use (I don't know if python tuples do that, but
it would seem like a useful optimization). After hashing, key objects must still be compared
for equality to the key objects referred to in the dict's hash table, but at least
hashing does not have to be repeated. And if hash values don't change, references from
dict hash tables to objects with constant hash values don't need to be changed.

So why not just notice references to mutables when used as keys and automatically rehash them (or
use a non-hashed dict implementation for them)? That's just a performance hit, and the user could
choose to accept that. Ok, but what of the next step, i.e., the comparison for equality?
How do you find the value to compare to? That's what the hash is normally for -- quickly to locate
a reference to the hashed object in the hash table. But if the reference is to a mutable whose hash has
changed, we are not going to find it when we hash a new key and go looking for a matching hash.
There will be no new hash for a mutated object's new value unless we keep the hash table up to date somehow.
And what happens if one mutable key is mutated so as to equal an existing mutable key?
Should this have the same semantics as d[mutatedvalue] = d[pre_mutation_value]?

If you take the frozen-key approach, we need to compare a new key value with the deep copy of the original,
which the old hash correctly refers to, or we need to update the hash table when an entry refers to a mutated
object whose hash no longer matches (i.e., you could substitute repeated computation of all mutables' hashes
for deep copying of original mutable keys, or figure a way to propagate mutation notifications again -- or
partition your dict into a normal hashed part and a brute force association list for the mutables and their
corresponding values).

I suspect that all that work to supply functionality involving choices that would cause endless discussion
about what the semantics should be didn't seem too worthwhile in comparison with other uses of guru developer
time, and so you are given the ability to customize __hash__ and __cmp__ and to subclass dict, and you are
politely encouraged to have a go at implementing what you are imagining for yourself ;-)

Hm, just for the heck of it, does this do what you want? (all features not tested, none tested beyond what you see,
and special methods like update and iterators will just work on the plain dict part if they work at all, unless
you add the code ;-)

(note what happens when a mutable key mutates to equal another existing key below):

 >>> class MKD(dict):
 ...     """mutable-key dict"""
 ...     def __init__(self, *arg, **kw):
 ...         self.mutkeys = []
 ...         self.mutvals = []
 ...         if not arg and not kw : return
 ...         if len(arg)==1: arg = arg[0]
 ...         if isinstance(arg, type(self)):
 ...             self.mutkeys = arg.mutkeys[:]
 ...             self.mutvals = arg.mutvals[:]
 ...             dict.__init__(self, arg)
 ...             return
 ...         elif isinstance(arg, dict):
 ...             arg = arg.items()
 ...         for k,v in arg:
 ...             self[k] = v
 ...         for k,v in kw.items():
 ...             self[k] = v
 ...     def __setitem__(self, key, value):
 ...         try: dict.__setitem__(self, key, value)
 ...         except TypeError:
 ...             try:
 ...                 i = self.mutkeys.index(key)
 ...                 self.mutvals[i] = value
 ...             except ValueError:
 ...                 self.mutkeys.append(key)
 ...                 self.mutvals.append(value)
 ...     def __getitem__(self, key):
 ...         try: return dict.__getitem__(self, key)
 ...         except TypeError:
 ...             try:
 ...                 i = self.mutkeys.index(key)
 ...                 return self.mutvals[i]
 ...             except ValueError:
 ...                 raise KeyError, '%r' %key
 ...     def __delitem__(self, key):
 ...         try: dict.__delitem__(self, key)
 ...         except (KeyError, TypeError):
 ...             try:
 ...                 i = self.mutkeys.index(key)
 ...                 del self.mutkeys[i]
 ...                 del self.mutvals[i]
 ...             except ValueError:
 ...                 raise KeyError, '%r' %key
 ...                 self.mutvals.append(value)
 ...     def __repr__(self):
 ...         return '<MKD %s + {%s}>' % (dict.__repr__(self),
 ...             ', '.join(['%r:%r'%t for t in zip(self.mutkeys, self.mutvals)]))
 ...
 >>> md = MKD(a=1,b=2)
 >>> md
 <MKD {'a': 1, 'b': 2} + {}>
 >>> k12 = [1,2]
 >>> k13 = [1,3]
 >>> md[k12]=12
 >>> md
 <MKD {'a': 1, 'b': 2} + {[1, 2]:12}>
 >>> md[k13]=13
 >>> md
 <MKD {'a': 1, 'b': 2} + {[1, 2]:12, [1, 3]:13}>
 >>> k12
 [1, 2]
 >>> k12[1]+=1
 >>> k12
 [1, 3]
 >>> md[k12]
 12
 >>> md
 <MKD {'a': 1, 'b': 2} + {[1, 3]:12, [1, 3]:13}>
                          ^k12^^     ^k13^^
 >>> del md[k12]
 >>> md
 <MKD {'a': 1, 'b': 2} + {[1, 3]:13}>
                          ^k13^^
 >>> md[k12]
 13
 >>> del md['a']
 >>> md
 <MKD {'b': 2} + {[1, 3]:13}>
 >>> k12
 [1, 3]
 >>> k12[1]-=1
 >>> k12
 [1, 2]
 >>> md[k12]
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
   File "<stdin>", line 35, in __getitem__
 KeyError: '[1, 2]'

But k13 is bound to the dicts mutable key, so
 >>> k13
 [1, 3]
 >>> k13[1]-=1

it shows up in the repr of the dict:
 >>> md
 <MKD {'b': 2} + {[1, 2]:13}>

and k12's value now matches, so it can retrieve the value
 >>> md[k12]
 13
 >>>

Any new thoughts? ;-)

Regards,
Bengt Richter