Mutable objects which define hash (was Re: Why are tuples immutable?)

Wed Dec 22 07:57:01 EST 2004

OK, I think I need to recap a bit before starting my reply (for my own benefit, 
even if nobody else's). (The rest of this post will also include a fair bit of 
repeating things that have already been said elsewhere in the thread)

====
The actual rule dictionaries use when deciding whether or not to allow a key is 
simply 'can I hash it?'. Lists don't provide a hash, and tuples only provide a 
hash if every item they contain provides a hash.

For the dictionary to behave sanely, two keys which compare equal must also hash 
equal. Otherwise, (key1 == key2) == (d[key1] is d[key2]) may not hold.

So the one absolute rule on hashes is that (x == y) must imply that (hash(x) == 
hash(y)). In other words, the hash operation must not involve any object 
properties that are not also involved in comparison for equality.

The rule recommended for hashes is that a mutable object only define a hash if 
both the comparison operation and the hash are based on immutable portions of 
the object (such as it's ID). (The language reference currently recommends 
something stricter - it suggests that you don't define hash at all if you define 
a custom comparison method on a mutable object. That's overly strict, though)

Now, and here's the key point: Python's builtin objects are written in 
accordance with this recommendation, which means the mutable objects like dict 
and list don't define __hash__ at all.

Antoon Pardon wrote:
> that it would be handy to use such an object as a key and he has some
> assurance that those objects are stable while in use as a key, I see
> nothing wrong with using such mutable objects as keys.

Scarily enough, you're starting to persuade me that the idea isn't *completely* 
insane. Maybe merely mostly insane ;)

The interesting thing is that allowing mutable objects has keys doesn't require 
*any* changes to dict. It only requires changes to the mutable objects (e.g. 
having list adopt tuple's hash method).

Anyway, what are the consequences of a type which has a 'variable hash'?

The only real bad consequence is that the internal state of a dictionary can 
break if someone alters the hash of one of its keys. The restriction to 
'constant hashes' is aimed squarely at eliminating this class of programming 
errors. It doesn't actually provide any performance gain.

A performance loss would only occur if the dictionary tried to be helpful in 
detecting when a key mutates - and I see no reason why it should.

>>If you are saying you are happy with "undefined behavior" as the outcome of mutating keys, ok,
>>but if your implementation lets mutated-key errors pass silently, good luck debugging ;-)
> 
> But that problem is not limited to dictionaries. If you make a heapqueue
> and mutate an element in it, such errors will pass silently too and 
> will be just as hard to debug. Or suppose you create a class that
> keeps a sorted list of a number of objects and the programmer accidently
> mutates an object in it. That will probably pass silently too and
> give you night mares debugging.

The longer I consider it, the more this seems like a valid analogy. There is 
nothing preventing dictionaries from having a rehash() method.

Consider:
# Mutate some value in mylist
mylist.sort()

# Mutate some key in mydict
mydict.rehash()

Rehash would work exactly as Antoon suggested - scan all the hash buckets, and 
any key which hashes incorrectly gets moved to the right place. Forgetting to 
rehash() would have consequences no worse than forgetting to sort().

> The problem is also that you really can't make new immutable objects
> AFAIK. I have once made a vector class that didn't have any mutating
> method and I always used it as if it as immutable but because of
> the "we are all consenting adults here" attitude of python I can't
> assure that other people of my class won't directly change a vector.
> 
> Do you find I shouldn't use these vectors as keys?

This is a fair point, actually:

Py> from decimal import Decimal
Py> x = Decimal()
Py> hash(x)
0
Py> x._int = 1,
Py> hash(x)
1

> But mutable objects is fraught with pitfall anyhow. Even mutating an
> element in a list you are iterating over can give unexpected results
> or just assigning a mutable to other names a few times and then mutating
> it through one name and finding out an other name that was supposed to
> stay stable, mutated too. Why else the warning against the use of
> a default value of [] for an argument?

Another fair point, although I'm not sure that pointing out that mutable values 
already give you plenty of ways to shoot yourself in the foot is a point in 
*favour* of your idea :)

> A last idea, but this definetly is only usefull for debugging, is
> having a dictionary that keeps a copy of the key, with the ability
> to check the keys againt the copy, this might help in debugging.

I have a different suggestion: an identity dictionary.

It ignores __hash__, __cmp__ and __eq__, and instead uses id() and is.

Or even something like:

override_dict(id, lambda x, y: x is y)(<normal dict contructor arguments go here>)

> Personnaly I find the atmosphere in general here very defensive when
> some kind of feature is questioned. There seems a definite "Love it
> or leave it" attitude to be present.

Generally depends on the feature, but yeah, I've seen that happen (and probably 
been guilty of it, too)

> I have usenet experience enough to know that newsgroups can vary wildly
> and that you should evaluate each on its own merits. I do agree c.l.py
> is very helpfull in case you need help in how to solve a particular
> problem in the language. That niceness can disappear rather swiftly
> if the pros and cons of python itself get discussed.

Usually because the ideas are ill-formed and poorly thought out. Unfortunately, 
the attitude that spawns can bleed over into ideas which aren't without merit, 
but aren't always expressed clearly.

> I'm currently not under the impression I'm able to. Sure I could
> implement the above mentioned classes, but my feeling is that
> should I present them here, nobody would be waiting for them.
> Not because they would be unusfull, but because they go against
> the current paradigm.

As I see it, you have a couple of courses available:

If you're into pain, consider further developing the 'rehash' idea, and the 
option of adding hash methods to the mutable builtins. You've persuaded me that 
the idea is not entirely unreasonable. However, I mention pain, because you'd 
still have work to do persuading enough people (including me) that the 
additional hard to detect bugs this adds to the language would be worth it.

I don't really recommend that option.

A potentially more fruitful course is the dictionary variants identity_dict or 
override_dict. The current dict() implementation forces the use of hash() for 
sorting into hash buckets and "x == y" for key matching. An identity_dict 
variant that uses id() (or object.__hash__ in Jython) and "x is y" would seem to 
quite happily meet the use cases you have posted in this thread, without 
silencing currently detected bugs.

override_dict is probably just a case of compulsive overgeneralisation that can 
be ignored for the time being :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at email.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://boredomandlaziness.skystorm.net

Mutable objects which define __hash__ (was Re: Why are tuples immutable?)

Mutable objects which define hash (was Re: Why are tuples immutable?)