Mutable objects which define __hash__ (was Re: Why are tuples immutable?)

Nick Coghlan ncoghlan at iinet.net.au
Sat Dec 25 21:17:42 EST 2004


Bengt Richter wrote:
> I take it you are referring to regular python dictionaries,

Correct. So I'll skip over the sections talking about alternate lookup 
strategies in my reply.

>>Anyway, what are the consequences of a type which has a 'variable hash'?
>>
>>The only real bad consequence is that the internal state of a dictionary can 
>>break if someone alters the hash of one of its keys. The restriction to 
>>'constant hashes' is aimed squarely at eliminating this class of programming 
>>errors. It doesn't actually provide any performance gain.
> 
> Why not? It means you only have to compute the hash for any given object once,
> and you can cache the hash value. That could be a biggie if you had megabyte trees
> of nested tuples as keys and reused them frequently for lookup.

Allowing mutation doesn't mean abandoning your caching - it just means you need 
a way to tell the dict to update it's key cache (i.e. rehash())

> I think, for one thing, it would be more efficient to notice that some keys were guaranteed not
> to need rehashing... um, maybe by paritioning the key set into immutables and mutables ;-)

True, since otherwise rehash() would have to check every key, even the immutable 
ones. However, that only affects the time required for a rehash(), rather than 
general dictionary operation.

> For another, it's not just a matter of "moving" keys that hash "incorrectly". If keys that
> are a part of the state of the dict can mutate outside of dict methods, then keys can mutate
> to equal each other, and "rehashing" will give equal hashes for previously unequal and distinct keys,
> and you must choose which of the previous values is to be the value for the two keys that have now
> become one by equality (however __eq__, __cmp__, __coerce__ and inheritance may define equality).

Or resist the temptation to guess, and have rehash() raise an exception :)

> OTOH, if KeyError is frequent because of mutation, hashing may be next to useless overhead. IOW, DYFR ;-)

Indeed. I defined my FR based on Antoon's analogy of the sorted list - have the 
dict behave like sorted list, so that if you screw the invariant, it's the 
programmer's job to tell the dictionary to fix it.

>>>The problem is also that you really can't make new immutable objects
> 
> I don't understand that one. What do you mean by "new immutable object"?

Bad terminology on Antoon's part, I think - I believe he meant "new immutable type".

I gave the Python 2.4's Decimal as an example of something which is 
theoretically immutable, but doesn't actually enforce that immutability properly.

Overriding __setattr__ can give true immutability. It just makes the class a 
serious pain to write :)

> That is so different that I don't know how it could be used to debug an app that
> uses a mutable key dict. I think Antoon was just asking for a key-mutation detector.
> I think that's reasonable goal. A deep copy of all keys defined would permit that,
> at least when dict methods had control. But you couldn't find the code that mutated
> the originals without something outside the dict to catch it in the act.

The identity keyed dict was based on Antoon's posted use case: he wanted to 
classify mutable objects, and then check for membership in those groups.

You can use lists for this, but the lookup is slow. Conventional dictionaries 
and sets give a fast lookup, but require that the items in use be hashable.

An identity dictionary (or set) solves the 'object classification' problem 
perfectly.

> You just need to DYFR ;-)

I really do like this acronym :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at email.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://boredomandlaziness.skystorm.net



More information about the Python-list mailing list