Why is dictionary.keys() a list and not a set?

Mike Meyer mwm at mired.org
Sat Nov 26 07:34:13 EST 2005


Christoph Zwerschke <cito at online.de> writes:
> Mike Meyer wrote:
>  > I think you're trying to tweak the wrong definition. Types are
>  > immutable if their instances are immutable.
> I'm not trying to tweak it, but to gain more clarity about what is
> actually meant when you talk about "mutable" types and objects and
> whether it suffices to use these terms as long as you speak about
> built-in datatypes.

I think we both agree that mutability isn't adequate, because it's
really irrelevant to the question. What it seems to provide is a
convenient and usually well-understood way of describing which builtin
types can be used: "Mutable builtin types cannot be used as dictionary
keys."  "Immutable builtin types can be used as dictionary keys,
except for certain tuples." That exception creates headaches. Then you
have to ponder things like "Is a file instance immutable?" because
they are hashable. Maybe we should just skip the idea completely.

>  > Do you think it's really necessary to specify "has a __hash__ method
>  > that returns a value", as opposed to just "has a __hash__ method"?
> I think it is necessary to require that it has a "proper" __hash__
> function. As far as I understand you can always add a __hash__
> method. Not only invalid __hash__ methods like in this case:
> class mylist0(list):
>      def __hash__(self): return None
> But you can also easily add valid __hash__ methods:
>
> class mylist1(list):
>      def __hash__(self): return 0815
>
> class mylist2(list):
>      def __hash__(self): return id(self)
>
> In the case of mylist1, everything is ok including semantics, but
> performance suffers dramatically. In mylist2, performance is great,
> but semantics suffers greatly.
> Which of these user-defined types would you call "hashable"?

The latter two, as the given __hash__ meets the specifications for
__hash__.  __hash__ itself isn't enough to decide the question of
semantics. You can make mylist2 have the right semantics by adding
    def __eq__(self, other): return self is other
to it.

> Technically speaking, both probably are, so you can indeed use list
> objects of both types as keys of a dictionary - but I think there
> should be a hint in the docs that you need to have a "proper" hash
> function if you want your dictionary to be performant and show the
> usually expected behavior.

I agree - but where does that hint belong? I think it belongs in the
documentation for __hash__, not the documentation for sets and
dictionaries.

How about we document it in the spirit of Python's "try it and see"
philosophy:

Any hashable object can be used as a dictionary key/set element. An
object is hashable if calling hash on that object returns an integer. 

That's 100% correct. However, is it to little information to be
useful? If so, then going back to the mutable/immutable thing seems to
be right:

Any hashable object can be used as a dictionary key/set
element. Instances of mutable builtins are not hashable. Instances of
immutable builtins are hashable, except for tuples that contain a
non-hashable value. Instances of classes are usually hashable(*).

Footnote *: An instance of a class is hashable if it has a __hash__
method, or does not have either a __cmp__ or __eq__ method. If you're
not sure, call hash() on the instance, and if it returns an integer,
the instance is hashable.

Either way, over in the documentation for __hash__, add a note that:

For instances of a class to work properly with builtin container
types, two instances that compare as equal must have the same hash
value. For best performance, hash values should be different as often
as possible while honoring the equality requirement. (or is that
wrong?)

    <mike
-- 
Mike Meyer <mwm at mired.org>			http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.



More information about the Python-list mailing list