[Python-Dev] Dataclasses and correct hashability
Eric V. Smith
eric at trueblade.com
Fri Feb 2 10:08:43 EST 2018
On 2/2/2018 12:33 AM, Nick Coghlan wrote:
> For 3.7, I think we should seriously consider just straight up
> disallowing the "hash=True, frozen=False" combination, and instead
> require folks to provide their own hash function in that case.
> "Accidentally hashable" (whether by identity or field hash) isn't a
> thing that data classes should be allowing to happen.
>
> If we did that, then the public "hash" parameter could potentially be
> dropped entirely for the time being - the replacement for "hash=True"
> would be a "def __hash__: ..." in the body of the class definition,
> and the replacement for "hash=False" would be "__hash__ = None" in the
> class body.
attrs behaves the same way dataclasses currently does (if you ignore how
dataclasses handles the cases where __hash__ or __eq__ already exist in
the class definition).
Here's what attrs says about adding __hash__ via hash=True:
"Although not recommended, you can decide for yourself and force attrs
to create one (e.g. if the class is immutable even though you didn’t
freeze it programmatically) by passing True or not. Both of these cases
are rather special and should be used carefully."
The problem with dropping hash=True is: how would you write __hash__
yourself? It seems like a bug magnet if you're adding fields to the
class and forget to update __hash__, especially in the presence of
per-field hash=False and eq=False settings. And you'd need to make sure
it matches the generated __eq__ (if 2 objects are equal, they need to
have the same hash value).
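
A minimal sketch of what writing __hash__ by hand involves. It uses the
spelling of the dataclasses module as released, where the per-field flag
is compare rather than eq. The Point class and the hash_agrees_with_eq
helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Point:
    x: int
    y: int
    # Excluded from the generated __eq__, so it must stay out of __hash__ too.
    label: str = field(default="", compare=False)

    # Hand-written hash: has to be kept in sync with the fields above by hand.
    def __hash__(self):
        return hash((self.x, self.y))


def hash_agrees_with_eq(a, b):
    """Return False only if a == b while hash(a) != hash(b)."""
    return a != b or hash(a) == hash(b)


p = Point(1, 2, label="first")
q = Point(1, 2, label="second")
print(p == q)                     # True: label is not compared
print(hash_agrees_with_eq(p, q))  # True: label is not hashed either
```

If label were later added to the tuple in __hash__ while it stayed
compare=False, p and q would still compare equal but would generally hash
differently, and nothing in the class would flag the mismatch.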
If we're going to start disallowing things, how about the per-field
hash=True, eq=False case?
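
To make that combination concrete, here is a small sketch. It assumes the
spelling of the dataclasses module as released (class-level unsafe_hash,
per-field compare) instead of the hash/eq names used in this thread.
Tagged is a hypothetical class.

```python
from dataclasses import dataclass, field


# unsafe_hash=True asks for a generated __hash__ on a mutable class.
@dataclass(unsafe_hash=True)
class Tagged:
    key: int
    # Hashed but not compared: the combination questioned above.
    tag: int = field(default=0, compare=False, hash=True)


a = Tagged(1, tag=10)
b = Tagged(1, tag=20)

print(a == b)              # True: tag is ignored by __eq__
print(hash(a) == hash(b))  # False in practice: tag feeds the hash
print(len({a, b}))         # so a set can end up holding both "equal" objects
```

Equal objects with different hashes break the contract that dicts and
sets rely on, which is why this per-field setting looks like a candidate
for being disallowed.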
However, I don't feel very strongly about this. As I've said, I expect
the use cases for hash=True to be very, very rare. And now that we allow
overriding __hash__ in the class body without setting hash=False, there
aren't a lot of uses for hash=False, either. But we would need to think
through how you'd get the behavior of hash=False with multiple
inheritance, if that's what you wanted. Again, a very, very rare case.
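
A sketch of both points, written against the dataclasses module as
released: an explicit __hash__ in the class body is left alone by the
decorator, and "__hash__ = None" in the class body gives roughly the
effect of hash=False. Key and NotAKey are hypothetical classes, and only
single inheritance is shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    name: str
    version: int

    # Explicit override: hash on name only; the decorator leaves it alone.
    def __hash__(self):
        return hash(self.name)


@dataclass(frozen=True)
class NotAKey(Key):
    # Opt out of hashing, including the __hash__ inherited from Key.
    __hash__ = None


k = Key("spam", 1)
print(hash(k) == hash("spam"))  # True: the hand-written __hash__ is used

try:
    hash(NotAKey("spam", 1))
except TypeError as exc:
    print("unhashable:", exc)
```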
In all, I think we're better off documenting best practices and making
them the default, like attrs does, and leaving it to the programmer to
follow them. I realize we're handing out footguns, but the alternatives
seem even more complex and more limiting.
Eric