[Python-Dev] Dataclasses and correct hashability
Steven D'Aprano
steve at pearwood.info
Tue Feb 6 12:26:29 EST 2018
On Mon, Feb 05, 2018 at 10:50:21AM -0800, David Mertz wrote:
> Absolutely I agree. 'unsafe_hash' as a name is clear warning to users.
(I don't mean to pick on David specifically, I had to reply to some
message in this thread and I just picked his.)
I'm rather gobsmacked at the attitudes of many people here about hashing
data classes. I thought *I* was the cynical pessimist who didn't have a
high opinion of the quality of the average programmer, but according to
this thread apparently I'm positively Pollyanna-esque for believing that
most people will realise that if an API offers separate switches for
hashable and frozen, you need to set *both* if you want both.
Greg Smith even says that writing dunders apart from __init__ is a code
smell, and warns people not to write dunders. Seriously? I get that
__hash__ is hard to write correctly, which is why we have a hash=True to
do the hard work for us, but I can't help feeling that at the point
we're saying "don't write dunders, any dunder, you'll only do it wrong"
we have crossed over to the wrong side of the pessimist/optimist line.
But here we are: talking about naming a perfectly reasonable argument
"unsafe_hash". Why are we trying to frighten people?
There is nothing unsafe about a DataClass with hash=True, frozen=True,
but this scheme means that even people who know what they're doing will
write unsafe_hash=True, frozen=True, as if hashability was some sort of
hand grenade waiting to go off.
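
A minimal sketch of the frozen-and-hashable case, assuming the dataclasses API as it later shipped in Python 3.7, where the flag is spelled unsafe_hash (the hash=True spelling used in this message is not accepted by the released module). The class name Point is made up for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError

# Both switches set: instances are hashable AND immutable.
# (Assumption: the released spelling `unsafe_hash`, not `hash`.)
@dataclass(frozen=True, unsafe_hash=True)
class Point:  # illustrative class, not from the thread
    x: int
    y: int

p = Point(1, 2)
seen = {p}

# An equal instance hashes the same, so set lookup works.
equal_found = Point(1, 2) in seen

# Attempts to mutate a frozen instance are rejected, so the hash
# can never drift away from the value stored in the set.
try:
    p.x = 10
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False
```

Because the fields cannot be reassigned, the hash stays consistent with equality for the lifetime of the object.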
Perhaps we ought to deprecate __hash__ and start calling it
__danger_danger_hash__ too? No, I don't think so.
In the past, we've (rightly!) rejected proposals to call things like
eval "unsafe_eval", and that really is dangerously unsafe when used
naively with untrusted, unsanitised data. Hashing mutable objects by
accident might be annoyingly difficult and frustrating to debug, but
code injection attacks can lead to identity theft or worse, with serious
consequences for real people.
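
For concreteness, here is a rough sketch of the "annoying to debug" failure: a hashable but mutable dataclass that is modified after being stored in a set. It assumes the released Python 3.7 spelling unsafe_hash and CPython's set behaviour; MutablePoint is an illustrative name.

```python
from dataclasses import dataclass

# Hashable but NOT frozen: the generated __hash__ depends on fields
# that can still be reassigned. (Assumption: released `unsafe_hash` flag.)
@dataclass(unsafe_hash=True)
class MutablePoint:  # illustrative class, not from the thread
    x: int
    y: int

p = MutablePoint(1, 2)
hash_before = hash(p)
points = {p}              # stored in the set under hash_before

p.x = 10                  # mutating a field changes the hash
hash_after = hash(p)

# The set still holds p, but lookups no longer find it:
still_found = p in points                        # probes with the new hash
old_value_found = MutablePoint(1, 2) in points   # old hash matches, equality fails

print(hash_before == hash_after, still_found, old_value_found, len(points))
```

The object is still in the set (its length is unchanged), yet neither the object itself nor its old value can be looked up. That is the bug the thread is worried about, and it is a nuisance, not a security hole.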
I'm 100% in favour of programmer education, but I think this label is
*miseducation*. We're suggesting that hashability is unsafe, regardless
of whether the object is frozen or not.
I'd far prefer to get a runtime warning:
"Are you sure you want hash=True without frozen=True?"
(or words to that effect) rather than burden all uses of the hash
parameter, good and bad, with the unsafe label.
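
One way such a warning might look, sketched as a wrapper around the released dataclasses.dataclass decorator. The helper name checked_dataclass, the choice of RuntimeWarning and the message text are all assumptions made for illustration; nothing like this exists in the standard library.

```python
import warnings
from dataclasses import dataclass

def checked_dataclass(*, unsafe_hash=False, frozen=False, **kwargs):
    """Hypothetical wrapper: warn when a hash is requested without frozen."""
    def wrap(cls):
        if unsafe_hash and not frozen:
            warnings.warn(
                f"{cls.__name__}: are you sure you want a generated "
                "__hash__ without frozen=True?",
                RuntimeWarning,
                stacklevel=2,  # point at the class definition site
            )
        # Delegate to the real decorator with the same options.
        return dataclass(unsafe_hash=unsafe_hash, frozen=frozen, **kwargs)(cls)
    return wrap

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    @checked_dataclass(unsafe_hash=True)               # warns
    class Mutable:
        x: int

    @checked_dataclass(unsafe_hash=True, frozen=True)  # silent
    class Frozen:
        x: int
```

Only the risky combination triggers the warning; the frozen class is created without complaint, so correct uses carry no scary label.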
--
Steve