[issue32513] dataclasses: make it easier to use user-supplied special methods

Eric V. Smith report at bugs.python.org
Fri Jan 26 20:52:18 EST 2018


Eric V. Smith <eric at trueblade.com> added the comment:

I apologize for the length of this, but I want to be as precise as
possible.  I've no doubt made some mistakes, so corrections and
discussion are welcomed.

I'm adding the commented text at the end of this message to
dataclasses.py. I'm repeating it here for discussion.  These tables
are slightly different from previous versions on this issue, and I've
added line numbers to the __hash__ table to make it easier to discuss.
So any further comments and discussion should reference the tables in
this message.

I think the __init__, __repr__, __set/delattr__, __eq__, and ordering
tables are not controversial.

** hash

For __hash__, the 99.9% use case is that the default value of
hash=None is sufficient.  This is rows 1-4 of that table.  It's
unfortunate that I have to spend so much of the following text
describing cases that I think will rarely be used, but I think it's
important that in these special cases we don't do anything surprising.

**** hash=None

First, let's discuss hash=None, which is the default.  This is lines
1-4 of the __hash__ table.

Here's an example of line 1:

@dataclass(hash=None, eq=False, frozen=False)
class A:
    i: int

The user doesn't want an __eq__, and the class is not frozen.  Whether
or not the user supplied a __hash__, no hash will be added by
@dataclass.  In the absence of __hash__, the base class (in this case,
object) __hash__ is used.  object.__hash__ and object.__eq__ are based on
object identity.
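
A minimal sketch of line 1, runnable against the standard dataclasses
module.  It omits hash= and so leaves it at its default, which means the
sketch does not depend on how that parameter is spelled in your Python;
the names a, b and the three result variables are illustrative only.

```python
from dataclasses import dataclass


@dataclass(eq=False, frozen=False)
class A:
    i: int


a, b = A(1), A(1)

# No __hash__ entry was added to the class's own __dict__ ...
has_own_hash = "__hash__" in A.__dict__
# ... so attribute lookup falls through to the base class, object.
uses_object_hash = A.__hash__ is object.__hash__
# With no __eq__ added either, equality is identity-based.
equal_by_value = (a == b)
```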

Here's an example of line 2:

@dataclass(hash=None, eq=False, frozen=True)
class A:
    i: int

This is a frozen class, where the user doesn't want an __eq__.  The
same logic is used as for line 1: no __hash__ is added.

Here's an example of line 3, "no" column (no __hash__).  Note that
this line shows the default values for hash=, eq=, frozen=:

@dataclass(hash=None, eq=True, frozen=False)
class A:
    i: int

In this case, if the user doesn't provide __hash__ (the "no" column),
we'll set __hash__=None.  That's because it's a non-frozen class with
an __eq__, so it should not be hashable.
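
As a rough illustration of line 3, "no" column, the sketch below uses
the decorator with all defaults (hash= is omitted rather than spelled
out).  The variable names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class A:
    i: int


# The decorator stored None under __hash__ in the class's own __dict__.
hash_attr = A.__dict__["__hash__"]

# A class whose __hash__ is None produces unhashable instances.
try:
    hash(A(1))
    hashable = True
except TypeError:
    hashable = False
```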

Here's an example of the line 3, "yes" column (__hash__ defined):

@dataclass(hash=None, eq=True, frozen=False)
class A:
    i: int
    def __hash__(self): pass

Since a __hash__ exists, it will not be overwritten.  Note that this
is also true if __hash__ were set to None.  We're just checking that
__hash__ exists, not its value.
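
A small sketch of line 3, "yes" column.  It again omits hash= to leave
the default in place, and the constant 42 is just a recognizable value
chosen for the illustration.

```python
from dataclasses import dataclass


def user_hash(self):
    # Stand-in for a user-written hash; the constant is arbitrary.
    return 42


@dataclass
class A:
    i: int
    __hash__ = user_hash


# The user's function is still the one stored on the class.
kept_user_hash = A.__dict__["__hash__"] is user_hash
hash_value = hash(A(1))
```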

Here's an example of line 4, "no" column:

@dataclass(hash=None, eq=True, frozen=True)
class A:
    i: int

In this case the action is "add", and a __hash__ method will be
created.
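
A minimal sketch of line 4, "no" column, with hash= omitted so the
default applies.  The variable names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(eq=True, frozen=True)
class A:
    i: int


# A __hash__ method now lives in the class's own __dict__.
has_own_hash = callable(A.__dict__.get("__hash__"))

# Equal instances hash equally, so they collapse in a set.
distinct = len({A(1), A(1), A(2)})
```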

And finally, consider line 4, "yes" column.  This case is hash=None,
eq=True, frozen=True.  Here we also apply the auto-hash test (defined
under hash=True below), just like we do in the hash=True case (line 12,
"yes" column).  We want to make the frozen instances hashable, so we
add our __hash__ method, unless the user has directly implemented
__hash__.

**** hash=False

Next, consider hash=False (rows 5-8).  In this case, @dataclass will
never add a __hash__ method, and if one exists it is not modified.

**** hash=True

Lastly, consider hash=True.  This is rows 9-12 of the __hash__ table.
The user is saying they always want a __hash__ method on the class.
If __hash__ does not already exist (the "no" column), then @dataclass
will add a __hash__ method.  If __hash__ does already exist in the
class's __dict__, @dataclass will overwrite it if and only if the
current value of __hash__ is None, and if the class has an __eq__ in
the class definition.  Let's call this condition the "auto-hash test".
The assumption is that the only reason a class has a __hash__ of None
and an __eq__ method is that the __hash__ was automatically added by
Python's class creation machinery.  And if the hash was auto-added,
then by the user specifying hash=True, they're saying they want the
__hash__ = None overridden by a generated __hash__ method.

I'll give examples from line 9 of the __hash__ table, but this
behavior is the same for rows 9-12.  Consider:

@dataclass(hash=True, eq=False, frozen=False)
class A:
    i: int
    __hash__ = None

This is line 9, "yes" column, action="add*", which says to add a
__hash__ method if the class passes the auto-hash test.  In this case
the class fails the test because it doesn't have an __eq__ in the
class definition.  __hash__ will not be overwritten.

Now consider:

@dataclass(hash=True, eq=False, frozen=False)
class A:
    i: int
    def __eq__(self, other): ...

This again is line 9, "yes" column, action="add*".  In this case, the
user is saying to add a __hash__, but it already exists because Python
automatically added __hash__=None when it created the class.  So, the
class passes the auto-hash test.  So even though there is a __hash__,
we'll overwrite it with a generated method.

Now consider:

@dataclass(hash=True, eq=False, frozen=False)
class A:
    i: int
    def __eq__(self, other): ...
    def __hash__(self): ...

Again, this is line 9, "yes" column, action="add*". The existing
__hash__ is not None, so we don't overwrite the user's __hash__
method.
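
The auto-hash test and the three line 9 examples above can be sketched
as follows.  passes_auto_hash_test is a hypothetical helper written for
this message, not a function in dataclasses.py, and it is applied to
plain classes (before any decorator runs) so that it only sees what the
class definition itself put in __dict__.

```python
def passes_auto_hash_test(cls):
    """Hypothetical helper: True if the class's own __dict__ has
    __hash__ set to None and also contains an __eq__ entry."""
    d = cls.__dict__
    return "__hash__" in d and d["__hash__"] is None and "__eq__" in d


class OnlyHashNone:
    i: int
    __hash__ = None            # no __eq__ in the class definition


class OnlyEq:
    i: int

    def __eq__(self, other):   # Python adds __hash__ = None for us
        return NotImplemented


class EqAndHash:
    i: int

    def __eq__(self, other):
        return NotImplemented

    def __hash__(self):        # user-written, so __hash__ is not None
        return 0


results = [passes_auto_hash_test(c) for c in (OnlyHashNone, OnlyEq, EqAndHash)]
```

Only the second class passes, which matches the one example above where
the existing __hash__ is overwritten.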

Note that a class can pass the auto-hash test but not have an
auto-generated __hash__=None.  There's no way for @dataclass to
actually know that __hash__=None was auto-generated; it just assumes
that that's the case.  For example, this class passes the auto-hash
test and __hash__ will be overwritten:

@dataclass(hash=True, eq=False, frozen=False)
class A:
    i: int
    def __eq__(self, other): ...
    __hash__=None
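
To see why the two situations cannot be told apart, here is a short
sketch using plain classes (no decorator involved).  The class names
Implicit and Explicit are made up for the illustration.

```python
class Implicit:
    i: int

    def __eq__(self, other):
        return NotImplemented
    # No __hash__ written here; class creation adds __hash__ = None.


class Explicit:
    i: int

    def __eq__(self, other):
        return NotImplemented

    __hash__ = None            # written by the user


implicit_entry = Implicit.__dict__["__hash__"]
explicit_entry = Explicit.__dict__["__hash__"]

# Both classes end up with the same __dict__ entries for the two names,
# so inspecting __dict__ cannot reveal who set __hash__ to None.
same_shape = (implicit_entry is explicit_entry is None
              and "__eq__" in Implicit.__dict__
              and "__eq__" in Explicit.__dict__)
```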

A special case to consider is lines 11 and 12 from the table.  Here's an
example of line 11, but line 12 is the same for the purposes of this
discussion:

@dataclass(hash=True, eq=True, frozen=False)
class A:
    i: int
    __hash__=None

The class will have a generated __eq__, because eq=True.  However, the
class still fails the auto-hash test, because the class's __dict__ did
not have an __eq__ that was added by the class definition.  Instead,
it was added by @dataclass.  So this class fails the auto-hash test
and the __hash__ value will not be overwritten.


Tables follow:

# Conditions for adding methods.  The boxes indicate what action the
#  dataclass decorator takes.  For all of these tables, when I talk
#  about init=, repr=, eq=, order=, hash=, or frozen=, I'm referring
#  to the arguments to the @dataclass decorator.  When checking if a
#  dunder method already exists, I mean check for an entry in the
#  class's __dict__.  I never check to see if an attribute is defined
#  in a base class.

# Key:
# +=========+=========================================+
# | Value   | Meaning                                 |
# +=========+=========================================+
# | <blank> | No action: no method is added.          |
# +---------+-----------------------------------------+
# | add     | Generated method is added.              |
# +---------+-----------------------------------------+
# | add*    | Generated method is added only if the   |
# |         |  existing attribute is None and if the  |
# |         |  user supplied a __eq__ method in the   |
# |         |  class definition.                      |
# +---------+-----------------------------------------+
# | raise   | TypeError is raised.                    |
# +---------+-----------------------------------------+
# | None    | Attribute is set to None.               |
# +=========+=========================================+

# __init__
#
#   +--- init= parameter
#   |
#   v     |       |       |
#         |  no   |  yes  |  <--- class has __init__ in __dict__?
# +=======+=======+=======+
# | False |       |       |
# +-------+-------+-------+
# | True  | add   |       |  <- the default
# +=======+=======+=======+

# __repr__
#
#    +--- repr= parameter
#    |
#    v    |       |       |
#         |  no   |  yes  |  <--- class has __repr__ in __dict__?
# +=======+=======+=======+
# | False |       |       |
# +-------+-------+-------+
# | True  | add   |       |  <- the default
# +=======+=======+=======+


# __setattr__
# __delattr__
#
#    +--- frozen= parameter
#    |
#    v    |       |       |
#         |  no   |  yes  |  <--- class has __setattr__ or __delattr__ in __dict__?
# +=======+=======+=======+
# | False |       |       |  <- the default
# +-------+-------+-------+
# | True  | add   | raise |
# +=======+=======+=======+
# Raise because not adding these methods would break the "frozen-ness"
#  of the class.

# __eq__
#
#    +--- eq= parameter
#    |
#    v    |       |       |
#         |  no   |  yes  |  <--- class has __eq__ in __dict__?
# +=======+=======+=======+
# | False |       |       |
# +-------+-------+-------+
# | True  | add   |       |  <- the default
# +=======+=======+=======+
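
As a rough check of the "yes" columns in the __init__, __repr__ and
__eq__ tables above, the sketch below defines all three methods in the
class body and applies the decorator with its defaults.  The class name
P and the specific return values are arbitrary choices for the
illustration.

```python
from dataclasses import dataclass


@dataclass
class P:
    x: int

    def __init__(self):
        # User-supplied: takes no arguments, unlike a generated __init__.
        self.x = 99

    def __repr__(self):
        return "custom repr"

    def __eq__(self, other):
        # User-supplied: deliberately never equal, even to itself.
        return False


p = P()
```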

# __lt__
# __le__
# __gt__
# __ge__
#
#    +--- order= parameter
#    |
#    v    |       |       |
#         |  no   |  yes  |  <--- class has any comparison method in __dict__?
# +=======+=======+=======+
# | False |       |       |  <- the default
# +-------+-------+-------+
# | True  | add   | raise |
# +=======+=======+=======+
# Raise because to allow this case would interfere with using
#  functools.total_ordering.
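
The two "raise" boxes above (frozen= with an existing __setattr__, and
order= with an existing comparison method) can be exercised with a
sketch like this.  try_decorate is a hypothetical helper written for
this message; it only reports whether applying the decorator raised
TypeError.

```python
from dataclasses import dataclass


def try_decorate(cls, **kwargs):
    """Hypothetical helper: apply @dataclass(**kwargs) to cls and
    report 'raise' if TypeError was raised, else 'ok'."""
    try:
        dataclass(**kwargs)(cls)
    except TypeError:
        return "raise"
    return "ok"


class HasSetattr:
    i: int

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)


class HasLt:
    i: int

    def __lt__(self, other):
        return NotImplemented


class Plain:
    i: int


frozen_result = try_decorate(HasSetattr, frozen=True)
order_result = try_decorate(HasLt, order=True)
plain_result = try_decorate(Plain, frozen=True, order=True)
```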

# __hash__

#      +------------------- hash= parameter
#      |       +----------- eq= parameter
#      |       |       +--- frozen= parameter
#      |       |       |
#      v       v       v    |        |        |
#                           |   no   |  yes   |  <--- class has __hash__ in __dict__?
# +=========+=======+=======+========+========+
# | 1 None  | False | False |        |        | No __eq__, use the base class __hash__
# +---------+-------+-------+--------+--------+
# | 2 None  | False | True  |        |        | No __eq__, use the base class __hash__
# +---------+-------+-------+--------+--------+
# | 3 None  | True  | False | None   |        | <-- the default, not hashable
# +---------+-------+-------+--------+--------+
# | 4 None  | True  | True  | add    | add*   | Frozen, so hashable
# +---------+-------+-------+--------+--------+
# | 5 False | False | False |        |        |
# +---------+-------+-------+--------+--------+
# | 6 False | False | True  |        |        |
# +---------+-------+-------+--------+--------+
# | 7 False | True  | False |        |        |
# +---------+-------+-------+--------+--------+
# | 8 False | True  | True  |        |        |
# +---------+-------+-------+--------+--------+
# | 9 True  | False | False | add    | add*   | Has no __eq__, but hashable
# +---------+-------+-------+--------+--------+
# |10 True  | False | True  | add    | add*   | Has no __eq__, but hashable
# +---------+-------+-------+--------+--------+
# |11 True  | True  | False | add    | add*   | Not frozen, but hashable
# +---------+-------+-------+--------+--------+
# |12 True  | True  | True  | add    | add*   | Frozen, so hashable
# +=========+=======+=======+========+========+
# For boxes that are blank, __hash__ is untouched and therefore
#  inherited from the base class.  If the base is object, then
#  id-based hashing is used.
# Note that a class may already have __hash__=None if it specified an
#  __eq__ method in the class body (not one that was created by
#  @dataclass).
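
For discussion purposes, the __hash__ table can also be written as a
small decision function.  This is only a model of the table as given in
this message, not the code in dataclasses.py; hash_action and its
string return values are names invented here, with "" standing for a
blank box.

```python
def hash_action(hash_param, eq, frozen, has_hash):
    """Model of the __hash__ table.

    hash_param, eq, frozen: the decorator arguments.
    has_hash: whether the class has __hash__ in its __dict__.
    Returns "", "add", "add*" or "None".
    """
    if hash_param is False:          # rows 5-8: never touch __hash__
        return ""
    if hash_param is True:           # rows 9-12
        return "add*" if has_hash else "add"
    # hash_param is None (the default), rows 1-4
    if not eq:                       # rows 1-2
        return ""
    if frozen:                       # row 4
        return "add*" if has_hash else "add"
    return "" if has_hash else "None"   # row 3


# Row 3 is the all-defaults case: hash=None, eq=True, frozen=False.
default_no = hash_action(None, True, False, False)
default_yes = hash_action(None, True, False, True)
```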

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue32513>
_______________________________________

