[Python-ideas] add an additional dataclasses.asdict option for non-dataclasses

Sun May 12 12:49:51 EDT 2019

On Sat, May 11, 2019 at 8:23 PM Eric V. Smith <eric at trueblade.com> wrote:

>
>> https://www.python.org/dev/peps/pep-0557/#why-not-just-use-namedtuple
>>
>
> you would know, but that reference talks about why they are not the same
> as NamedTuple.
>
>
> That section mentions why they’re not iterable. Search on “iterable”.
>

Sure, but in the context of the difference with NamedTuple:

"""
Instances are always iterable, which can make it difficult to add fields.
If a library defines:

Time = namedtuple('Time', ['hour', 'minute'])
def get_time():
    return Time(12, 0)

Then if a user uses this code as:

hour, minute = get_time()

then it would not be possible to add a second field to Time without
breaking the user's code.
"""

which is a great argument for why you wouldn't want dataclasses to iterate
on the *values* of the fields, which is what NamedTuple does. In fact,
beyond iteration, a NamedTuple IS a tuple, so:

- the number of "fields" matters,
- the order of the "fields" matters,
- it's the values that really matter -- the field names are incidental.

None of this (intentionally) applies to dataclasses, which is what I took
as the point of that section of the PEP.
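
For illustration, a small sketch of what "IS a tuple" means in practice. It
reuses the Time example from the PEP; the Time3 name and its "second" field
are hypothetical, added only to show the breakage:

```python
from collections import namedtuple

Time = namedtuple("Time", ["hour", "minute"])
t = Time(12, 0)

# A namedtuple instance is a real tuple: it compares equal to a plain
# tuple, and unpacking depends on the number and order of the fields.
is_tuple = isinstance(t, tuple)
equals_plain_tuple = (t == (12, 0))
hour, minute = t

# Hypothetical later version of the library type with one more field.
Time3 = namedtuple("Time3", ["hour", "minute", "second"])
try:
    hour, minute = Time3(12, 0, 0)
    unpack_error = None
except ValueError as exc:
    # The two-name unpacking no longer matches the tuple length.
    unpack_error = str(exc)
```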

But I see dataclasses as being more analogous to a dict than a tuple:

- The field names matter
- The order of the fields does not matter (the current state of the dict
implementation notwithstanding)

So *if* dataclasses were to be iterable, then I'd think they should either
implement the Mapping protocol (which would make them a drop-in replacement
for a dict, much as a NamedTuple is for a tuple), or iterate over
(field_name, value) pairs, which would not introduce the issues stated in
the PEP with the iterability of NamedTuple.
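
A minimal sketch of the (field_name, value) option, assuming a hypothetical
opt-in decorator named iterable_as_pairs. Nothing like it exists in the
dataclasses module; it only shows that dict() would then work on an instance:

```python
from dataclasses import dataclass, fields

def iterable_as_pairs(cls):
    """Hypothetical decorator: iterate a dataclass as (name, value) pairs."""
    def __iter__(self):
        for f in fields(self):
            yield f.name, getattr(self, f.name)
    cls.__iter__ = __iter__
    return cls

@iterable_as_pairs
@dataclass
class Time:
    hour: int
    minute: int

t = Time(12, 0)
pairs = list(t)       # list of (field_name, value) tuples
time_dict = dict(t)   # dict() accepts any iterable of key/value pairs
```

Adding a field to Time would add a pair to the iteration, so dict(t) keeps
working; only code that unpacks a fixed number of pairs would notice.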

Note: I am NOT advocating for making dataclasses a Mapping -- I believe in
a clear separation between "data" and "code" -- dicts are suitable for
data, and dataclasses are suitable for code. Given Python's dynamic nature,
this isn't always a clear boundary, but I think it's better not to
purposely make it even less clear.

Others may have a different take on this -- it could be kind of a cool way
to make a dict with extra functionality, but I think that's ill-advised.

> if dataclasses *were* iterable, they almost certainly wouldn't iterate over
> the values alone.
>
> That wouldn’t make a difference. The given NT example would still be a
> problem.
>

you know, I'm embarrassed to say that I read through a lot of this thread
wondering what the heck an NT is ;-) -- but now I got it.

but I don't think the NT example is relevant here -- it's an issue with NT,
because NTs are designed to be, and are used as, drop-in replacements for
tuples, and a lot of code out there does use tuples for "unpacking"
assignments.

But I'm going to hazard a guess that it's pretty rare for folks to do:

def get_time():
    ...
    return {"hour": 3, "minute": 5}

hour, minute = get_time()

and:

result = get_time()
hour = result['hour']
minute = result["minute"]

wouldn't break, nor would:

time_dict = dict(get_time())

Anyway, we need dataclasses AND NamedTuples because dataclasses don't
behave like a tuple -- but that doesn't mean that they couldn't behave more
like a dict in some ways.

> And my toy code actually adds another decorator to make dataclasses
> iterable, so it would be a completely optional feature.
>
>
> That approach of adding support for iterability makes sense to me. I’m
> contemplating adding a “dataclasses_tools” package for some other tools I
> have.
>

I like that -- good idea. If any of my toy code is useful, please feel free
to make use of it.

> But I’m not sure how this fits in to the asdict discussion.
>

Well, it fits in in the sense that the question at hand is:

Is there a need for a standard protocol for making dicts out of arbitrary
classes?

Dataclasses were used as an example use case, as dataclasses do naturally
map to dicts.

My point is that there are already two protocols for turning an arbitrary
class into a dict, so I don't think we need another one, even though
those two overload other use cases: the Mapping ABC and iteration.
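
To make the two routes concrete, here is a sketch with two hypothetical
classes, ViaMapping and ViaPairs. Strictly, dict() does not require the full
Mapping ABC; it only looks for a keys() method and then indexes with
__getitem__:

```python
class ViaMapping:
    """Mapping-style object: dict() calls keys(), then __getitem__ per key."""
    def __init__(self):
        self._data = {"hour": 3, "minute": 5}

    def keys(self):
        return self._data.keys()

    def __getitem__(self, key):
        return self._data[key]


class ViaPairs:
    """Iteration-style object: dict() consumes (key, value) pairs."""
    def __iter__(self):
        yield "hour", 3
        yield "minute", 5


from_mapping = dict(ViaMapping())
from_pairs = dict(ViaPairs())
```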

This does mean that there would be no way to shoehorn NamedTuples into this
-- as they already use the iteration protocol in a way that is incompatible
with dict creation. So yeah, maybe a use case there.

But the other option is to use a convention, rather than an official
protocol -- if an object has an "asdict()" method, then that method returns
a dict. Pretty simple.

If you were dealing with arbitrary objects, you'd have to write code like:

try:
    a_dict = an_object.asdict()
except (AttributeError, TypeError):
    a_dict = something_else

but you'd need to do that regardless, as you couldn't count on every type
being conformant with the dict() constructor.
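
Combining the convention with the existing dict() routes might look like the
sketch below. The to_dict helper and the HasAsdict class are hypothetical
names, and falling back to dict() is an assumption about what
"something_else" could be:

```python
def to_dict(obj):
    """Hypothetical helper: try the asdict() convention, then dict()."""
    try:
        return obj.asdict()
    except (AttributeError, TypeError):
        pass
    # Raises TypeError or ValueError if obj is neither mapping-like
    # nor an iterable of (key, value) pairs.
    return dict(obj)


class HasAsdict:
    """Example object following the informal asdict() convention."""
    def asdict(self):
        return {"hour": 3, "minute": 5}


via_convention = to_dict(HasAsdict())
via_pairs = to_dict([("hour", 3), ("minute", 5)])
```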

So I'm still trying to see the use cases for wanting to turn
arbitrary objects into dicts.

As for NamedTuple -- it does have _asdict:

In [122]: p._asdict()

Out[122]: OrderedDict([('x', 4), ('y', 5)])

(sidenote -- now that dicts preserve order, maybe NamedTuple could return a
regular dict)
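
A self-contained version of that session, assuming p was built from a
two-field namedtuple (the Point name is a guess; only the x and y fields and
their values come from the output above). It also shows why dict() cannot be
applied to the instance directly:

```python
from collections import namedtuple

# Assumed definition; the original session does not show how p was made.
Point = namedtuple("Point", ["x", "y"])
p = Point(4, 5)

# _asdict() returns a dict-like object keyed by field name. Whether it is
# an OrderedDict or a plain dict depends on the Python version, but it
# compares equal to a plain dict either way.
d = p._asdict()

# dict(p) fails: iterating p yields the values 4 and 5, not pairs.
try:
    dict(p)
    dict_error = None
except TypeError as exc:
    dict_error = exc
```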

So I see the OP's point -- and it's a leading-underscore name because:

"""
To prevent conflicts with field names, the method and attribute names start
with an underscore.
"""

which I suppose means that if there is an informal protocol, it should also
use that leading underscore.

-CHB

-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython