[Python-ideas] namedtuple baseclass

Andrew Barnert abarnert at yahoo.com
Sun Jan 12 22:35:34 CET 2014


From: Steven D'Aprano <steve at pearwood.info>

Sent: Sunday, January 12, 2014 3:55 AM


> Changing namedtuple is not enough.

In fact, it's almost completely orthogonal to adding a NamedTuple ABC. Changing namedtuple shouldn't be necessary, and definitely won't be sufficient.

> So I fail to see how anything short of a massive re-engineering of not 
> just namedtuple but also any C namedtuple-like types will satisfy the 
> OP's use-case. Have I missed something?


I said pretty much the same thing yesterday… but on further reflection, I think it's a lot simpler than it looks.

First, let's write collections.abc.NamedTuple:

    class NamedTuple(Sequence):
        @classmethod
        def __subclasshook__(cls, sub):
            if not issubclass(sub, collections.abc.Sequence):
                return False
            try:
                sub._fields
                return True
            except AttributeError:
                return NotImplemented

That's easy, and it works with namedtuple types with no change. It should also work with any Python wrapper type that's designed to emulate namedtuple without using it (e.g., a custom implementation with a shared base class, so that all of its types share implementations of _make and friends, as has been suggested on this thread).
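
As a rough, self-contained sketch of how such a check behaves (not the proposed implementation): the name NamedTupleABC is made up for illustration, and it uses ABCMeta directly rather than deriving from Sequence, an assumption made here so that the hook's own issubclass(sub, Sequence) test cannot loop back into the hook.

```python
from abc import ABCMeta
from collections import namedtuple
from collections.abc import Sequence


class NamedTupleABC(metaclass=ABCMeta):
    """Hypothetical stand-in for the proposed collections.abc.NamedTuple."""

    @classmethod
    def __subclasshook__(cls, sub):
        # Only answer for the ABC itself, not for classes derived from it.
        if cls is not NamedTupleABC:
            return NotImplemented
        # A named tuple must at least be a sequence.
        if not issubclass(sub, Sequence):
            return False
        # Duck-type on _fields; otherwise fall back to the default checks.
        return True if hasattr(sub, "_fields") else NotImplemented


Point = namedtuple("Point", "x y")

print(issubclass(Point, NamedTupleABC))
print(isinstance(Point(1, 2), NamedTupleABC))
print(issubclass(list, NamedTupleABC))
```

Point passes because it is a sequence with a _fields attribute; list is a sequence without one, so the hook defers and the default machinery finds no match.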

So, what about C types? Obviously they don't generally supply _fields—or anything else useful.

But most (all?) of the namedtuple-like types in builtins/stdlib are built with PyStructSequence, and adding _fields to them requires just a few lines at the end of PyStructSequence_InitType2:

    /* desc->n_in_sequence is the number of visible (sequence) fields */
    PyObject *_fields = PyTuple_New(desc->n_in_sequence);
    for (i = 0; i != desc->n_in_sequence; ++i) {
        PyObject *field = PyUnicode_FromString(desc->fields[i].name);
        PyTuple_SET_ITEM(_fields, i, field);
    }
    PyDict_SetItemString(dict, "_fields", _fields);
    Py_DECREF(_fields);

In fact, that might be worth doing even without the NamedTuple ABC proposal.

But StructSequence has only been an exposed, documented protocol since 3.3, so surely there are extension modules out there that do their namedtuple-like types manually. (In a quick look around, I couldn't find any examples—although I did find a couple with Python wrappers that create a namedtuple around the result returned by a C implementation function—but I'm sure they exist.)


Obviously you need to be able to get the field names from somewhere—whether that's an attribute or method on the type, copy-pasting from documentation or source, or even parsing the repr of an instance or something—but then you can just generate a wrapper from the type and its field names.

And we could just leave it at that: "Sorry, those aren't NamedTuple classes, but you can always implement a wrapper in Python yourself." Or we could add a wrapper-generator to the collections module. Something like this:

    def namedtuplize(cls, fields):
        if isinstance(fields, str):
            fields = fields.split()
        class Sub:
            _fields = fields
            def __init__(self, *args, **kwargs):
                self.values = cls(*args, **kwargs)
            def __repr__(self):
                return repr(self.values)
            # a handful of other special methods that can't be getattrified
            def __getattr__(self, attr):
                return getattr(self.values, attr)
        return Sub

    statfields = 'st_mode st_ino st_dev st_nlink st_uid st_gid st_size st_atime st_mtime st_ctime'
    Stat = namedtuplize(os.stat_result, statfields)
    stats = (Stat(os.stat(f)) for f in os.listdir('.'))

(I'm using os.stat_result as an example, even though it's already a PyStructSequence so you wouldn't need it here, only for lack of a real-life example.)

And then you can write a wrapper around os.stat that returns a Stat instead of an os.stat_result. Or, going the other way, in a quick-and-dirty script that wraps only a handful of these, you can even just wrap each object:

    def namedtuplify(obj, fields):
        return namedtuplize(type(obj), fields)(obj)
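
For comparison, the other approach mentioned earlier, a Python wrapper that builds a real namedtuple around the result of a C function, might look roughly like this. It is only a sketch: StatTuple and stat_namedtuple are names invented here, and it swaps in a plain collections.namedtuple rather than the generated wrapper class, so only the ten sequence fields survive.

```python
import os
from collections import namedtuple

# Hypothetical namedtuple mirroring the ten sequence fields of os.stat_result.
StatTuple = namedtuple(
    "StatTuple",
    "st_mode st_ino st_dev st_nlink st_uid st_gid "
    "st_size st_atime st_mtime st_ctime",
)


def stat_namedtuple(path):
    """Call os.stat and repackage the visible fields as a StatTuple."""
    # tuple() yields only the sequence part of the struct sequence;
    # _make raises TypeError if the field count does not match.
    return StatTuple._make(tuple(os.stat(path)))


s = stat_namedtuple(".")
print(s._fields)
print(s.st_mode == os.stat(".").st_mode)
```

The cost of this version is that attributes available only by name on os.stat_result (for example the float timestamps) are lost, which is one reason a delegating wrapper can be preferable.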

While the namedtuplize function could be useful in the stdlib, namedtuplify is less so: there are many cases where it's a bad idea, and it's trivial to write yourself if you need it. So I wouldn't add it to collections, except maybe as a recipe in the docs.

One last thing: either the ABC or the wrapper could also add _asdict and the other methods that can be easily derived from _fields, because they're useful, and I frequently see people reimplementing _asdict by calling getattr(self, field) on each field.
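
A minimal sketch of deriving such a method from _fields alone, assuming nothing about the class except that each name in _fields is readable as an attribute. FieldsMixin and Version are hypothetical names used only for this illustration.

```python
from collections import OrderedDict


class FieldsMixin:
    """Hypothetical mixin: helpers derived purely from a class-level _fields."""

    _fields = ()

    def _asdict(self):
        # Look each field up by name, preserving the order given in _fields.
        return OrderedDict((name, getattr(self, name)) for name in self._fields)


class Version(FieldsMixin):
    _fields = ("major", "minor")

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor


print(list(Version(3, 4)._asdict().items()))
```

Methods such as _replace or _make would additionally need to know how to construct an instance, so they cannot be derived from _fields quite as mechanically.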

