[Python-ideas] recorarray: a mutable alternative to namedtuple

Sat Mar 28 19:57:46 CET 2015

On Mar 28, 2015, at 06:37, Steven D'Aprano <steve at pearwood.info> wrote:
> 
>> On Fri, Mar 27, 2015 at 04:13:46PM -0700, Andrew Barnert wrote:
>> On Mar 27, 2015, at 06:22, Joao S. O. Bueno <jsbueno at python.org.br> wrote:
> [...]
>>> The Python equivalent of a C Struct.
>> 
>> But a C struct is not a subtype of, or substitutable for, a C array. 
>> It's not indexable. And the same is true with the equivalents in other 
>> languages. In fact, the dichotomy between struct--heterogeneous 
>> collection of fixed named fields--and array--homogeneous collection of 
>> indexed fields--goes back way before C. So, if you want the equivalent 
>> of a C struct, there's no reason to make it an iterable in Python.
> 
> Joao said "The Python equivalent of a C struct", not "a C struct". 
> Python is not C, and Python data types are not limited to what C does. 
> Python strings aren't limited to C null-delimited strings, and Python 
> ints aren't limited to what C ints can do.

Sure, but nobody just invents random new features to add to int and then justifies them by saying "I want the equivalent of a C int" even though C int doesn't have those features. People invent features (like bit_length) to solve actual use cases, and justify them based on those use cases.

Multiple people have asked "what do you want this for?", and the best answer anyone's given has been "the equivalent of a C struct". (That, and to prematurely and badly optimize memory usage.)

Even worse, when I ask why specifically anyone wants this thing to be iterable, the answer is "to be the equivalent of a C struct", and that doesn't answer the question.

[snip]

>> And a class already is the Python equivalent of a C struct, it's just 
>> that it can do _more_ than a C struct.
> 
> This is why it is unfair to insist that a Python equivalent of a C 
> struct be limited to what C structs do.

I'm not saying a record type shouldn't be allowed to have any features that C structs don't, just that equivalency with C structs isn't an argument for features that C structs don't have.

Some of the extra features are so obviously desirable that they probably don't need any argument--if you're going to build this thing, having a nice repr or not breaking pickle seems hard to argue against. But iterability is not that kind of obvious win.

Also, how is it "unfair" to suggest that this thing should be limited in some ways? For example, two instances of the same class can have completely different fields; presumably two instances of the same record type really shouldn't. There's no reason it _couldn't_ be completely open like a general class, it's just that you usually don't want it to be. Similarly, there's no reason it couldn't be a sequence, but I don't think you usually want it to be.
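
To make the "open class" versus "fixed record" distinction concrete, here is a minimal sketch. The names Open and Point are made up for illustration, and __slots__ is just one existing way to pin down the set of fields, not a proposal for how a record type should be built:

```python
class Open:
    """Ordinary class: every instance carries its own __dict__."""


class Point:
    """Hypothetical record-like class with a fixed set of fields."""
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


a, b = Open(), Open()
a.spam = 1
b.eggs = 2  # same class, completely different fields
fields_differ = vars(a).keys() != vars(b).keys()

p = Point(1, 2)
p.x = 10  # existing fields stay mutable
try:
    p.z = 3  # but a field that was never declared is rejected
    slot_rejected = False
except AttributeError:
    slot_rejected = True
```

The limitation is a deliberate design choice here, which is the sense in which a record type can reasonably be "less" than a general class.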

[snip]

> Which brings us back to where this thread started: a request for a 
> mutable version of namedtuple. That's trickier than namedtuple, because 
> we don't have a mutable version of a tuple to inherit from. Lists won't 
> do the job, because they have a whole lot of functionality that is 
> inappropriate, e.g. sort, reverse, pop methods.
> 
> That makes it harder to create a mutable structured record type, not 
> simpler.

Sure. And that's the problem. If you want something that's "just like a sequence whose elements can be replaced but whose shape is fixed, except that the elements are also named", you run into the problem that Python doesn't have such a sequence type. It's a perfectly coherent concept, and there's no reason you couldn't design a language around immutable, fixed-shape-mutable, and mutable-shape sequences instead of just the first and last, but that's not the way Python was designed. Should that be changed? Or is the only use for such a type to underlie this new type?
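
For illustration only, a "fixed-shape-mutable" sequence could be sketched roughly like this. FixedList is a hypothetical name; nothing like it exists in the stdlib, and this wrapper around a list is just one assumed way to get the behavior:

```python
from collections.abc import Sequence


class FixedList(Sequence):
    """Hypothetical sequence: elements are replaceable, length is fixed."""

    def __init__(self, iterable):
        self._items = list(iterable)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            # Slice assignment could change the length, so refuse it.
            raise TypeError("slice assignment is not supported")
        self._items[index] = value


row = FixedList(["a", "b", "c"])
row[1] = "B"  # replacing an element is fine
has_append = hasattr(row, "append")  # there is no way to grow or shrink it
```

Sequence supplies index, count, iteration and containment for free, but none of the shape-changing methods (append, pop, sort, and so on) that make list a poor base.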

> Think about the functional requirements:
> 
> - it should be semantically a struct, not a list or array;
> 
> - with a fixed set of named fields;
> 
> - fields should be ordered: a record with fields foo and bar is not the 
> same as a record with fields bar and foo;

Note that namedtuples are nominatively typed, not structurally--a record with fields foo and bar is not necessarily the same as another record with fields foo and bar. Ordering doesn't enter into it; they were defined separately, so they're separate types. Do you want the same behavior here, or the behavior your description implies instead?
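
A quick sketch of what that means in practice, using made-up types A, B and C. One wrinkle worth noticing: the types are distinct, but == is inherited from tuple, so comparison is purely positional and ignores both the type and the field names:

```python
from collections import namedtuple

A = namedtuple("A", ["foo", "bar"])
B = namedtuple("B", ["foo", "bar"])  # same fields, defined separately
C = namedtuple("C", ["bar", "foo"])  # same fields, other order

a, b = A(1, 2), B(1, 2)

same_type = type(a) is type(b)   # separate definitions, separate types
a_is_a_b = isinstance(a, B)      # not substitutable by type either
equal_values = a == b            # plain tuple comparison
equal_reordered = a == C(1, 2)   # positions match, names do not
```

So a new record type would have to decide both questions explicitly: whether separately defined types are related, and what equality looks at.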

> - accessing fields by index would be a Nice To Have, but not essential;

Why would that be nice to have? The record/sequence dichotomy has been fundamental to the design of languages since the earliest days, and it's still there in almost all languages. Maybe every language in the world is wrong--but if so, surely you can explain why?

For structseq, there was a good reason: a stat result is a 10-tuple as well as being a record with 13-odd fields, because there was a pre-existing mass of code that used stat results as 10-tuples, but people also wanted to be able to access the newer or not-100%-portable fields. That's a great use case. And people have used structseq in other similar examples to migrate users painlessly from an early API that turned out to be too simple and limited. And namedtuple gives you a way to write APIs in a similar style that previously could only be (easily) written with a C extension, which is an obvious win. That's clearly not the case here--nobody has existing APIs that use a fixed-length but mutable sequence that they want to expand into something more flexible, because Python doesn't come with such a sequence type.
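
The stat case is easy to see from the interpreter. This sketch assumes CPython's os.stat result, whose tuple view covers only the first ten fields, while everything else is reachable by name only:

```python
import os
import stat

st = os.stat(os.curdir)

# Old-style code: positional access, or unpacking the ten tuple elements.
size_by_index = st[stat.ST_SIZE]
mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime = st

# Newer code: access by name, including fields with no tuple position.
size_by_name = st.st_size
mtime_ns = st.st_mtime_ns

# Some fields are platform-dependent, so guard the lookup.
blocks = getattr(st, "st_blocks", None)
```

Both styles work on the same object, which is exactly the migration story described above.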

Of course that's not the only use anyone's ever found for, respectively, structseq and namedtuple--e.g., converting to namedtuple turns out to be handy for cases where you want a record but some external API like SQL gives you a sequence, and that would probably be a good enough justification for namedtuple too. But what is the use that justifies this addition? (For example, if you need to take SQL rows as a sequence, mutate them by name, and then do something I can't imagine with them that requires them to still be a sequence, that would be a pretty good answer.)
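
As a sketch of that SQL use, with a made-up Person type and an in-memory sqlite3 table standing in for the external API:

```python
import sqlite3
from collections import namedtuple

Person = namedtuple("Person", ["name", "age"])  # hypothetical record type

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.execute("INSERT INTO people VALUES (?, ?)", ("Ada", 36))

# The DB-API hands back plain tuples; _make wraps each one as a record.
rows = conn.execute("SELECT name, age FROM people").fetchall()
people = [Person._make(row) for row in rows]
conn.close()

first = people[0]
# "Mutation" today means building a new record with _replace.
older = first._replace(age=first.age + 1)
```

Note that nothing after the conversion needs the sequence behavior; the records are read and updated by name.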

> - but iteration is essential, for sequence unpacking;

Again, why is that essential?

TOOWTDI isn't an iron-clad rule, but it's a good rule of thumb; adding a second way to access the members of a record that's both unique to Python and less Pythonic seems like a bad idea, unless there's some good reason that overbalances it in the other direction. Think of stat code: it's a lot more readable when you access the fields by name instead of by unpacking. Why wouldn't the same be true for, say, a Person record, or an Address record, or an ImageHeader record, or almost anything else you can imagine?

(I can think of one particular special case where it might be nice: small, homogeneous, essentially-sequence-like records like a Vector or Point or... Well, really just a Vector or Point. And they're clearly special. Both in C and in Python, you're often torn between storing them as an array or as a struct, and you'll find different apps doing it each way. That isn't true for a Person or Address etc.)

> - values in the fields must be mutable;
> 
> - it should support equality, but not hashing (since it is mutable);
> 
> - it must have a nice repr and/or str;
> 
> - being mutable, it may directly or indirectly contain a reference to 
> itself (e.g. x.field = x) so it needs to deal with that correctly;
> 
> - support for pickle;
> 
> - like namedtuple, it may benefit from a handful of methods such as 
> '_asdict', '_fields', '_make', '_replace' or similar.
> 
> 
> Does this sound easy to write? Well, sure, in the big picture, it's 
> hardly a 100,000 line application. But it's not a trivial class.
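
For a sense of scale, here is a rough sketch covering part of that list. Record and Person are hypothetical names; the sketch assumes fields are declared through __slots__ on a direct subclass, and it deliberately leaves out pickle, _make and _replace:

```python
from reprlib import recursive_repr


class Record:
    """Hypothetical base class; a direct subclass lists fields in __slots__."""
    __slots__ = ()
    __hash__ = None  # mutable, so instances are unhashable

    def __init__(self, *values):
        if len(values) != len(self.__slots__):
            raise TypeError(f"expected {len(self.__slots__)} values")
        for name, value in zip(self.__slots__, values):
            setattr(self, name, value)

    def __iter__(self):
        # Iteration in declaration order, for sequence unpacking.
        return (getattr(self, name) for name in self.__slots__)

    def __eq__(self, other):
        if type(other) is not type(self):
            return NotImplemented
        return tuple(self) == tuple(other)

    @recursive_repr()
    def __repr__(self):
        # recursive_repr substitutes '...' when a record contains itself.
        body = ", ".join(f"{n}={getattr(self, n)!r}" for n in self.__slots__)
        return f"{type(self).__name__}({body})"

    def _asdict(self):
        return dict(zip(self.__slots__, self))


class Person(Record):
    __slots__ = ("name", "friend")


p = Person("Ann", None)
p.friend = p  # self-reference
text = repr(p)
name, friend = p
```

Even this much leaves gaps: comparing two distinct self-referential records would still recurse, and deeper inheritance would need the slots of every base class collected, which rather supports the point that it is not a trivial class.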
> 
> 
> 
> -- 
> Steve