Differences creating tuples and collections.namedtuples

Mon Feb 18 19:18:30 EST 2013

Terry Reedy wrote:

> On 2/18/2013 6:47 AM, John Reid wrote:
> 
>> I was hoping namedtuples could be used as replacements for tuples
>> in all instances.
> 
> This is a mistake in the following two senses. First, tuple is a class
> with instances while namedtuple is a class factory that produces
> classes. (One could think of namedtuple as a metaclass, but it was not
> implemented that way.) 

I think you have misunderstood. I don't believe that John wants to use the
namedtuple factory instead of tuple. He wants to use a namedtuple type
instead of tuple.

That is, given:

Point3D = namedtuple('Point3D', 'x y z')

he wants to use a Point3D instead of a tuple. Since:

issubclass(Point3D, tuple) 

holds true, the Liskov Substitution Principle (LSP) tells us that anything
that is true for a tuple should also be true for a Point3D. That is, given
that instance x might be either a builtin tuple or a Point3D, all of the
following hold:

- isinstance(x, tuple) returns True
- len(x) returns the length of x
- hash(x) returns the hash of x
- x[i] returns item i of x, or raises IndexError
- del x[i] raises TypeError
- x + a_tuple returns a new tuple
- x.count(y) returns the number of items equal to y

etc. Basically, any code expecting a tuple should continue to work if you
pass it a Point3D instead (or any other namedtuple).
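
The list above is easy to check by running the same operations against both kinds of object. This is only a sketch: the helper name check_tuple_contract is made up for illustration and is not part of any library.

```python
from collections import namedtuple

Point3D = namedtuple('Point3D', 'x y z')

def check_tuple_contract(x):
    """Hypothetical helper: exercise the tuple behaviours listed above."""
    assert isinstance(x, tuple)
    assert len(x) == 3
    assert isinstance(hash(x), int)
    assert x[0] == 1
    try:
        x[len(x)]               # one past the end
    except IndexError:
        pass
    else:
        raise AssertionError("expected IndexError")
    try:
        del x[0]                # tuples are immutable
    except TypeError:
        pass
    else:
        raise AssertionError("expected TypeError")
    assert x + (4,) == (1, 2, 3, 4)
    assert x.count(1) == 1
    return True

check_tuple_contract((1, 2, 3))
check_tuple_contract(Point3D(1, 2, 3))
```

Both calls pass. Note that x + a_tuple gives back a plain tuple even when x is a Point3D, which is what the list above promises.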

There is one conspicuous exception to this, the constructor:

type(x)(args)

behaves differently depending on whether x is a builtin tuple, or a Point3D.
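
A small sketch of that difference, using the Point3D type defined earlier. The variable names are only for illustration.

```python
from collections import namedtuple

Point3D = namedtuple('Point3D', 'x y z')

x = (1, 2, 3)
y = Point3D(4, 5, 6)

# Rebuilding from an iterable works for the builtin tuple...
rebuilt_x = type(x)(list(x))

# ...but the same call on a Point3D passes the whole list as the single
# argument x, leaving y and z missing.
try:
    type(y)(list(y))
    rebuilt_ok = True
except TypeError:
    rebuilt_ok = False

# The reverse fails too: tuple does not take one argument per item.
try:
    type(x)(1, 2, 3)
    star_ok = True
except TypeError:
    star_ok = False
```

Neither call style works for both types, so generic code that does type(x)(args) has to know which kind of tuple it was given.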

The LSP is about *interfaces* and the contracts we make about those
interfaces, rather than directly about inheritance. Inheritance is just a
mechanism for allowing types to automatically get the same interface as
another type. Another way to put this: LSP is about duck-typing. In this
case, if we have two instances:

x = (1, 2, 3)
y = Point3D(4, 5, 6)

then x and y:

- quack like tuples
- swim like tuples
- fly like tuples
- walk like tuples
- eat the same things as tuples
- taste very nice cooked with orange sauce like tuples

etc., but y does not lay eggs like x. The x constructor takes a single
iterable argument; the y constructor requires one argument per field.

You can read more about LSP here:

http://en.wikipedia.org/wiki/Liskov_substitution_principle

although I don't think this is the most readable Wikipedia article, and the
discussion of mutability is a red herring. Or you can try this:

http://c2.com/cgi/wiki?LiskovSubstitutionPrinciple

although even by c2 wiki standards, it's a bit of a mess. These might help
more:

http://blog.thecodewhisperer.com/2013/01/08/liskov-substitution-principle-demystified/

http://lassala.net/2010/11/04/a-good-example-of-liskov-substitution-principle/

> Second, a tuple instance can have any length and 
> different instances can have different lengths. On the other hand, all
> instances of a particular namedtuple class have a fixed length.

This is a subtle point. If your contract is, "I must be able to construct an
instance with a variable number of items", then namedtuples are not
substitutable for builtin tuples. But I think this is an *acceptable*
violation of LSP, since we're deliberately restricting a namedtuple to a
fixed length. But within the constraints of that fixed length, we should be
able to substitute a namedtuple for any tuple of that same length.
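
To make the fixed-length restriction concrete, here is a sketch. The accepts helper is hypothetical and exists only to turn a TypeError into a boolean.

```python
from collections import namedtuple

Point3D = namedtuple('Point3D', 'x y z')

def accepts(factory, *args):
    """Hypothetical helper: True if factory(*args) succeeds, False on TypeError."""
    try:
        factory(*args)
    except TypeError:
        return False
    return True

# Builtin tuples of any length can be built from an iterable.
lengths = [len(tuple(range(n))) for n in (0, 2, 3, 5)]

# A Point3D always has exactly three fields.
two_ok = accepts(Point3D, 1, 2)
three_ok = accepts(Point3D, 1, 2, 3)
four_ok = accepts(Point3D, 1, 2, 3, 4)

# Within that fixed length, it compares equal to the matching plain tuple.
same = Point3D(1, 2, 3) == (1, 2, 3)
```

Only the three-argument call succeeds, and the resulting Point3D is interchangeable with (1, 2, 3) for comparison purposes.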

> This 
> affects their initialization. So does the fact that Oscar mentioned,
> that fields can be initialized by name.

Constructing namedtuples by name is not a violation, since it *adds*
behaviour, it doesn't take it away. If you expect a tuple, you cannot
construct it with:

t = tuple(spam=a, ham=b, eggs=c)

since that doesn't work. You have to construct it from an iterable, or more
likely a literal:

t = (a, b, c)

Literals are special, since they are a property of the *interpreter*, not
the tuple type. To put it another way, the interpreter understands (a,b,c)
as syntax for constructing a tuple, the tuple type does not. So we cannot
expect to use (a,b,c) syntax to construct a MyTuple instance, or a Point3D
instance instead.
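
A sketch of both points, with concrete values standing in for a, b and c:

```python
from collections import namedtuple

Point3D = namedtuple('Point3D', 'x y z')

# Keyword construction is extra behaviour that only the namedtuple offers.
p = Point3D(x=1, y=2, z=3)

# The builtin tuple rejects these keyword arguments outright.
try:
    tuple(x=1, y=2, z=3)
    keywords_ok = True
except TypeError:
    keywords_ok = False

# The literal syntax always produces a plain tuple, never a subclass.
t = (1, 2, 3)
literal_type = type(t)
```

Code written against plain tuples never uses the keyword form, so supporting it in Point3D cannot break that code.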

If we hope to substitute a subclass, we have to use the tuple constructor
directly:

type_to_use = tuple
t = type_to_use([a, b, c])

Duck-typing and the LSP tell us that we should be able to substitute a
Point3D for this:

type_to_use = namedtuple('Point3D', 'x y z')
t = type_to_use([a, b, c])

but we can't. And that is an important violation of LSP.

There could be three fixes to this, none of them practical:

1) tuple could accept multiple arguments, tuple(a, b, c) => (a, b, c), but
that conflicts with the use tuple(iterable). If Python had had * argument
unpacking back in the early days, it might have been better to give tuples
the signature tuple(*args), but it didn't and so it doesn't and we can't
change that now.

2) namedtuples could accept a single iterable argument like tuple does, but
that conflicts with the desired signature pt = Point3D(1, 2, 3).

3) namedtuples could stop claiming to be tuples, which is probably the
least-worst fix. Backwards-compatibility rules out making this change, but
even if it didn't, namedtuples quack like tuples, swim like tuples, and
walk like tuples, so even if they aren't a subclass of tuple it would still
be reasonable to want them to lay eggs like tuples.
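
One detail related to fix 2: classes produced by namedtuple do provide a _make classmethod that builds an instance from a single iterable. It does not resolve the substitution problem, because generic code still has to know to call _make rather than the type itself. A sketch, where from_iterable is a hypothetical helper and not part of any library:

```python
from collections import namedtuple

Point3D = namedtuple('Point3D', 'x y z')

def from_iterable(cls, iterable):
    """Hypothetical helper: build a cls instance from one iterable."""
    # Namedtuple classes expose _make; the builtin tuple does not.
    make = getattr(cls, '_make', None)
    if make is not None:
        return make(iterable)
    return cls(iterable)

a = from_iterable(tuple, [1, 2, 3])
b = from_iterable(Point3D, [1, 2, 3])
```

The special case inside from_iterable is exactly the kind of type-sniffing that a well-behaved subclass would make unnecessary.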

So I don't believe there is any good solution to this, except the ad-hoc one
of overriding the __new__ constructor when needed.
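
A minimal sketch of that ad-hoc fix, assuming you are willing to give up the Point3D(1, 2, 3) call style. The class name Point3DFromIterable is made up for this example.

```python
from collections import namedtuple

class Point3DFromIterable(namedtuple('Point3D', 'x y z')):
    """Assumed variant whose constructor takes one iterable, like tuple."""
    __slots__ = ()

    def __new__(cls, iterable):
        # Unpack the iterable into the positional fields the base class expects.
        return super().__new__(cls, *iterable)

p = Point3DFromIterable([1, 2, 3])

# Generic tuple-style reconstruction now works.
q = type(p)(list(p))

# The price: the usual one-argument-per-field call no longer does.
try:
    Point3DFromIterable(1, 2, 3)
    positional_ok = True
except TypeError:
    positional_ok = False
```

This trades one incompatibility for another, which is why it only makes sense case by case.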

>> There seem to be some differences between how tuples and namedtuples
>> are created. For example with a tuple I can do:
>>
>> a=tuple([1,2,3])
> 
> But no sensible person would ever do that, since it creates an
> unnecessary list and is equivalent to
> 
> a = 1,2,3

Well, no, not as given. But it should be read as just an illustration. In
practice, code like this is not uncommon:

a = tuple(some_iterable)

[...]
> It is much less common to change tuple(iterable) to B(iterable).

Less common or not, duck-typing and the LSP tell us we should be able to do
so. We cannot.

>> Is this a problem with namedtuples, ipython or just a feature?
> 
> With canSequence. If isinstance was available and the above were written
> before list and tuple could be subclassed, canSequence was sensible when
> written. But as Oscar said, it is now a mistake for canSequence to
> assume that all subclasses of list and tuple have the same
> initialization api.

No, it is not a mistake. It is a problem with namedtuples that they violate
the expectation that they should have the same constructor signature as
other tuples. After all, namedtuples *are* tuples, so they should be
constructed the same way. But they aren't, so that violates a reasonable
expectation.

Is the convenience of being able to write Point3D(1, 2, 3) more important
than LSP-purity? Perhaps. I suspect that will be the answer Raymond
Hettinger might give. I'm 85% inclined to agree with this answer.

> In fact, one reason to subclass a class is to change the initialization
> api.

That might be a reason that people give, but it's a bad reason from the
perspective of interface contracts, duck-typing and the LSP.

Of course, these are not the *only* perspectives. There is no rule that
states that one must always obey the interface contracts of one's parent
class. But if you don't, you will be considered an "ill-behaved" subclass
for violating the promises made by your type.

-- 
Steven