Differences creating tuples and collections.namedtuples

Steven D'Aprano steve+comp.lang.python at pearwood.info
Mon Feb 18 22:06:24 EST 2013


Oscar Benjamin wrote:

> On 19 February 2013 00:18, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
>> Terry Reedy wrote:
>>> On 2/18/2013 6:47 AM, John Reid wrote:
> [snip]
>>>> Is this a problem with namedtuples, ipython or just a feature?
>>>
>>> With canSequence. If isinstance was available and the above were written
>>> before list and tuple could be subclassed, canSequence was sensible when
>>> written. But as Oscar said, it is now a mistake for canSequence to
>>> assume that all subclasses of list and tuple have the same
>>> initialization api.
>>
>> No, it is not a mistake. It is a problem with namedtuples that they
>> violate the expectation that they should have the same constructor
>> signature as other tuples. After all, namedtuples *are* tuples, they
>> should be constructed the same way. But they aren't, so that violates a
>> reasonable expectation.
> 
> It is a mistake. A namedtuple class instance provides all of the
> methods/operators provided by a tuple. This should be sufficient to
> fill the tuplishness contract.

"Should be", but *doesn't*. 

If your code expects a tuple, then it should work with a tuple. Namedtuples
are tuples, but they don't work where builtin tuples work, because their
__new__ method has a different signature.
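
To make the difference in signatures concrete, here is a minimal sketch. The namedtuple type Point is a hypothetical example and does not come from the original poster's code.

```python
from collections import namedtuple

# Point is a hypothetical example type, not taken from the thread.
Point = namedtuple('Point', ['x', 'y'])

# The builtin tuple constructor takes a single iterable.
plain = tuple([1, 2])

# The namedtuple constructor takes one argument per field instead.
named = Point(1, 2)

# Calling it the way one would call tuple raises TypeError,
# because the second field is missing.
try:
    Point([1, 2])
except TypeError:
    failed = True
else:
    failed = False
```

Both plain and named compare equal to (1, 2), yet only one of the two classes can be called with a single iterable.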

I can understand arguing that this is "acceptable breakage" for various
reasons -- practicality beats purity. I can't understand arguing that the
principle is wrong.


> Requiring that obj satisfy a contract 
> is one thing. When you get to the point of requiring that type(obj)
> must do so as well you have gone beyond duck-typing and the normal
> bounds of polymorphism.

Constructor contracts are no less important than other contracts. I'm going
to give what I hope is an example that is *so obvious* that nobody will
disagree.

Consider the dict constructor dict.fromkeys:

py> mydict = {'a':1}
py> mydict.fromkeys(['ham', 'spam', 'eggs'])
{'eggs': None, 'ham': None, 'spam': None}


Now I subclass dict:

py> class MyDict(dict):
...     @classmethod
...     def fromkeys(cls, func):
...         # Expects a callback function that gets called with no arguments
...         # and returns two items, a list of keys and a default value.
...         return super(MyDict, cls).fromkeys(*func())
...

Why would I change the syntax like that? Because reasons. Good or bad,
what's done is done and there is my subclass. Here is an instance:

py> mydict = MyDict({'a': 1})
py> isinstance(mydict, dict)
True


Great! So I pass mydict to a function that expects a dict. This ought to
work, because mydict *is* a dict. It duck-types as a dict, isinstance
agrees it is a dict. What could possibly go wrong?

What goes wrong is that some day I pass it to a function that calls
mydict.fromkeys in the usual fashion, and it blows up.

py> mydict.fromkeys(['spam', 'ham', 'eggs'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in fromkeys
TypeError: 'list' object is not callable

How is this possible? Is mydict not a dict? It should be usable anywhere I
can use a dict. How is this possibly acceptable behaviour for something
which claims to be a dict?

This is a violation of the Liskov Substitution Principle, and a violation of
normal expectations that if mydict quacks like a dict, it should lay eggs
like a duck.
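
As a sketch of how an innocent caller could trip over this, consider a hypothetical helper blank_copy (an illustrative name, not from any real library) that assumes the usual fromkeys signature:

```python
class MyDict(dict):
    @classmethod
    def fromkeys(cls, func):
        # Same subclass as above: expects a zero-argument callback
        # returning a list of keys and a default value.
        return super(MyDict, cls).fromkeys(*func())


def blank_copy(d):
    """Hypothetical helper: same keys as d, every value set to None."""
    return d.fromkeys(list(d))


# Works for a builtin dict.
ok = blank_copy({'a': 1})

# Fails for the subclass, even though isinstance says it is a dict.
try:
    blank_copy(MyDict({'a': 1}))
except TypeError:
    broke = True
else:
    broke = False
```

Nothing in blank_copy is wrong by the dict interface; it breaks only because the subclass changed a constructor signature.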

That namedtuple's constructor is __new__ rather than fromkeys is an
irrelevant distraction. The principle still applies. It is perfectly
reasonable to expect that if instance t is a tuple, then *any* method on t
should have the same signature, regardless of whether that method is
called "index", "__getitem__", or "__new__".

If this fundamental principle is violated, there should be a very good
reason, and not just because "constructor contracts aren't important".


> It's still unclear what the purpose of canSequence is, but I doubt
> that there isn't a better way that it (and its related functions)
> could be implemented that would not have this kind of problem.

Incorrect. The problem is with *namedtuples*, not canSequence, because
namedtuples promise to implement a strict superset of the behaviour of
builtin tuples, while in fact they actually *take behaviour away*. Tuples
promise to allow calls to the constructor like this:

any_tuple.__new__(type(any_tuple), iterable)

but that fails if any_tuple is a namedtuple.
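
The call above can be tried directly. This is a minimal sketch under stated assumptions: Pair is a hypothetical namedtuple type, and rebuild is a hypothetical helper name. The _make classmethod is the alternative constructor that namedtuple classes provide for building an instance from an iterable.

```python
from collections import namedtuple

# Pair is a hypothetical example type, not taken from the thread.
Pair = namedtuple('Pair', ['first', 'second'])


def rebuild(any_tuple, iterable):
    """Hypothetical helper: call the constructor as a plain tuple allows."""
    return any_tuple.__new__(type(any_tuple), iterable)


# Works for a builtin tuple.
from_plain = rebuild((1, 2), [3, 4])

# Fails for a namedtuple: its __new__ wants one argument per field.
try:
    rebuild(Pair(1, 2), [3, 4])
except TypeError:
    broke = True
else:
    broke = False

# A namedtuple can be built from an iterable, but only through _make,
# which plain tuples do not have.
from_make = Pair._make([3, 4])
```

Code that must handle both kinds of tuple therefore has to special-case namedtuples, for instance by checking for _make, which is exactly the kind of extra knowledge a substitutable subclass should not require.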

I am not arguing for or against the idea that this is an *acceptable*
breakage, given other requirements. But from the point of view of interface
contracts, it is a breakage, and as the Original Poster discovered, it can
and will break code.



-- 
Steven



