Differences creating tuples and collections.namedtuples

Terry Reedy tjreedy at udel.edu
Mon Feb 18 16:28:43 EST 2013


On 2/18/2013 6:47 AM, John Reid wrote:

> I was hoping namedtuples could be used as replacements for tuples
> in all instances.

This is a mistake in the following two senses. First, tuple is a class 
with instances while namedtuple is a class factory that produces 
classes. (One could think of namedtuple as a metaclass, but it was not 
implemented that way.) Second, a tuple instance can have any length and 
different instances can have different lengths. On the other hand, all 
instances of a particular namedtuple class have a fixed length. This 
affects their initialization. So does the fact that Oscar mentioned, 
that fields can be initialized by name.
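
Both points can be seen in a few lines. This is only an illustrative 
sketch; the class name Point and its fields are made up for the example 
and do not come from the thread.

```python
from collections import namedtuple

# namedtuple is a factory: calling it returns a new tuple subclass.
Point = namedtuple('Point', 'x y')   # hypothetical example class
assert isinstance(Point, type) and issubclass(Point, tuple)

# Plain tuple instances may have any length.
short, long_ = (1,), (1, 2, 3)

# Every Point has exactly two fields, given by position or by name.
p = Point(1, y=2)
q = Point(x=1, y=2)

try:
    Point(1, 2, 3)   # wrong number of fields
except TypeError as exc:
    error = exc
```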

> There seem to be some differences between how tuples and namedtuples
> are created. For example with a tuple I can do:
>
> a=tuple([1,2,3])

But no sensible person would ever do that, since it creates an 
unnecessary list and is equivalent to

a = 1,2,3

The actual api is tuple(iterable). I presume you know that, but it gets 
to the question you ask about 'why the difference?'. The only reason to 
use an explicit tuple() call is when you already have an iterable, 
possibly of unknown length, rather than the individual field objects. In 
the latter case, one should use a display.
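
A small sketch of the distinction, with values chosen only for 
illustration:

```python
a = 1, 2, 3              # display: the items are known individually
b = tuple(range(1, 4))   # call: the items come from an existing iterable
c = tuple([1, 2, 3])     # same result, but builds a throwaway list first
```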

> with namedtuples I get a TypeError:
>
> from collections import namedtuple
> B=namedtuple('B', 'x y z')
> b=B([1,2,3])

There is no namedtuple B display, so one *must* use an explicit call 
with the proper number of args. The simplest possibility is B(val0, 
val1, val2). Initializing a namedtuple from an iterable is unusual, and 
hence gets the longer syntax, B(*iterable) or B._make(iterable). In 
other words, the typical use case for a namedtuple class is to replace 
statements that use a tuple display, changing

     return a, b, c
to
     return B(a, b, c)

or
     x = a, b, c
to
     x = B(a, b, c)

It is much less common to change tuple(iterable) to B(iterable).
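
For the less common case, here is a sketch of the two standard 
spellings next to the call that fails. B is the class from the quoted 
example; the name values is only illustrative.

```python
from collections import namedtuple

B = namedtuple('B', 'x y z')
values = [1, 2, 3]

b1 = B(*values)        # unpack the iterable into positional args
b2 = B._make(values)   # classmethod that builds an instance from an iterable

try:
    B(values)          # the whole list is taken as x; y and z are missing
except TypeError as exc:
    error = exc
```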

> def canSequence(obj):
>      if isinstance(obj, (list, tuple)):
>          t = type(obj)
>          return t([can(i) for i in obj])
>      else:
>          return obj

The first return could also be written t(map(can, obj)) or, with a 
generator expression, t(can(i) for i in obj).

> where obj is a namedtuple and t([can(i) for i in obj]) fails with the TypeError. See http://article.gmane.org/gmane.comp.python.ipython.user/10270 for more info.
>
> Is this a problem with namedtuples, ipython or just a feature?

With canSequence. If isinstance was available before list and tuple 
could be subclassed, and canSequence was written then, it was sensible 
when written. But as Oscar said, it is now a mistake for canSequence to 
assume that all subclasses of list and tuple have the same 
initialization api.

In fact, one reason to subclass a class is to change the initialization 
api. For instance, before Python 3, range was a function that returned a 
list. If list had always been subclassable, range might instead have 
been written as a list subclass that attached the start, stop, and 
step values, like so:

# python 3
class rangelist(list):
    def __init__(self, *args):
        r = range(*args)
        self.extend(r)
        self.start = r.start
        self.stop = r.stop
        self.step = r.step

r10 = rangelist(10)
print(r10, r10.start, r10.stop, r10.step)
>>>
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] 0 10 1

However, once can() is defined, canSequence(r10) will raise a TypeError, 
just as with a namedtuple B instance, because the list of results is 
passed on to range():

TypeError: 'list' object cannot be interpreted as an integer

So, while your question about the namedtuple api is a legitimate one, 
your problem with canSequence is not really about namedtuples, but about 
canSequence making a bad assumption.
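
One possible way to drop that assumption is sketched below. This is not 
the actual IPython code or its fix. Here can() is a stand-in identity 
function, and testing for _make is a heuristic that recognizes 
namedtuple classes; any other subclass falls back to its builtin base 
type.

```python
from collections import namedtuple

def can(obj):
    # Stand-in for the real can(); assumed to map one object to another.
    return obj

def canSequence(obj):
    if isinstance(obj, (list, tuple)):
        items = [can(i) for i in obj]
        t = type(obj)
        if t in (list, tuple):
            return t(items)         # exact builtins: t(iterable) is safe
        if hasattr(t, '_make'):
            return t._make(items)   # namedtuple classes provide _make
        # Unknown subclass: its init api is not known, so use the base type.
        return items if isinstance(obj, list) else tuple(items)
    return obj

B = namedtuple('B', 'x y z')
b = canSequence(B(1, 2, 3))   # no TypeError; the result is still a B
```

The fallback loses the subclass type, which may or may not be acceptable 
for a given caller; that is a design choice, not a requirement.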

-- 
Terry Jan Reedy



