[Python-ideas] Variable-length, homogeneous tuple: why? (was: Optional static typing -- the crossroads)

Sun Aug 17 21:31:19 CEST 2014

I think we're conflating multiple problems here.

Sometimes we use tuple to mean a homogeneous, arbitrary-length, immutable sequence, and other times we use it to mean a heterogeneous, fixed-length sequence. Nick's list demonstrates that the former is (a) common enough to worry about, and (b) not always a mistake.

But even if such uses were always a mistake, Python is obviously not going to add a frozenlist, with new display syntax, and change over all existing misuses of tuple (which would clearly require a full deprecation cycle) before adding static typing. So, we'd still need a way to statically type both uses.

And even if the homogeneous uses had never existed in the first place, a heterogeneous tuple would still not be a parametric type in the same sense as all of the other collections, the io classes, etc. So, we'd still need to distinguish it syntactically from all of the other generics.

So, arguing about whether we need to handle heterogeneous tuples specially is pointless; the only question is how we do it.

I can think of four possibilities:

1. Use a tuple of types to mean a tuple of types: (int, str).

This is exactly what we already do in isinstance, the except statement, etc. And it's how Swift, C++, D, and other languages specify a tuple of types (although Swift can also use a product).
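For reference, here is a minimal sketch of the two existing places mentioned above where a tuple of types is already accepted. The function names describe and parse are made up for illustration. Note that in both places the tuple reads as "any of these" (the "implied or" interpretation Nick mentions in the quoted message below).

```python
def describe(value):
    # isinstance accepts a single type or a tuple of types.
    if isinstance(value, (int, str)):
        return "int or str"
    return "something else"


def parse(text):
    # The except statement likewise accepts a tuple of exception types.
    try:
        return int(text)
    except (TypeError, ValueError):
        return None
```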

This would be confusing if we went with the obiwan-inspired syntax of [str] for lists (or iterables or mutable sequences) of str, but since I think Guido has argued pretty conclusively against that syntax on other grounds, this isn't a problem.

This probably implies that function types should be written as Function[(str, int), int] instead of Function[[str, int], int], but I think MyPy already handles that, and I think it makes more sense anyway.

2. Use a product of types to mean a tuple: int*str.

Tuples really are just product types. This is how you specify them in type theory (and relational theory, and elsewhere). This is exactly how the ML family and other languages that designed their type systems carefully instead of haphazardly write tuple types (Haskell spells them (Int, String), but treats them as products all the same). It also nicely parallels the suggestions for str|int for union types and str&int for multi-inherited subtypes.

The first big problem here is that there's no way to specify a tuple of one type. That's not a problem for theory-inspired languages, because in those languages, a tuple of one value is the same thing as that value, but that's obviously not true for Python. The fact that Python doesn't have an appropriate unit type (None is a value that people frequently want to use in tuples) is also at least a theoretical problem.

Also, unless we were going to change isinstance, except, etc. to accept (or require) this syntax instead of a tuple of types, I think it would be confusing to have to remember that you use a tuple of types in some places and a product of types in others. Unlike subscripting, this looks too similar, and too syntactic, to avoid confusion.

3. Use subscripting, but a different form of subscripting, as Steven suggested: Tuple[::int, str].

This isn't actually right; it doesn't mean Tuple[::(int, str)], but Tuple[(::int), str]. But let's assume it's possible to come up with a readable syntax that doesn't require parentheses and ignore that problem.
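To see how Python actually parses that subscript, here is a minimal sketch. The Show class is a hypothetical stand-in for Tuple; all it does is hand back whatever key the subscript passes in.

```python
class Show:
    """Hypothetical stand-in for Tuple: returns the subscript key unchanged."""

    def __getitem__(self, key):
        return key


show = Show()

# Without parentheses, the comma binds more loosely than the slice,
# so the key is a 2-tuple whose first element is a slice.
unparenthesized = show[::int, str]

# With parentheses, the key is a single slice whose step is a tuple of types.
parenthesized = show[::(int, str)]

print(unparenthesized)
print(parenthesized)
```

The first key is the tuple (slice(None, None, int), str); only the second is a slice carrying (int, str) as its step.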

This implies a more general use of slicing in type subscription: the start is the type parameter, and the step is some other stuff to be used in a special way that's specific to that type. If we have other such uses, that would be a door worth leaving open, but I suspect we don't. (If we had dynamic/implicit named tuples, as people have suggested a few times, would that be relevant here?)

Also notice that this is still passing a tuple of types; it's just wrapping it up in a slice and then passing that to Tuple so that it can mark the tuple of types as actually meaning a tuple of types. Is that adding enough additional information to be worth all that additional verbosity and complexity?

4. Just use Tuple[int, str] and note that this is a special case that doesn't mean the same thing as other generic types.

It seems to me potentially very confusing to have Tuple[int] mean a homogeneous arbitrary-length immutable sequence of ints, while Tuple[int, str] means exactly one int and one str (and Tuple[int,] means exactly one int). You could argue that this confusion is inherent in Python's use of tuples for those two different cases in the first place, but we're still spreading that confusion further.
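The distinction rests entirely on whether the subscript key arrives as a bare type or as a tuple, which a trailing comma is enough to change. A rough sketch, assuming a hypothetical TupleSketch class that applies the rule described above (this is not how any real library implements it):

```python
class TupleSketch:
    """Hypothetical illustration of option 4's special-casing rule."""

    def __getitem__(self, key):
        if isinstance(key, tuple):
            # Tup[int, str] and Tup[int,] both arrive here as tuples.
            return ("fixed-length", key)
        # Tup[int] arrives as the bare type.
        return ("homogeneous", key)


Tup = TupleSketch()

print(Tup[int])       # key is int
print(Tup[int,])      # key is (int,)
print(Tup[int, str])  # key is (int, str)
```

So Tup[int] and Tup[int,] differ by one character but fall into different branches.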

If there were no better alternatives, I think this might be better than Steven's suggestion (practicality beats purity, and his suggestion really doesn't remove that much confusion), but I think there are better alternatives—namely, the first one.

On Sunday, August 17, 2014 5:44 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

>
>
>On 17 August 2014 21:44, Devin Jeanpierre <jeanpierreda at gmail.com> wrote:
>> On Sun, Aug 17, 2014 at 1:23 AM, Ben Finney <ben+python at benfinney.id.au> wrote:
>>> I have encountered many uses of “homogeneous, variable-length sequence”
>>> and every time a Python tuple is used for that, I perceive a Python list
>>> would be better precisely *because* it better indicates that semantic
>>> meaning.
>>>
>>> I'd like to know how you think that's not true, and what real-world code
>>> makes you think so.
>>
>> isinstance is real world code that for the second parameter accepts
>> types and (recursively) tuples of any length of things it accepts.
>
>There are a few other cases where tuples are special cased as arguments:
>
>- str.__mod__
>- str.startswith (ditto for binary sequences)
>- str.endswith (ditto for binary sequences)
>
>>>> "aa".endswith(['a', 'b', 'c'])
>Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>TypeError: endswith first arg must be str or a tuple of str, not list
>
>Searching the C files for "tuple of" turned up a couple more:
>
>* N-dimensional indexing also relies specifically on
>tuples-of-integers, rather than arbitrary iterators.
>
>* dynamic type creation expects to receive the bases as a tuple
>
>* the decimal module uses tuples of digits for internal data representations
>
>And that inspired recollection of several other cases where mutability
>would be wrong, because the tuple represents cached information rather
>than dynamic state:
>
>* various "*args" related APIs use "tuple of object" or "tuple of
>thing" (e.g. attributes of partial objects, internal storage in
>contextlib.ExitStack when used with arbitrary callbacks)
>
>* other introspection related APIs use tuples to report information
>about inspected objects
>
>* namedtuple _fields attributes are a tuple of strings
>
>* BaseException.args publishes the full args tuple passed to the constructor
>
>str.startswith, str.endswith, isinstance and issubclass use the
>"implied or" interpretation, everything else does not. In most cases,
>the immutability conveys relevant semantic information (usually
>indicating that it's a read-only API).
>
>Cheers,
>Nick.
>
>-- 
>Nick Coghlan   |  ncoghlan at gmail.com   |   Brisbane, Australia