Pythonification of the asterisk-based collection packing/unpacking syntax

Eelco hoogendoorn.eelco at gmail.com
Sat Dec 17 09:38:22 EST 2011


This is a follow-up discussion on my earlier PEP-suggestion. I've
integrated the insights collected during the previous discussion, and
tried to regroup my arguments for a second round of feedback. Thanks
to everybody who gave useful feedback the last time.

PEP Proposal: Pythonification of the asterisk-based collection packing/
unpacking syntax.

This proposal intends to expand upon the currently existing collection
packing and unpacking syntax. By this we mean the following related
Python constructs:
    head, *tail = somesequence
    #pack the remainder of the unpacking of somesequence into a list called tail
    def foo(*args): pass
    #pack the unbound positional arguments into a tuple called args
    def foo(**kwargs): pass
    #pack the unbound keyword arguments into a dict called kwargs
    foo(*args)
    #unpack the sequence args into positional arguments
    foo(**kwargs)
    #unpack the mapping kwargs into keyword arguments
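
For reference, the constructs listed above behave as follows in Python 3 today. This is only a small runnable sketch; the function name `show` and the sample values are made up for illustration and are not part of the proposal.

```python
def show(*args, **kwargs):
    # *args packs the positional arguments into a tuple,
    # **kwargs packs the keyword arguments into a dict.
    return args, kwargs

head, *tail = "abc"           # the remainder is always packed into a list
positional = [1, 2]
keywords = {"x": 3}

# Unpack a sequence into positional arguments and a mapping into keyword arguments.
packed_args, packed_kwargs = show(*positional, **keywords)

print(head, tail)             # a ['b', 'c']
print(packed_args)            # (1, 2)
print(packed_kwargs)          # {'x': 3}
```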

We suggest that these constructs have the following shortcomings that
could be remedied.
The syntax is unnecessarily cryptic, and out of line with Python's
preference for an explicit syntax. One cannot state in a single line
what the asterisk operator does; this is highly context dependent, and
is devoid of that ‘for line in file’ pythonic obviousness. From the
perspective of a Python outsider, the only hint as to what *args means
is by loose analogy with the C way of handling variable arguments.
The current syntax, in its terseness, leaves something to be desired in
terms of flexibility. While a tuple might be the logical choice to pack
positional arguments in the vast majority of cases, it need not be
true that a list is always the preferred choice to repack an unpacked
sequence, for instance.


Type constraints:

In case the asterisk is not used to signal unpacking, but rather to
signal packing, its semantics is essentially that of a type
constraint. The statement:

    head, tail = sequence

Signifies regular unpacking. However, if we add an asterisk, as in:

    head, *tail = sequence

We demand that tail not be just any python object, but rather a list.
This changes the semantics from normal unpacking, to unpacking and
then repacking all but the head into a list.
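
The point that the starred target is always a list, whatever the type of the right-hand side, can be checked directly. A minimal sketch, with arbitrary sample inputs chosen for illustration:

```python
# Four different kinds of iterable: a tuple, a string, a range and a generator.
sources = [(1, 2, 3), "abc", range(3), (n * n for n in range(3))]

tail_types = []
for source in sources:
    head, *tail = source      # unpack, then repack all but the head
    tail_types.append(type(tail))

# Every entry is <class 'list'>, regardless of the source type.
print(tail_types)
```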

It may be somewhat counter-intuitive to think of this as a type
constraint, since Python is after all a dynamically typed language. But
the current usage of asterisks is an exception to that rule. For those
who are unconvinced, please consider the analogy to the following
simple C# code:

    var foo = 3;

An ‘untyped’ object foo is created (actually, its type will be
inferred from its rhs as an integer).

    float foo = 3;

By giving foo a type-constraint of float instead, the semantics are
modified; foo is no longer the integer 3, but gets silently cast to
3.0. This is a simple example, but conceptually entirely analogous to
what happens when one places an asterisk before an lvalue in Python.
It means ‘be a list, and adjust your behavior accordingly’, versus ‘be
a float, and adjust your behavior accordingly’.

The aim of this PEP is to expand upon this type-constraint syntax. We
should be careful here to distinguish this from providing optional
type constraints throughout Python as a whole; that is not our aim.
That concept has been considered before, but the benefits have not been
found to outweigh the costs. http://www.artima.com/weblogs/viewpost.jsp?thread=86641
Our primary aim is the niche of collection packing/unpacking, but if
further generalizations can be made without increasing the cost, those
are most welcome. To reiterate: what is proposed is nothing radical;
merely to replace the asterisk-based type constraints with a more
explicit type constraint.

Currently favored alternative syntax:

Both for the sake of explicitness and flexibility, we consider it
desirable that the name of the collection type is used directly in any
collection packing statement. Annotating a variable declaration with a
collection type name should signal collection packing. This
association between a collection type name and a variable declaration
can be accomplished in many ways; for now, we suggest
collectionname::collectiontype for packing, and ::collectionname for
unpacking.

Examples of use:
    head, tail::tuple = ::sequence
    def foo(args::list, kwargs::dict): pass
    foo(::args, ::kwargs)

The central idea is to replace annotations with asterisks by
annotations with collection type names, but note that we have opted
for several other minor alterations of the existing syntax that seem
natural given the proposed changes.
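
Since the proposed syntax is not valid Python, it may help to spell out what each example would roughly correspond to in today's Python. This is one reading of the examples above, not a specification; the sample values are illustrative.

```python
sequence = [1, 2, 3]

# Proposed: head, tail::tuple = ::sequence
head, *tail = sequence
tail = tuple(tail)            # extra conversion step needed today

# Proposed: def foo(args::list, kwargs::dict): pass
def foo(*args, **kwargs):
    args = list(args)         # args arrives as a tuple and is converted by hand
    return args, kwargs

# Proposed: foo(::args, ::kwargs)
args = (1, 2)
kwargs = {"x": 3}
result = foo(*args, **kwargs)

print(head, tail)             # 1 (2, 3)
print(result)                 # ([1, 2], {'x': 3})
```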

First of all, explicitly mentioning the type of the collection
involved eliminates the need to have two symbols, * and **. Which
variable captures the positional arguments and which captures the
keyword arguments can be inferred from the collection type they model,
mapping or sequence. The rare case of collections that both model a
sequence and a mapping can either be excluded or handled by assigning
precedence for one type or the other.
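
The inference described here can be approximated in current Python with the abstract base classes in `collections.abc`. The helper `call_unpacked` below is hypothetical and only sketches the idea; giving mappings precedence over sequences is one of the two options mentioned above, chosen arbitrarily.

```python
from collections.abc import Mapping, Sequence

def call_unpacked(func, *collections):
    """Hypothetical helper: unpack each collection as positional or keyword
    arguments depending on whether it models a mapping or a sequence."""
    positional, keyword = [], {}
    for collection in collections:
        if isinstance(collection, Mapping):      # mappings take precedence here
            keyword.update(collection)
        elif isinstance(collection, Sequence):
            positional.extend(collection)
        else:
            raise TypeError(
                f"cannot infer unpacking for {type(collection).__name__}")
    return func(*positional, **keyword)

def describe(a, b, sep="-"):
    return f"{a}{sep}{b}"

print(call_unpacked(describe, [1, 2], {"sep": "+"}))   # 1+2
```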

A double colon before a collection signals unpacking. As with
declarations, there is no genuine need to have a different operator
for sequence and mapping types, although if such a demand exists, it
would not be hard to accommodate. A double colon in front of the
collection is congruent with the asterisk syntax, and nicely
emphasizes this unpacking operation being the symmetric counterpart of
the packing operation, which is signalled by the same symbols to the
right of the identifier. Since we are going to make the double
colon (or whatever the symbol) a general collection packing/
unpacking marker, we feel it makes sense to allow it to be used to
explicitly signify unpacking, even when as much is implied by the
syntax on the left hand side, to preserve symmetry with the syntax
inside function calls.

Summarizing, what this syntax achieves, in loose order of perceived
importance:
Simplicity: we have reduced a set of rather arbitrary rules concerning
the syntax and semantics of the asterisk (does it construct a list or
a tuple?) to a single general symbol: the double colon is the
collection packing/unpacking annotation symbol, and that is all there
is to know about it.
Readability: the proposed syntax reads like a book: args-list and
kwargs-dict, unlike the more cryptic asterisk syntax. We avoid extra
lines of code in the event another sequence or mapping type than the
one returned by default is required.
Efficiency: by declaring the desired collection type, it can be
constructed in the optimal way from the given input, rather than
requiring a conversion after the default collection type is
constructed.
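
The efficiency argument can be illustrated in current Python: when the target type is known up front, the tail can be built straight from the remaining items without first materializing a list. The helper `unpack_head` and its `kind` parameter are hypothetical names used only for this sketch.

```python
from collections import deque

def unpack_head(iterable, kind=list):
    """Hypothetical helper: return (head, tail), with the tail built directly
    as `kind` from the remaining items, with no intermediate list."""
    iterator = iter(iterable)
    try:
        head = next(iterator)
    except StopIteration:
        raise ValueError("need at least one value to unpack") from None
    return head, kind(iterator)

head, tail = unpack_head(range(5), kind=deque)
print(head, tail)    # 0 deque([1, 2, 3, 4])
```

With the existing syntax the same result takes two steps: `head, *tail = range(5)` followed by `tail = deque(tail)`.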

A double colon is suggested, since the single colon is already
taken by the function annotation syntax in Python 3. This is somewhat
unfortunate: programming should come before meta-programming, so
ideally the single colon would have served packing and the longer
symbol annotations. On the one hand, having both : and :: as variable
declaration annotation symbols is a nice unification; on the other
hand, a syntax more easily visually distinguished from function
annotations can be defended. For increased backwards compatibility the
asterisk could be used, but sandwiched between two identifiers it looks
like a multiplication. Many other symbols would do, such as @ or !.


