Making the case for "typed" lists/iterators in Python

Fri Dec 16 12:48:41 EST 2011

I realize this has been discussed in the past; I hope that I am
presenting a slightly different take on the subject that will prove
interesting.  This is primarily motivated by my annoyance with using
comprehensions in certain circumstances.

Currently, if you want to perform successive transformations on the
elements of a list, you have a couple of options:

1. Successive comprehensions:

L2 = [X(e) for e in L1]
L3 = [Y(e) for e in L2]
L4 = [Z(e) for e in L3]
or
L2 = [e.X() for e in L1]
L3 = [e.Y() for e in L2]
L4 = [e.Z() for e in L3]

This gets the job done and gives you access to all the intermediate
values, but isn't very succinct, particularly if you are in the habit
of using informative identifiers.

2. One comprehension:

L2 = [Z(Y(X(e))) for e in L1]
or
L2 = [e.X().Y().Z() for e in L1]

This gets the job done, but doesn't give you access to all the
intermediate values, and tends to be pretty awful to read.
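For a concrete version of the two options above, here is a runnable sketch in which the built-in str methods strip, lower and capitalize stand in for the hypothetical X, Y and Z (that substitution is an assumption made purely for illustration). Both the nested call and the method chain apply X first:

```python
# Assumption: str.strip, str.lower and str.capitalize play the roles of X, Y, Z.
lines = ["  hello World ", " PYTHON ideas"]

# Option 1: successive comprehensions, keeping each intermediate list.
stripped = [s.strip() for s in lines]
lowered = [s.lower() for s in stripped]
capitalized = [s.capitalize() for s in lowered]

# Option 2: one comprehension, either nested calls Z(Y(X(e))) or a method chain.
nested = [str.capitalize(str.lower(str.strip(s))) for s in lines]
chained = [s.strip().lower().capitalize() for s in lines]

assert capitalized == nested == chained
print(capitalized)  # ['Hello world', 'Python ideas']
```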

Having "typed" lists would let you take preexisting string/int/etc.
methods and expose them in a vectorized context, and would provide an
easy way for developers to support both vectors and scalars in a
single function (you could easily "fix" other people's functions
dynamically to support both).  Additionally, "typed" lists/iterators
would allow improved code analysis and optimization.  The PyPy people
have already stated that they are working on implementing different
strategies for lists composed of a single type, so clearly there is
already community movement in this direction.

Just compare the above examples to their type-aware counterparts:

L2 = X(L1)
L2 = L1.X()

L2 = Z(Y(X(L1)))
L2 = L1.X().Y().Z()
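One way to approximate the function form X(L1) today, without any change to the list type, is a decorator that checks whether it was handed a list. This is a minimal sketch rather than part of the proposal: vectorize is a hypothetical name, and treating exactly the list instances as vectors is an assumption:

```python
import functools


def vectorize(func):
    """Hypothetical decorator: map func over a list, or call it on a scalar."""

    @functools.wraps(func)
    def wrapper(value, *args, **kwargs):
        if isinstance(value, list):
            # One level only: each element is passed to func unchanged.
            return [func(item, *args, **kwargs) for item in value]
        return func(value, *args, **kwargs)

    return wrapper


@vectorize
def shout(s):
    return s.upper() + "!"


# "Fixing" someone else's function dynamically works the same way.
vlen = vectorize(len)
vstrip = vectorize(str.strip)

print(shout("hi"))                  # HI!
print(shout(["a", "b"]))            # ['A!', 'B!']
print(vlen(["ab", "cde"]))          # [2, 3]
print(shout(vstrip([" x ", "y "])))  # ['X!', 'Y!']
```

Dispatching on isinstance(value, list) is also where this approach runs out: vlen can no longer report the length of a list itself, because nothing says whether a given list is meant as one value or as a vector of values. That is the kind of information a typed list would carry.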

Also, this would provide a way to clean up stuff like:

"\n".join(l.capitalize() for l in my_string.split("\n"))

into:

my_string.split("\n").capitalize().join_this("\n")
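A list subclass can approximate the method-chaining form by forwarding unknown attribute lookups to str and applying them element-wise. This is a minimal sketch under stated assumptions: StringList and join_this are the hypothetical names used above, split_lines is a hypothetical stand-in for a split that returns a typed list, results stay a StringList only while every element is still a str, and no type checking is done on mutation:

```python
class StringList(list):
    """Hypothetical list of str that exposes str methods element-wise."""

    def __getattr__(self, name):
        # Only reached for names that list (and this class) do not define.
        method = getattr(str, name, None)
        if name.startswith("_") or not callable(method):
            raise AttributeError(
                f"{type(self).__name__!r} object has no attribute {name!r}"
            )

        def vectorized(*args, **kwargs):
            results = [method(item, *args, **kwargs) for item in self]
            # Stay a StringList only while every result is still a str.
            if all(isinstance(r, str) for r in results):
                return StringList(results)
            return results

        return vectorized

    def join_this(self, separator):
        # Hypothetical spelling of separator.join(self) as a method call.
        return separator.join(self)


def split_lines(text, sep="\n"):
    # Hypothetical stand-in for a str.split that would return a StringList.
    return StringList(text.split(sep))


my_string = "first line\nsecond line"
result = split_lines(my_string).capitalize().join_this("\n")
assert result == "\n".join(line.capitalize() for line in my_string.split("\n"))
print(result)
```

Names that list already defines, such as count and index, shadow the str methods of the same name here, so a real design would have to decide which one wins.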

Before anyone gets up in arms at the idea of statically typed Python,
what I am suggesting here would be looser than that.  Basically, I
believe that in instances where it is known that a list of a single
type is going to be returned, it would be a good idea to return a list
subclass (for example, StringList, IntegerList, etc.).  To avoid
handcuffing people with types, the standard list modification methods
could be hooked so that if an object of an incorrect type is placed in
the list, a warning is raised and the list converts to a generic
object list.

The only stumbling block is that in CPython you can't use __class__
assignment to convert an instance between a built-in (static) type and
a heap type, so a StringList instance can't simply be turned back into
a plain list.  My workaround for this would be to have a factory that
creates generic "List" classes, modifying the bases to produce the
correct behavior.  Then, converting from a typed list to a generic
object list would just be a matter of removing a member from the bases
for a class.  This of course basically kills the ability to perform
type-specific list optimization in CPython, but that isn't necessarily
true for other implementations.  The additional type information would
be preserved for code analysis in any case.  The case would be even
simpler for generators and other iterators, as you don't have to worry
about mutation.
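The demotion workaround can be sketched roughly as follows. Everything here is an assumption for illustration: TypeCheckedMixin and typed_list are hypothetical names, only append, insert and extend are hooked (a fuller version would also need __setitem__, __iadd__ and friends), and each list gets its own freshly created class so that editing __bases__ demotes that one list only. Because the hooks live in the mixin, dropping the mixin from the bases leaves a class that behaves like a plain list, although type(names) is still the one-off List class rather than list itself:

```python
import warnings


class TypeCheckedMixin:
    """Hypothetical mixin: element-type checks for the hooked list methods."""

    elem_type = object  # overridden per class by typed_list()

    def _accept(self, values):
        for value in values:
            if not isinstance(value, self.elem_type):
                warnings.warn(
                    f"{type(value).__name__} added to a list of "
                    f"{self.elem_type.__name__}; demoting to a generic list",
                    stacklevel=3,
                )
                # Drop the mixin from this list's private class.  The class's
                # layout comes from list either way, so CPython accepts this.
                type(self).__bases__ = (list,)
                return

    def append(self, value):
        self._accept([value])
        list.append(self, value)

    def insert(self, index, value):
        self._accept([value])
        list.insert(self, index, value)

    def extend(self, values):
        values = list(values)  # materialize so a one-shot iterator is read once
        self._accept(values)
        list.extend(self, values)


def typed_list(elem_type, items=()):
    """Hypothetical factory: one fresh List(mixin, list) class per list object."""
    cls = type("List", (TypeCheckedMixin, list), {"elem_type": elem_type})
    result = cls()
    result.extend(items)  # initial items go through the same check
    return result


names = typed_list(str, ["a", "b"])
names.append("c")  # still a typed list
names.append(3)    # emits a UserWarning, then names is demoted
print(type(names).__bases__)  # (<class 'list'>,)
print(names)                  # ['a', 'b', 'c', 3]
```

Because each list owns its class, demoting one list leaves every other typed list untouched; the cost in this sketch is one class object per list.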

I'd like to hear people's thoughts on the subject.  Currently we are
throwing away useful information in many cases that could be used for
code analysis, optimization and simpler interfaces.  I believe that
"typed" lists that get "demoted" to normal lists with a warning on
out-of-type operations preserve this information while providing complete
backwards compatibility and freedom.

Nathan