Notice: While Javascript is not essential for this website, your interaction with the content will be limited. Please turn Javascript on for the full experience.

PEP 646 -- Variadic Generics

PEP:0646
Title:Variadic Generics
Author:Mark Mendoza <mendoza.mark.a at gmail.com>, Matthew Rahtz <mrahtz at google.com>, Pradeep Kumar Srinivasan <gohanpra at gmail.com>, Vincent Siles <vsiles at fb.com>
Sponsor:Guido van Rossum <guido at python.org>
Status:Draft
Type:Standards Track
Created:16-Sep-2020
Python-Version:3.10
Post-History:07-Oct-2020, 23-Dec-2020, 29-Dec-2020

Abstract

PEP 484 introduced TypeVar, enabling creation of generics parameterised with a single type. In this PEP, we introduce TypeVarTuple, enabling parameterisation with an arbitrary number of types - that is, a variadic type variable, enabling variadic generics. This allows the type of array-like structures in numerical computing libraries such as NumPy and TensorFlow to be parameterised with the array shape, enabling static type checkers to catch shape-related bugs in code that uses these libraries.

Motivation

There are two main use-cases for variadic generics. [1]

The primary motivation is to enable typing of array shapes in numerical computing libraries such as NumPy and TensorFlow. This is the use-case much of the PEP will focus on.

Additionally, variadic generics allow us to concisely specify the type signature of map and zip.

We discuss each of these motivations below.

Array Shapes

In the context of numerical computation with libraries such as NumPy and TensorFlow, the shape of arguments is often just as important as the argument type. For example, consider the following function which converts a batch [2] of videos to grayscale:

def to_gray(videos: Array): ...

From the signature alone, it is not obvious what shape of array [3] we should pass for the videos argument. Possibilities include, for example,

batch × time × height × width × channels

and

time × batch × channels × height × width. [4]

Ideally, we should have some way of making the required shape clear in the signature itself. Multiple proposals [7] [8] [9] have suggested the use of the standard generics syntax for this purpose. We would write:

def to_gray(videos: Array[Time, Batch, Height, Width, Channels]): ...

However, note that arrays can be of arbitrary rank - Array as used above is generic in an arbitrary number of axes. One way around this would be to use a different Array class for each rank...

Axis1 = TypeVar('Axis1')
Axis2 = TypeVar('Axis2')

class Array1(Generic[Axis1]): ...

class Array2(Generic[Axis1, Axis2]): ...

...but this would be cumbersome, both for users (who would have to sprinkle 1s and 2s and so on throughout their code) and for the authors of array libraries (who would have to duplicate implementations throughout multiple classes).

Variadic generics are necessary for a Array that is generic in an arbitrary number of axes to be cleanly defined as a single class.

map and zip

PEP 612 [6] introduced ParamSpec, enabling parameter types of one callable to be forwarded to another callable. However, in many cases we actually wish to transform parameter types before using them elsewhere in the signature.

Consider, for example, the signature of map for a particular choice of function and iterables:

def func(int, str) -> float: ...
iter1: List[int]
iter2: List[str]

def map(func: Callable[[int, str], float],
        iter1: Iterable[int],
        iter2: Iterable[str]) -> Iterable[float]: ...

Note that the number of iterables passed to map is dependent on the supplied function, and that the types of those iterables must correspond to the types of supplied function's arguments.

A similar example is zip:

iter1: List[int]
iter2: List[str]

def zip(iter1: Iterable[int],
        iter2: Iterable[str]) -> Iterable[Tuple[int, str]]: ...

Neither of these signatures can be specified in the general case using existing typing mechanisms. The signature of zip, for example, is currently specified [10] with a number of overloads.

Specification

In order to support the above use-cases, we introduce:

  • TypeVarTuple, serving as a placeholder not for a single type but for an arbitrary number of types, and behaving like a number of TypeVar instances packed in a Tuple.
  • A new use for the star operator: unpacking of each individual type from a TypeVarTuple.
  • Two new type operators, Unpack and Map.

These are described in detail below.

Type Variable Tuples

In the same way that a normal type variable is a stand-in for a single type, a type variable tuple is a stand-in for an arbitrary number of types (zero or more) in a flat ordered list.

Type variable tuples are created with:

from typing import TypeVarTuple

Ts = TypeVarTuple('Ts')

A type variable tuple behaves in a similar way to a parameterized Tuple. For example, in a generic object instantiated with type parameters int and str, Ts is equivalent to Tuple[int, str].

Type variable tuples can be used anywhere a normal TupleVar can. For example, in class definitions, function signatures, and variable annotations:

Shape = TypeTupleVar('Shape')

class Array(Generic[Shape]):

    def __init__(self, shape: Shape):
      self.shape: Shape = shape

    def __abs__(self) -> Array[Shape]: ...

    def __add__(self, other: Array[Shape]) -> Array[Shape]: ...

Height = NewType('Height', int)
Width = NewType('Width', int)
shape = (Height(480), Width(640))
x: Array[Tuple[Height, Width]] = Array(shape)
x.shape     # Inferred type is Tuple[Height, Width]
y = abs(x)  # Array[Tuple[Height, Width]]
z = x + y   # Array[Tuple[Height, Width]]

Variance and bound: Not (Yet) Supported

To keep this PEP minimal, TypeTupleVar does not yet support the bound argument or specification of variance, as TypeVar does. We leave the decision of how these arguments should be implemented to a future PEP, when use-cases for variadic generics have been explored more in practice.

Unpacking: Star Operator

Note that the fully-parameterised type of Array above is rather verbose. Wouldn't it be easier if we could just write Array[Height, Width]?

To enable this, we introduce a new function for the star operator: to 'unpack' type variable tuples. When unpacked, a type variable tuple behaves as if its component types had been written directly into the signature, rather than being wrapped in a Tuple.

Rewriting the Array class using an unpacked type variable tuple, we can instead write:

Shape = TypeTupleVar('Shape')

class Array(Generic[*Shape]):

    def __init__(self, shape: Shape):
      self.shape: Shape = shape

    def __add__(self, other: Array[*Shape]) -> Array[*Shape]: ...

shape = (Height(480), Width(640))
x: Array[Height, Width] = Array(shape)
x.shape     # Inferred type is Tuple[Height, Width]
z = x + x   # Array[Height, Width]

Unpacking: Unpack Operator

Because the new use of the star operator requires a syntax change and is therefore incompatible with previous versions of Python, we also introduce the typing.Unpack type operator for use in existing versions of Python. Unpack takes a single type variable tuple argument, and behaves identically to the star operator, but without requiring a syntax change. In any place you would normally write *Ts, you can also write Unpack[Ts].

*args as a Type Variable Tuple

PEP 484 states that when a type annotation is provided for *args, each argument must be of the type annotated. That is, if we specify *args to be type int, then all arguments must be of type int. This limits our ability to specify the type signatures of functions that take heterogeneous argument types.

If *args is annotated as an unpacked type variable tuple, however, the types of the individual arguments become the types in the type variable tuple:

def args_to_tuple(*args: *Ts) -> Ts: ...

args_to_tuple(1, 'a')  # Inferred type is Tuple[int, str]

Note that the type variable tuple must be unpacked in order for this new behaviour to apply. If the type variable tuple is not unpacked, the old behaviour still applies:

# *All* arguments must be of type Tuple[T1, T2],
# where T1 and T2 are the same types for all arguments
def foo(*args: Ts) -> Ts: ...

x: Tuple[int, str]
y: Tuple[int, str]
foo(x, y)  # Valid

z: Tuple[bool]
foo(x, z)  # Not valid

Finally, note that a type variable tuple may not be used as the type of **kwargs. (We do not yet know of a use-case for this feature, so prefer to leave the ground fresh for a potential future PEP.)

# NOT valid
def foo(**kwargs: Ts): ...
def foo(**kwargs: *Ts): ...

Type Variable Tuples with Callable

Type variable tuples can also be used in the arguments section of a Callable:

class Process:
  def __init__(target: Callable[[*Ts], Any], args: Tuple[*Ts]): ...

def func(arg1: int, arg2: str): ...

Process(target=func, args=(0, 'foo'))  # Passes type-check
Process(target=func, args=('foo', 0))  # Fails type-check

Type Variable Tuples with Union

Finally, type variable tuples can be used with Union:

def f(*args: *Ts) -> Union[*Ts]:
    return random.choice(args)

f(1, 'foo')  # Inferred type is Union[int, str]

If the type variable tuple is empty (e.g. if we had *args: *Ts and didn't pass any arguments), the type checker should raise an error on the Union (matching the behaviour of Union at runtime, which requires at least one type argument).

Map

To enable typing of functions such as map and zip, we introduce the Map type operator. Not to be confused with the existing operator typing.Mapping, Map is analogous to map, but for types:

from typing import Map

def args_to_lists(*args: *Ts) -> Map[List, Ts]: ...

args_to_lists(1, 'a')  # Inferred type is Tuple[List[int], List[str]]

Map takes two operands. The first operand is a parameterizable type (or type alias [5]) such as Tuple, List, or a user-defined generic class. The second operand is a type variable tuple. The result of Map is a Tuple, where the Nth type in the Tuple is the first operand parameterized by the Nth type in the type variable tuple.

Because Map returns a parameterized Tuple, it can be used anywhere that a type variable tuple would be. For example, as the type of *args:

# Equivalent to 'arg1: List[T1], arg2: List[T2], ...'
def foo(*args: *Map[List, Ts]): ...
# Ts is bound to Tuple[int, str]
foo([1], ['a'])

As a return type:

# Equivalent to '-> Tuple[List[T1], List[T2], ...]'
def bar(*args: *Ts) -> Map[List, Ts]: ...
# Ts is bound to Tuple[float, bool]
# Inferred type is Tuple[List[float], List[bool]]
bar(1.0, True)

And as an argument type:

# Equivalent to 'arg: Tuple[List[T1], List[T2], ...]'
def baz(arg: Map[List, Ts]): ...
# Ts is bound to Tuple[bool, bool]
baz(([True], [False]))

map and zip

Map allows us to specify the signature of map as:

Ts = TypeVarTuple('Ts')
R = TypeVar(R)

def map(func: Callable[[*Ts], R],
        *iterables: *Map[Iterable, Ts]) -> Iterable[R]: ...

def func(int, str) -> float: ...
# Ts is bound to Tuple[int, str]
# Map[Iterable, Ts] is Iterable[int], Iterable[str]
# Therefore, iter1 must be type Iterable[int],
#        and iter2 must be type Iterable[str]
map(func, iter1, iter2)

Similarly, we can specify the signature of zip as:

def zip(*iterables: *Map[Iterable, Ts]) -> Iterator[Ts]): ...

l1: List[int]
l2: List[str]
zip(l1, l2)  # Iterator[Tuple[int, str]]

Overloads for Accessing Individual Types

Map allows us to operate on types in a bulk fashion. For situations where we require access to each individual type, overloads can be used with individual TypeVar instances in place of the type variable tuple:

Shape = TypeVarTuple('Shape')
Axis1 = TypeVar('Axis1')
Axis2 = TypeVar('Axis2')
Axis3 = TypeVar('Axis3')

class Array(Generic[*Shape]): ...

  @overload
  def transpose(
    self: Array[Axis1, Axis2]
  ) -> Array[Axis2, Axis1]: ...

  @overload
  def transpose(
    self: Array[Axis1, Axis2, Axis3)
  ) -> Array[Axis3, Axis2, Axis1]: ...

(For array shape operations in particular, having to specify overloads for each possible rank is, of course, a rather cumbersome solution. However, it's the best we can do without additional type manipulation mechanisms, which are beyond the scope of this PEP.)

Concatenating Other Types to a Type Variable Tuple

If an unpacked type variable tuple appears with other types in the same type parameter list, the effect is to concatenate those types with the types in the type variable tuple. For example, concatenation in a function return type:

Batch = NewType('int')
Height = NewType('int')
Width = NewType('int')

class Array(Generic[*Shape]): ...

def add_batch(x: Array[*Shape]) -> Array[Batch, *Shape]: ...

x: Array[Height, Width]
y = add_batch(x)  # Inferred type is Array[Batch, Height, Width]

In function argument types:

def batch_sum(x: Array[Batch, *Shape]) -> Array[*Shape]: ...

x: Array[Batch, Height, Width]
y = batch_sum(x)  # Inferred type is Array[Height, Width]

And in class type parameters:

class BatchArray(Generic[Batch, *Shape]):
  def sum(self) -> Array[*Shape]: ...

x: BatchArray[Batch, Height, Width]
y = x.sum()  # Inferred type is Array[Height, Width]

Concatenation can involve both prefixing and suffixing, and can include an arbitrary number of types:

def foo(x: Tuple[*Ts]) -> Tuple[int, str, *Ts, bool]: ...

It is also possible to concatenate type variable tuples with regular type variables:

T = TypeVar('T')

def first_axis_sum(x: Array[T, *Shape]) -> Array[*Shape]: ...

x: Array[Time, Height, Width]
y = first_axis_sum(x)  # Inferred type is Array[Height, Width]

Finally, concatenation can also occur in the argument list to Callable:

def f(func: Callable[[int, *Ts], Any]) -> Tuple[*Ts]: ...

def foo(int, str, float): ...
def bar(str, int, float): ...

f(foo)  # Valid; inferred type is Tuple[str, float]
f(bar)  # Not valid

And in Union:

def f(*args: *Ts) -> Union[*Ts, float]: ...

f(0, 'spam')  # Inferred type is Union[int, str, float]

Concatenating Multiple Type Variable Tuples

We can also concatenate multiple type variable tuples, but only in cases where the types bound to each type variable tuple can be inferred unambiguously. Note that this is not always the case:

# Type checker should raise an error on definition of func;
# how would we know which types are bound to Ts1, and which
# are bound to Ts2?
def func(ham: Tuple[*Ts1, *Ts2]): ...

# Ts1 = Tuple[int, str], Ts2 = Tuple[bool]?
# Or Ts1 = Tuple[int], Ts2 = Tuple[str, bool]?
ham: Tuple[int, str, bool]
func(ham)

In general, some kind of extra constraint is necessary in order for the ambiguity to be resolved. This is usually provided by an un-concatenated usage of the type variable tuple elsewhere in the same signature.

For example, resolving ambiguity in an argument:

def func(ham: Tuple[*Ts1, *Ts2], spam: Ts2): ...

# Ts1 is bound to Tuple[int], Ts2 to Tuple[str, bool]
ham: Tuple[int, str, bool]
spam: Tuple[str, bool]
func(ham, spam)

In a return type:

def func(ham: Ts1, spam: Ts2) -> Tuple[*Ts1, *Ts2]): ...

ham: Tuple[int]
spam: Tuple[str, bool]
# Return type is Tuple[int, str, bool]
func(ham, spam)

Note, however, that the same cannot be done with generic classes:

# No way to add extra constraints about Ts1 and Ts2,
# so this is not valid
class C(Generic[*Ts1, *Ts2]): ...

Generics in Multiple Type Variable Tuples

If we do wish to use multiple type variable tuples in a type signature that would otherwise not resolve the ambiguity, it is also possible to make the type bindings explicit by using a type variable tuple directly, without unpacking it. When then instantiating, for example, the class in question, the types corresponding to each type variable tuple must be wrapped in a Tuple:

class C(Generic[Ts1, Ts2]): ...

# Ts1 = Tuple[int, str]
# Ts2 = Tuple[bool]
c: C[Tuple[int, str], Tuple[bool]] = C()

Similarly for functions:

def foo(x: Tuple[Ts1, Ts2]): ...

# Ts1 = Tuple[int, float]
# Ts2 = Tuple[bool]
x: Tuple[Tuple[int, float], Tuple[bool]]
foo(x)

Aliases

Generic aliases can be created using a type variable tuple in a similar way to regular type variables:

IntTuple = Tuple[int, *Ts]
IntTuple[float, bool]  # Equivalent to Tuple[int, float, bool]

As this example shows, all type arguments passed to the alias are bound to the type variable tuple. If no type arguments are given, the type variable tuple holds no types:

IntTuple  # Equivalent to Tuple[int]

Type variable tuples can also be used without unpacking:

IntTuple = Tuple[int, Ts]
IntTuple[float, bool]  # Equivalent to Tuple[int, Tuple[float, bool]]
IntTuple  # Tuple[int, Tuple[]]

At most a single distinct type variable tuple can occur in an alias:

# Invalid
Foo = Tuple[Ts1, int, Ts2]
# Why? Because there would be no way to decide which types should
# be bound to which type variable tuple:
Foo[float, bool, str]
# Equivalent to Tuple[float, bool, int, str]?
# Or Tuple[float, int, bool, str]?

The same type variable tuple may be used multiple times, however:

Bar = Tuple[*Ts, *Ts]
Bar[int, float]  # Equivalent to Tuple[int, float, int, float]

Finally, type variable tuples can be used in combination with normal type variables. In this case, the number of type arguments must be equal to or greater than the number of distinct normal type variables:

Baz = Tuple[T1, *Ts, T2, T1]

# T1 bound to int, T2 bound to bool, Ts empty
# Equivalent to Tuple[int, bool, int]
Baz[int, bool]

# T1 bound to int
# Ts bound to Tuple[float, bool]
# T2 bound to str
# So equivalent to Tuple[int, float, bool, str, int]
Baz[int, float, bool, str]

An Ideal Array Type: One Possible Example

Type variable tuples allow us to make significant progress on the typing of arrays. However, the array class we have sketched out in this PEP is still missing some desirable features. [8]

The most crucial feature missing is the ability to specify the data type (e.g. np.float32 or np.uint8). This is important because some numerical computing libraries will silently cast types, which can easily lead to hard-to-diagnose bugs.

Additionally, it might be useful to be able to specify the rank instead of the full shape. This could be useful for cases where axes don't have obvious semantic meaning like 'height' or 'width', or where the array is very high-dimensional and writing out all the axes would be too verbose.

Here is one possible example of how these features might be implemented in a complete array type.

# E.g. Ndim[Literal[3]]
Integer = TypeVar('Integer')
class Ndim(Generic[Integer]): ...

# E.g. Shape[Height, Width]
# (Where Height and Width are custom types)
Axes = TypeVarTuple('Axes')
class Shape(Generic[*Axes]): ...

DataType = TypeVar('DataType')
ShapeType = TypeVar('ShapeType', NDim, Shape)

# The most verbose type
# E.g. Array[np.float32, Ndim[Literal[3]]
#      Array[np.uint8, Shape[Height, Width, Channels]]
class Array(Generic[DataType, ShapeType]): ...

# Type aliases for less verbosity
# E.g. Float32Array[Height, Width, Channels]
Float32Array = Array[np.float32, Shape[*Axes]]
# E.g. Array1D[np.uint8]
Array1D = Array[DataType, Ndim[Literal[1]]]

Rationale and Rejected Ideas

Supporting Variadicity Through aliases

As noted in the introduction, it is possible to avoid variadic generics by simply defining aliases for each possible number of type parameters:

class Array1(Generic[Axis1]): ...
class Array2(Generic[Axis1, Axis2]): ...

However, this seems somewhat clumsy - it requires users to unnecessarily pepper their code with 1s, 2s, and so on for each rank necessary.

Naming of Map

One downside to the name Map is that it might suggest a hash map. We considered a number of different options for the name of this operator.

  • ForEach. This is rather long, and we thought might imply a side-effect.
  • Transform. The meaning of this isn't obvious enough at first glance.
  • Apply. This is inconsistent with apply, an older Python function which enabled conversion of iterables to arguments before the star operator was introduced.

In the end, we decided that Map was good enough.

Nesting Map

Since the result of Map is a parameterised Tuple, it should be possible to use the output of a Map as the input to another Map:

Map[Tuple, Map[List, Ts]]

If Ts here were bound to Tuple[int, str], the result of the inner Map would be Tuple[List[int], List[str]], so the result of the outer map would be Tuple[Tuple[List[int]], Tuple[List[str]]].

We chose not to highlight this fact because of a) how confusing it is, and b) lack of a specific use-case. Whether to support nested Map is left to the implementation.

Naming of TypeVarTuple

TypeVarTuple began as ListVariadic, based on its naming in an early implementation in Pyre.

We then changed this to TypeVar(list=True), on the basis that a) it better emphasises the similarity to TypeVar, and b) the meaning of 'list' is more easily understood than the jargon of 'variadic'.

We finally settled on TypeVarTuple based on the justification that c) this emphasises the tuple-like behaviour, and d) type variable tuples are a sufficiently different kind of thing to regular type variables that we may later wish to support keyword arguments to its constructor that should not be supported by regular type variables (such as arbitrary_len [12]).

Backwards Compatibility

TODO

  • Tuple needs to be upgraded to support parameterization with a a type variable tuple.

Footnotes

[1]A third potential use is in enabling higher-kinded types that take an arbitrary number of type operands, but we do not discuss this use here.
[2]'Batch' is machine learning parlance for 'a number of'.
[3]We use the term 'array' to refer to a matrix with an arbitrary number of dimensions. In NumPy, the corresponding class is the ndarray; in TensorFlow, the Tensor; and so on.
[4]If the shape begins with 'batch × time', then videos_batch[0][1] would select the second frame of the first video. If the shape begins with 'time × batch', then videos_batch[1][0] would select the same frame.
[5]For example, in asyncio [11], it is convenient to define a type alias _FutureT = Union[Future[_T], Generator[Any, None, _T], Awaitable[_T]]. We should be able to apply Map to such aliases - e.g. Map[_FutureT, Ts].

Acknowledgements

Thank you to Alfonso Castaño, Antoine Pitrou, Bas v.B., David Foster, Dimitris Vardoulakis, Eric Traut, Guido van Rossum, Jia Chen, Lucio Fernandez-Arjona, Nikita Sobolev, Peilonrayz, Rebecca Chen, Sergei Lebedev and Vladimir Mikulik for helpful feedback and suggestions on drafts of this PEP.

Thank you especially to Lucio, for suggesting the star syntax, which has made multiple aspects of this proposal much more concise and intuitive.

Resources

Discussions on variadic generics in Python started in 2016 with Issue 193 [13] on the python/typing GitHub repository.

Inspired by this discussion, Ivan Levkivskyi made a concrete proposal at PyCon 2019, summarised in Type system improvements [14] and Static typing of Python numeric stack [15].

Expanding on these ideas, Mark Mendoza and Vincent Siles gave a presentation on Variadic Type Variables for Decorators and Tensors [16] at the 2019 Python Typing Summit.

Source: https://github.com/python/peps/blob/master/pep-0646.rst