[Types-sig] PyDL RFC 0.4

Paul Prescod paul@prescod.net
Thu, 30 Dec 1999 12:44:39 -0500


PyDL RFC 0.03

A PyDL file declares the interface for a Python module. PyDL files
declare interfaces, objects and the required interfaces of objects.

This document (loosely, informally) describes the behavior of a class
of software modules called "static interface interpreters" and "static
interface checkers". Interface interpreters are run as part of the
regular Python module interpretation process. They read PyDL files and
make the interface objects available to the Python compiler.
Interface checkers read PyDL files and Python code to verify
conformance of the code to the interface.

Once this design is done we will write a formal specification.

PyDL Files:
===========
A PyDL file can be either created by a programmer or auto-generated.
The syntax and semantics of the two types are identical.  An
auto-generated file is created by scanning a Python module for inline
declarations. 

Interfaces are the central concept in PyDL files. Interfaces are
Python objects like anything else but they are created by the
interface interpreter. They are made available to the static interface
checker before Python compilation begins. 

In addition to defining interfaces, it is possible to declare other
attributes of the module. Each declaration associates an interface
with the name of the attribute. Values associated with the name in the
module namespace must always conform to the declared interface.
Furthermore, by the time the module has been imported each name must
have an associated value. The static interface checker is not
required to prove that these rules will never be violated; it is also
acceptable to check them at runtime.

Grammar:
========
In the very short term, implementors are encouraged to use any grammar
that allows every example in this document. Contributions of proposals
for the grammar are solicited.

Interfaces:
===========
Interfaces are created through interface definitions and interface
expressions. There may also be facilities for creating interfaces at
runtime but they are neither available to nor relevant to the
interface interpreter.

Interface definitions are similar to Python class definitions. They
use the keyword "interface" instead of the keyword "class". 

Interfaces are either complete or incomplete. An incomplete interface
takes parameters and a complete interface does not. It is not possible
to create Python objects that conform to incomplete interfaces. They
are just a reuse mechanism analogous to functions in Python.  An
example of an incomplete interface would be "Sequence". It is
incomplete because we need to define the interface of the contents of
the sequence.  
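The analogy between incomplete interfaces and functions can be made concrete with a small Python model. This is only a sketch under assumptions: the `Interface` class, its predicate-based `__conforms__`, and the `Sequence` function are hypothetical stand-ins, not part of the proposal. A complete interface is an object that can test values; an incomplete interface is a function that must be given its parameters before it yields one.

```python
class Interface:
    """Hypothetical complete interface: wraps a conformance predicate."""

    def __init__(self, name, predicate):
        self.name = name
        self._predicate = predicate

    def __conforms__(self, obj):
        return bool(self._predicate(obj))

    def __repr__(self):
        return self.name


Int = Interface("Int", lambda obj: isinstance(obj, int))


def Sequence(element):
    """Hypothetical incomplete interface: needs the element interface."""
    def check(obj):
        # Lists and tuples count as sequences in this sketch.
        return isinstance(obj, (list, tuple)) and all(
            element.__conforms__(item) for item in obj
        )
    return Interface("Sequence(%r)" % (element,), check)


IntSeq = Sequence(Int)  # complete: usable in a declaration
```

Calling `Sequence` without an argument is simply a `TypeError`, which mirrors the rule that no object can conform to an incomplete interface.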

In an interface expression the programmer can provide parameters to
generate a new interface. 

Typedefs allow us to give names to complete or incomplete interfaces
described by interface expressions. Typedefs are an interface
expression re-use mechanism.

Interfaces have an intuitive concept of equivalence which will be
formalized later in the document.

Behavior:
=========
For our purposes, we will presume that every Python environment has
some form of compilation phase. This is true of all existing Python
environments.

The Python compiler invokes the static interface interpreter and
optionally the interface checker on a Python file and its associated
PyDL file.  Typically a PyDL file is associated with a Python file
through placement in the same path with the same base name and a
".pydl" or ".gpydl" extension. If both are available, the module's
interface is created by combining the declarations in the ".pydl" and
".gpydl" files.

"Non-standard" importer modules may find PyDL files using other
mechanisms such as through a look-up in a relational database, just
as they find modules themselves using non-standard mechanisms.
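For the default file-system convention, locating a module's PyDL files could look like the following sketch. The function name `find_pydl_files` is hypothetical; it assumes only the convention stated above (same directory, same base name, ".pydl" for hand-written and ".gpydl" for generated files).

```python
import os


def find_pydl_files(module_path):
    """Return existing PyDL files next to a module, hand-written first.

    Hypothetical helper assuming the same-directory, same-base-name
    convention with ".pydl" and ".gpydl" extensions.
    """
    base, _ext = os.path.splitext(module_path)
    candidates = [base + ".pydl", base + ".gpydl"]
    return [path for path in candidates if os.path.exists(path)]
```

A non-standard importer would replace this lookup with its own mechanism (a database query, for example) while returning the same kind of result.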

The interface interpreter reads the PyDL file and builds the
relevant interface objects. If the PyDL file refers to other modules
then the interface interpreter can read the PyDL files associated
with those other modules after generating them if necessary.

It is acceptable to use date-stamps, CRCs and other heuristics to
demonstrate that a generated PyDL file is not likely to be
inconsistent with its module.
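A minimal sketch of two such heuristics follows. It assumes, hypothetically, that the generator may record a CRC-32 of the module source it scanned (`recorded_crc`); where that value would be stored is not specified here. Without a recorded CRC it falls back to a date-stamp comparison.

```python
import os
import zlib


def source_crc(path):
    """CRC-32 of a file's bytes."""
    with open(path, "rb") as f:
        return zlib.crc32(f.read())


def gpydl_probably_current(module_path, gpydl_path, recorded_crc=None):
    """Heuristically decide whether a generated PyDL file matches its module.

    If a CRC of the module source was recorded at generation time
    (hypothetical), compare it; otherwise require the .gpydl file to be
    at least as new as the module.
    """
    if not os.path.exists(gpydl_path):
        return False
    if recorded_crc is not None:
        return source_crc(module_path) == recorded_crc
    return os.path.getmtime(gpydl_path) >= os.path.getmtime(module_path)
```

The date-stamp check is cheap but can be fooled by clock skew or copied files; the CRC check costs a read of the source but tracks content rather than time.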

The Python compiler may invoke the interface checker after the
interface interpreter has built interface objects and before it
interprets the Python module.

When the Python code runs, the interface objects are
available to the runtime code through a special namespace called the
"interface namespace". There is one such namespace per module. It is
accessible from the module's namespace via the name "__interfaces__". 

This namespace is interposed in the name search order between the
module's namespace and the built-in namespace.
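The proposed search order can be modelled with `collections.ChainMap`, which looks a key up in each mapping in turn. This is an illustration of the lookup order only; `make_lookup` is a hypothetical helper, and a real implementation would live inside the interpreter's name resolution.

```python
import builtins
from collections import ChainMap


def make_lookup(module_globals):
    """Model the search order: module namespace, then the module's
    __interfaces__ namespace, then the built-in namespace."""
    interfaces = module_globals.get("__interfaces__", {})
    return ChainMap(module_globals, interfaces, vars(builtins))
```

Because the module namespace comes first, a module-level name shadows an interface of the same name, and interfaces in turn shadow built-ins.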

Built-in Interfaces:
====================

Any
    Number
        Integral
            Int
            Long
        Float
        Complex
    Sequence
        String
        Record
    Mapping
    Module
    Callable
        Class
        Function
        Methods
            UnboundMethods
            BoundMethods    
    Null
    File

Certain interfaces may have only one implementation. These "primitive"
types are Int, Long, Float, String, UnboundMethods, BoundMethods,
Module, Function and Null. Over time this list may get shorter as the
Python implementation is generalized to work mostly by interfaces.

Note: In rare cases it may be necessary to create new primitive
types with only a single implementation (such as "window handle" or
"file handle"). This is the case when the object's actual bit-pattern
is more important than its interface.

Note: The Python interface graph may not always be a tree. For
instance there might someday be a type that is both a mapping and a
sequence.

The details of each interface remain to be worked out. Volunteers are
solicited.

Interface expression language:
==============================

Interface expressions are used to declare that attributes must conform
to certain interfaces. In an interface expression you may:

1. refer to an interface by name. The name can either be simple or
it may be of the form "module.interfacename" where "interfacename"
is a name declared in either of the named module's two PyDL files
(".pydl" or ".gpydl").

The expression evaluates to the referenced interface.

Two expressions consisting only of names are equivalent if the
referenced interface objects are equivalent.

2. make a union of two or more interfaces:

Int or Float
Int or Float or Complex

The expression evaluates to an interface object I such that a value V
conforms to I iff it conforms to any interface in the list.

Two union expressions X and Y are equivalent if their lists are the
same length and each element in X has an equivalent in Y and vice
versa.
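Both rules for unions, conformance and equivalence, can be sketched in a few lines of Python. The classes `TypeInterface` and `Union` are hypothetical models, and the equivalence test takes a caller-supplied `equivalent` function because the general notion of interface equivalence is still to be formalized.

```python
class TypeInterface:
    """Hypothetical leaf interface backed by a Python type."""

    def __init__(self, pytype):
        self.pytype = pytype

    def __conforms__(self, obj):
        return isinstance(obj, self.pytype)


class Union:
    """A value conforms if it conforms to any member interface."""

    def __init__(self, *members):
        self.members = members

    def __conforms__(self, obj):
        return any(m.__conforms__(obj) for m in self.members)


def equivalent_unions(x, y, equivalent):
    """Same length, and every element of each list has an equivalent
    element in the other list."""
    return (
        len(x.members) == len(y.members)
        and all(any(equivalent(a, b) for b in y.members) for a in x.members)
        and all(any(equivalent(b, a) for a in x.members) for b in y.members)
    )


Int = TypeInterface(int)
Float = TypeInterface(float)
Complex = TypeInterface(complex)
```

Note that member order does not matter: `Int or Float` and `Float or Int` are equivalent under this rule.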

3. parameterize an interface:

Array( Int, 50 )
Array( length=50, elements=Int )

Note that the arguments can be either interface expressions or simple
Python expressions. A "simple" Python expression is an expression that
does not involve a function call or variable reference.
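An implementation could test whether a non-interface argument is "simple" by parsing it with Python's own `ast` module and rejecting any call or name node. This is a sketch of that one rule; `is_simple_expression` is a hypothetical helper, and it deliberately ignores how interface-expression arguments such as `Int` are distinguished from plain Python ones.

```python
import ast


def is_simple_expression(source):
    """True if `source` is a Python expression containing no function
    calls and no variable references (no ast.Call or ast.Name nodes)."""
    tree = ast.parse(source, mode="eval")
    return not any(
        isinstance(node, (ast.Call, ast.Name)) for node in ast.walk(tree)
    )
```

So `50`, `-1` and `2 ** 10` are simple, while `len(x)` and a bare name are not.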

The expression evaluates to a complete instantiation of the referenced
incomplete interface.

Two parameterization expressions are equivalent if the parameterized
interface is equivalent and each parameter is equivalent.

4. use a syntactic shortcut:

[Foo] => Sequence( Foo )                # sequence of Foo's
{String:Int} => Mapping( String, Int )  # mapping from Strings to Ints
(A,B,C) => Record( A, B, C )            # 3-element sequence: an A,
                                        # followed by a B, followed by a C

The expression evaluates to the same thing as the expanded versions.
Equivalence is identical to the situation for the expanded versions.
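Since the shortcuts happen to be valid Python syntax, their expansion can be sketched by parsing the text with `ast` and rewriting list, dict and tuple displays into the long forms. `expand_shortcuts` is a hypothetical helper; it treats interface names as plain identifiers and requires Python 3.9+ for `ast.unparse`.

```python
import ast


def expand_shortcuts(source):
    """Rewrite [X], {K:V} and (A, B, C) shortcuts into long forms."""
    def expand(node):
        if isinstance(node, ast.List) and len(node.elts) == 1:
            return "Sequence( %s )" % expand(node.elts[0])
        if isinstance(node, ast.Dict) and len(node.keys) == 1:
            return "Mapping( %s, %s )" % (expand(node.keys[0]),
                                          expand(node.values[0]))
        if isinstance(node, ast.Tuple):
            return "Record( %s )" % ", ".join(expand(e) for e in node.elts)
        # Anything else (names, parameterizations) is left as written.
        return ast.unparse(node)
    return expand(ast.parse(source, mode="eval").body)
```

Because the rewrite is recursive, nested shortcuts such as `[{String:Int}]` expand to `Sequence( Mapping( String, Int ) )`.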

5. generate a callable interface:

def( Arg1 as Type1, Arg2 as Type2 ) -> ReturnType

The argument name may be elided:

def( Int, String ) -> None

Note: this is provided for compatibility with libraries and tools that
may not support named arguments. Python programmers are strongly
encouraged to use argument names as they are good documentation and
are useful for development environments and other reflective tools.

It is possible to declare variable length argument lists. They must
always be declared as sequences but the element interface may vary.

def( Arg1 as String, * as [Int] ) -> Int 
            # callable taking a String, and some un-named Int
            # arguments

Finally, it is possible to declare keyword argument lists. They must
always be declared as mappings from string to some interface.

def( Arg1 as Int, ** as {String: Int} ) -> Int

Note that at this point in time, every Python callable returns
something, even if it is None. The return value can be named,
merely as documentation:

def( Arg1 as Int, ** as {String: Int} ) -> ReturnCode as Int

The expression evaluates to a callable interface that takes the
described arguments and returns the described value.
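At runtime, the *shape* of a callable interface (argument names, elided names, and the presence of `*` and `**` parameters) can be compared against a real function using `inspect.signature`. This sketch, `matches_callable_interface`, is hypothetical and does not check argument or return interfaces; `None` in `arg_names` stands for an elided argument name.

```python
import inspect


def matches_callable_interface(func, arg_names, var_positional=False,
                               var_keyword=False):
    """Compare a function's parameters with a def(...) interface shape.

    arg_names: expected positional parameter names in order; None means
    the name was elided and any name is accepted.
    """
    params = inspect.signature(func).parameters.values()
    positional = [p.name for p in params
                  if p.kind in (inspect.Parameter.POSITIONAL_ONLY,
                                inspect.Parameter.POSITIONAL_OR_KEYWORD)]
    has_star = any(p.kind == inspect.Parameter.VAR_POSITIONAL for p in params)
    has_starstar = any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params)
    names_ok = len(positional) == len(arg_names) and all(
        want is None or want == got
        for want, got in zip(arg_names, positional))
    return (names_ok and has_star == var_positional
            and has_starstar == var_keyword)
```

For example, a function `def f(Arg1, *rest)` matches `def( Arg1 as String, * as [Int] ) -> Int` in shape.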

Declarations in a PyDL file:
============================

 1. Imports

An import statement in an interface file loads another interface file.
The import statement works just like Python's except that it loads the
PyDL file found with the referenced module, not the module itself. (of
course we will make this definition more formal in the future)

 2. Basic attribute interface declarations:

decl myint as Int                   # basic 
decl intarr as Array( Int, 50 )     # parameterized
decl intarr2 as Array( size = 40, elements = Int ) # using keyword syntax

Attribute declarations are not parameterizable. Furthermore, they must
resolve to complete interfaces.

So this is allowed:

class (_X,_Y) spam( A, B ):
    decl someInstanceMember as _X
    decl someOtherMember as Array( _X, 50 )

    ....

These are NOT allowed:

decl someModuleMember(_X) as Array( _X, 50 )

class (_Y) spam( A, B ):
    decl someInstanceMember(_X) as Array( _X, 50 ) 

They are disallowed because they would let you create a "spam" (or
bind someModuleMember) without ever saying what _X is. That would make
static type checking impossible.

 3. Callable object interface declarations:

Functions are the most common sort of callable object but class
instances can also be callable. Callables may be runtime parameterized
and/or interface parameterized.  For instance, there might be a method
"add" that takes two objects with the same interface and returns an
object with that interface.

decl DoSomething( _X ) as def( a as _X, b as _X )-> _X

_X is the interface parameter. By convention these start with
underscores. a and b are the runtime parameters. 

Note: it is usually possible to coerce a parameterized function into a
fully polymorphic function where the arguments can vary from each
other quite widely despite being declared to have the same parameter
type. You can do this by instantiating the function with "Any" as the
parametric type.

It is possible to allow _X to vary to some extent but still require it
to always be a Number:

decl Add(_X as Number) as def( a as _X, b as _X )-> _X

So this function could take two longs or two floats but not two
strings.

Note: as above, you could create a version that would take a float and
a long by referring to a common base interface like Number itself.
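A runtime analogue of `decl Add(_X as Number) as def( a as _X, b as _X )-> _X` can be sketched with the standard `numbers` module. Two simplifying assumptions are made: "same interface" is approximated by "same concrete Python type", and `numbers.Number` stands in for the Number interface. In modern Python, Int and Long are both `int`, so "two longs" is just two `int` values here. The helper names are hypothetical.

```python
import numbers


def check_same_numeric_interface(a, b):
    """Bind _X to type(a); require _X to be numeric and b to share it."""
    x = type(a)
    if not issubclass(x, numbers.Number):
        raise TypeError("_X must conform to Number, got %s" % x.__name__)
    if type(b) is not x:
        raise TypeError("b must have the same interface as a (%s)" % x.__name__)
    return x


def add(a, b):
    check_same_numeric_interface(a, b)
    return a + b
```

With this check, `add(1, 2)` and `add(1.5, 2.5)` succeed, while `add("a", "b")` and `add(1, 2.0)` raise `TypeError`; the latter is the case the note says requires instantiating with Number itself.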

 4. Class Declarations

A class is a callable object that can be subclassed.  Currently the
only way to make those (short of magic) is with a class declaration,
but one could imagine that there might someday be an __subclass__
magic method that would allow any old object instance to also stand in
as a class.

The syntax for a class definition is identical to that for a function
with the keyword "def" replaced by "class".  What we are really
defining is the constructor. The signature of the created object can
be described in an interface declaration.

decl TreeNode(_X) as class( 
            a as _X, 
            Right as TreeNode( _X ) or None,
            Left as TreeNode( _X ) or None )
                -> ParentClasses, Interfaces

When the initialization completes, every attribute in the declared
interfaces should have a value.
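The rule that every declared attribute must have a value once initialization completes can be checked at runtime with a simple post-construction test. `check_initialized` and the `DECLARED` tuple are hypothetical; a real implementation would derive the attribute list from the interface objects rather than a hand-written tuple.

```python
def check_initialized(instance, declared_attrs):
    """Raise if any declared attribute has no value after construction."""
    missing = [name for name in declared_attrs if not hasattr(instance, name)]
    if missing:
        raise AttributeError("not initialized: " + ", ".join(missing))
    return instance


class TreeNode:
    # Attributes named by the (hypothetical) TreeNode interface.
    DECLARED = ("a", "Right", "Left")

    def __init__(self, a, Right=None, Left=None):
        self.a = a
        self.Right = Right
        self.Left = Left
        check_initialized(self, self.DECLARED)
```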

 5. Interface declarations:

An interface declaration starts with the keyword "interface",
optionally has interface parameters in parentheses and then continues
with the interface name and the names of super-interfaces. This
interface inherits and must not contradict the signature of the parent
interfaces.

The interface body is made up of attribute declarations.

interface (_X,_Y) spam( a, b ):
    decl somemember as _X
    decl someOtherMember as _Y
    decl someClassAttr as [ _X ]

    decl someFunction as def( a as Int, b as Float ) -> String

 6. Typedefs:

Typedefs allow interfaces to be renamed and parameterized
variations of interfaces to be given names.

typedef PositiveInt as BoundedInt( 0, maxint )
typedef NegativeInt as BoundedInt( max=-1, min=minint )
typedef NullableInt as Int or None
typedef Dictionary(_Y) as {String:_Y}

New Module Syntax:
==================
In a future version of Python, declarations will be allowed in Python
code and will have the same meanings. They will be extracted to a
generated PyDL file and evaluated there (along with hand-written
declarations in the PyDL file). In the meantime, there is a backwards
compatible syntax explained later. 

    "typesafe":
    ===========
In addition to decl and typedef the keyword "typesafe" can be used to
indicate that a function or method uses types in such a way that each
operation can be checked at compile time and demonstrated not to call
any function or operation with the wrong types. 

The keyword precedes the function definition:

typesafe def foo( a, b ):
    ...

The typesafe keyword can also be used before a class definition. That
means that every method in the class is declared to be type safe. 

The typesafe keyword can be used with the "module" modifier before
the first function or class definition in a module to state that all
of the functions and classes in the module are type safe:

import spam
import rabbit
import orphanage

typesafe module 

An interface checker's job is to ensure that functions and methods
that claim to be typesafe actually are. It must report and refuse to
compile modules that misuse the keyword, and must not refuse to compile
modules that do not. The interface checker may optionally warn the
programmer about other suspect constructs in Python code.

Note: typesafe is the only change to the syntax of class and module
definitions.


    "as"
    ====
The "as" operator takes an expression and an interface expression and
verifies at runtime that the expression evaluates to an object that
conforms to the interface described by the interface expression.

It returns the expression's value if it succeeds and raises
TypeAssertionError (a subtype of AssertionError) otherwise.

foostr = foo as String    # verifies that foo is a String and
                          # binds it to foostr

This operation can be used in various ways. The most basic way to use
it is as a test:

>>> j = getData()
>>> j as Int
>>> j=j+1

The "as" operator has the lowest precedence of the binary operators.
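Until the syntax exists, the behaviour of `as` can be approximated with an ordinary function. In this sketch the name `as_` and the `_TypeInterface` helper are hypothetical; `TypeAssertionError` subclasses `AssertionError` as the text specifies, and any object with a `__conforms__(obj)` method can serve as the interface.

```python
class TypeAssertionError(AssertionError):
    """Raised when a value does not conform to the asserted interface."""


class _TypeInterface:
    """Hypothetical interface backed by a Python type."""

    def __init__(self, pytype):
        self.pytype = pytype

    def __conforms__(self, obj):
        return isinstance(obj, self.pytype)

    def __repr__(self):
        return self.pytype.__name__


Int = _TypeInterface(int)


def as_(value, interface):
    """Function form of `value as Interface`: return value or raise."""
    if not interface.__conforms__(value):
        raise TypeAssertionError("%r does not conform to %r" % (value, interface))
    return value
```

Returning the value unchanged is what lets the operator appear inside larger expressions, as in `foostr = foo as String`.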

    Interface objects
    =================

Every interface object (remember, interfaces are just Python objects!)
has the following method:

__conforms__ : def( obj as Any ) -> boolean

This method can be used at runtime to determine whether an object
conforms to the interface. It would check the signature for sure but
might also check the actual values of particular attributes.

There is also a global function with this signature:

class_conforms : def( cls as Class, iface as Interface ) -> boolean

This function can be used either at compile time (e.g. by an
implementation of an interface checker) or runtime to check that a
class will generate objects that have the right signature to conform
to the interface.

(the rest of the interface reflection API will be worked out later)
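A minimal sketch of both functions, assuming an interface is reduced to a list of declared attribute names (a signature-only check). The `Interface` class is hypothetical. The sketch of `class_conforms` can only see attributes visible on the class itself, such as methods and class attributes; instance attributes assigned in `__init__` are invisible to it, which is one reason a real implementation would need the class's own declarations.

```python
class Interface:
    """Hypothetical interface object built from attribute declarations."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = tuple(attrs)  # names declared in the interface body

    def __conforms__(self, obj):
        # Signature check only: every declared attribute must be present.
        return all(hasattr(obj, attr) for attr in self.attrs)


def class_conforms(cls, iface):
    """Check that instances of cls will expose the interface's attributes
    (only attributes visible on the class itself are considered)."""
    return all(hasattr(cls, attr) for attr in iface.attrs)
```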


Experimental syntax:
====================

There is a backwards compatible syntax for embedding declarations in a
Python 1.5x file:

"decl","myint as Int"

"typedef","PositiveInteger as BoundedInt( 0, maxint )"

"typesafe"
def ...( ... ): ...

"typesafe module"

There will be a tool that extracts these declarations from a Python
file to generate a .gpydl (for "generated PyDL") file. These files are
used alongside hand-crafted PyDL files. The "effective interface" of
the module is evaluated by combining the declarations from both files
as if they were concatenated together (more or less; exact details to
follow). The two files must not contradict each other, just as
declarations within a single file must not contradict each other.
This means that names that are declared twice must evaluate to
equivalent interfaces.
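The combination rule can be sketched as a dictionary merge that rejects conflicting redeclarations. In this hypothetical sketch each file is reduced to a mapping from declared names to interface expressions, and plain equality stands in for interface equivalence by default; a real tool would evaluate the expressions and compare the resulting interfaces.

```python
class DeclarationConflict(Exception):
    """Two declarations of the same name are not equivalent."""


def merge_declarations(handwritten, generated, equivalent=lambda a, b: a == b):
    """Combine .pydl and .gpydl declarations as if concatenated.

    A name declared in both files must be equivalent in both.
    """
    merged = dict(handwritten)
    for name, iface in generated.items():
        if name in merged and not equivalent(merged[name], iface):
            raise DeclarationConflict(
                "%s: %r vs %r" % (name, merged[name], iface))
        merged.setdefault(name, iface)
    return merged
```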

Over time the .gpydl generator will get more intelligent and may
deduce type information based on code outside of explicit declarations
(for instance function and class definitions, assignment statements
and so forth).

The "as" keyword is replaced in the backwards-compatible syntax with 

Summary of Major Runtime Implications:
======================================
All of the named interfaces defined in a PyDL file are available in
the "__interfaces__" dictionary that is searched between the module
dictionary and the built-in dictionary.

The runtime should not allow an assignment or function call to violate
the declarations in the PyDL file. In an "optimized speed mode" those
checks would be disabled. In non-optimized mode, an assignment that
violates a declaration would raise IncompatibleAssignmentError.

The runtime should not allow a read from an unassigned attribute. It
should raise NotAssignedError if it detects this at runtime instead of
at compile time.
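Both runtime rules can be sketched with a namespace object that intercepts attribute assignment and lookup. Everything here is hypothetical: `CheckedNamespace`, the use of plain predicates in place of interface objects, the `checks_enabled` flag standing in for "optimized speed mode", and the choice of `TypeError` and `NameError` as base classes for the two proposed exceptions.

```python
class IncompatibleAssignmentError(TypeError):
    pass


class NotAssignedError(NameError):
    pass


class CheckedNamespace:
    """Module-like namespace enforcing declarations at runtime."""

    def __init__(self, declarations, checks_enabled=True):
        # Bypass our own __setattr__ for internal bookkeeping.
        object.__setattr__(self, "_decls", dict(declarations))
        object.__setattr__(self, "_checks", checks_enabled)
        object.__setattr__(self, "_values", {})

    def __setattr__(self, name, value):
        check = self._decls.get(name)
        if self._checks and check is not None and not check(value):
            raise IncompatibleAssignmentError(
                "%s = %r violates its declaration" % (name, value))
        self._values[name] = value

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for namespace entries.
        try:
            return self._values[name]
        except KeyError:
            if name in self._decls:
                raise NotAssignedError("%s is declared but not assigned" % name)
            raise AttributeError(name)
```

Disabling the checks skips only the assignment test; reading a declared but unassigned name still raises `NotAssignedError` in this sketch.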

Several new object interfaces and functions are needed.

Future Directions:
==================

    Inferencing/Deduction:
    ======================
At some point in the future, PyDL files will likely be generated from
source code using a combination of declarations within Python code and
some sorts of interface deduction and inferencing based on
various kinds of assignment.

    Const-ness/Readonly-ness:
    =========================
We need to be able to say that some attributes cannot be re-bound and
that some attributes and parameters are immutable.

    Idea: The Undefined Object:
    ===========================

The Undefined object is used as the value of unassigned attributes and
the return value of functions that do not return a value. It may not
be bound to a name. 

a = Undefined   # raises UndefinedValueError
a = b           # raises UndefinedValueError if b has not been assigned

UndefinedValueError can be thought of as a subtype of NameError. Undefined is
needed because it is now possible to declare names at compile time but
never get around to assigning to them. In ordinary Python this is not
possible.

The only useful thing you can do with Undefined is check whether an
object "is" Undefined:

if a is Undefined:
    doSomethingElse()
else:
    doSomethingWithA(a)

This is equivalent to:

try:
    doSomethingWithA(a)
except NameError:
    doSomethingElse()

It is debatable whether we still need NameError for anything other
than backwards compatibility. We could say that any referenced
variable is automatically initialized to "Undefined". Undefined is
sufficiently restrictive that this will not lead to buggy programs.

Undefined also fixes a long-standing safety problem with functions.
With this change, functions that do not explicitly return a value
return Undefined instead of None. That means that this is no longer
possible:

a = mylist.sort()

With Undefined, it will blow up because it is not possible to assign the
Undefined value. Before Undefined, the code did not blow up but it
also did not do the "right thing." It assigned None to "a" which was
seldom what was intended.