[Types-sig] PyDL RFC 0.02

Paul Prescod paul@prescod.net
Mon, 27 Dec 1999 06:02:04 -0500


PyDL RFC 0.02

A PyDL file declares the interface for a Python module. PyDL files
declare interfaces, objects and the required interfaces of objects.

At some point in the future, PyDL files will likely be generated from
source code using a combination of declarations within Python code and
some form of interface deduction and inference based on the contents
of those files. For version 1, however, PyDL files are separate from
the Python source, although they do have some implications for the
Python runtime.

This document describes the behavior of two classes of software
modules: "static interface interpreters" and "static interface
checkers". Interface interpreters are run as part of the regular
Python module interpretation process. They read PyDL files and make the
interface objects available to the Python interpreter. Interface
checkers read PyDL files and Python code to verify that the code
conforms to the interface.

Interfaces:
===========
Interfaces are the central concept in PyDL files. Interfaces are Python
objects like anything else, but they are created by the interface
interpreter and made available to the static interface checker before
Python interpretation begins. The PyDL file itself generates an
interface object that describes the attributes of the module. It may
also contain interface definitions for class instances and other
objects.

These other interfaces can be created through interface definitions
and typedefs.  There may also be facilities for creating interfaces at
runtime but they are neither available to nor relevant to the
interface interpreter.

Interface definitions are similar to Python class definitions. They
use the keyword "interface" instead of the keyword "class". 

Sometimes an interface can be specialized for working with specific
other interfaces. For instance, a list could be specialized for working
with integers. We call this "parameterization". An interface with
unresolved parameter variables is said to be "parameterizable". An
interface with some of its parameter variables resolved is said to be
"partially resolved." An interface with all parameter variables
resolved is said to be "fully resolved."

Typedefs allow us to give names to partially or fully resolved 
instantiations of interfaces.
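The three resolution states can be illustrated with a minimal sketch, assuming an interface is modeled as a plain Python object holding its parameter variables and their bindings. `ParamInterface`, `resolve` and `state` are hypothetical names for illustration, not part of this proposal.

```python
class ParamInterface:
    """Hypothetical model of a parameterized interface such as Array(_X, _N)."""

    def __init__(self, name, params, bindings=None):
        self.name = name
        self.params = tuple(params)           # parameter variables, e.g. ("_X", "_N")
        self.bindings = dict(bindings or {})  # parameter variable -> resolved value

    def resolve(self, **values):
        """Return a new, more resolved interface; self is left unchanged."""
        unknown = set(values) - set(self.params)
        if unknown:
            raise TypeError("unknown parameters: %s" % sorted(unknown))
        return ParamInterface(self.name, self.params, {**self.bindings, **values})

    @property
    def state(self):
        resolved = sum(1 for p in self.params if p in self.bindings)
        if resolved == len(self.params):
            return "fully resolved"
        if resolved:
            return "partially resolved"
        return "parameterizable"


array = ParamInterface("Array", ["_X", "_N"])
print(array.state)                               # parameterizable
print(array.resolve(_X="Integer").state)         # partially resolved
print(array.resolve(_X="Integer", _N=50).state)  # fully resolved
```

In this model a typedef would simply bind a name to the result of `resolve`, and a declaration checker could reject any attribute declaration whose interface does not report "fully resolved".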

In addition to defining interfaces, it is possible to declare other
attributes of the module. Each declaration associates an interface
with the name of the attribute. Values associated with the name in the
module namespace must never violate the declaration. Furthermore, by
the time the module has been imported each name must have an
associated value.
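The two rules above (a value must never violate its declaration, and every declared name must be bound once the import finishes) could be checked after import with something like the following sketch. It assumes declarations are given as a dict from attribute name to a Python type, standing in for a real interface object; `check_module` is a hypothetical helper.

```python
import types


def check_module(module, declarations):
    """Return a list of problems; an empty list means the module conforms.

    `declarations` maps attribute names to Python types, used here as a
    stand-in for PyDL interface objects (an assumption of this sketch).
    """
    problems = []
    for name, expected in declarations.items():
        if not hasattr(module, name):
            problems.append("%s: declared but never assigned" % name)
        elif not isinstance(getattr(module, name), expected):
            problems.append("%s: value does not conform to %s" % (name, expected.__name__))
    return problems


mod = types.ModuleType("example")
mod.myint = 42
mod.label = 3.5
print(check_module(mod, {"myint": int, "label": str, "missing": int}))
# ['label: value does not conform to str', 'missing: declared but never assigned']
```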

Behavior:
=========

The Python interpreter invokes the static interface interpreter and,
optionally, the interface checker on a Python file and its associated
PyDL file. Typically a PyDL file is associated with a Python file
by being placed in the same directory with the same base name and a
".pydl" or ".gpydl" extension. If both are available, the module's
interface is created by combining the declarations in the ".pydl" and
".gpydl" files.

"Non-standard" importer modules may find PyDL files using other
mechanisms such as a look-up in a relational database, just
as they find modules themselves using non-standard mechanisms.
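For the standard file-based convention, the lookup can be sketched as follows, assuming the module's source path is known and only the two sibling files are checked. `find_pydl_files` is a hypothetical helper; a non-standard importer would substitute its own mechanism.

```python
import os
import tempfile


def find_pydl_files(module_path):
    """Return the .pydl and .gpydl files that sit next to a .py module."""
    base, _ = os.path.splitext(module_path)    # "pkg/spam.py" -> "pkg/spam"
    candidates = [base + ".pydl", base + ".gpydl"]
    return [path for path in candidates if os.path.exists(path)]


# Usage: create a module with both kinds of PyDL file in a scratch directory.
with tempfile.TemporaryDirectory() as d:
    for name in ("spam.py", "spam.pydl", "spam.gpydl"):
        open(os.path.join(d, name), "w").close()
    found_names = [os.path.basename(p) for p in find_pydl_files(os.path.join(d, "spam.py"))]

print(found_names)   # ['spam.pydl', 'spam.gpydl']
```

When both files are returned, their declarations would then be combined into one module interface.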

The interface interpreter reads the PyDL file and builds the
relevant interface objects. If the PyDL file refers to other modules
then the interface interpreter can read the PyDL files associated
with those other modules. The interface interpreter maintains its own
module dictionary so that it does not import the same module twice.
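The interface interpreter's private module dictionary behaves like a cache keyed by module name. In this sketch the `InterfaceInterpreter` class and the `reader` callable (which parses one PyDL file and may load the PyDL files it imports) are hypothetical. Registering a placeholder before reading is one way, assumed here, to make mutually importing PyDL files terminate.

```python
class InterfaceInterpreter:
    """Hypothetical interface interpreter that reads each PyDL file at most once."""

    def __init__(self, reader):
        self.reader = reader   # callable(interpreter, module_name) -> dict of declarations
        self.modules = {}      # the interpreter's own module dictionary

    def load(self, module_name):
        if module_name in self.modules:
            return self.modules[module_name]
        interface = {}                          # placeholder, filled in below
        self.modules[module_name] = interface   # registered first so import cycles terminate
        interface.update(self.reader(self, module_name))
        return interface


# Fake PyDL sources: module name -> (imported modules, declarations).
SOURCES = {
    "spam": (["eggs"], {"myint": "Integer"}),
    "eggs": (["spam"], {"ham": "String"}),   # imports spam back: a cycle
}
reads = []


def fake_reader(interp, name):
    reads.append(name)
    imports, decls = SOURCES[name]
    for other in imports:
        interp.load(other)   # load the imported module's PyDL file
    return decls


interp = InterfaceInterpreter(fake_reader)
interp.load("spam")
interp.load("eggs")
print(reads)   # ['spam', 'eggs']: each file is read exactly once
```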

The Python interpreter can optionally invoke the interface checker
after the interface interpreter has built interface objects and before
it interprets the Python module.

Once the Python interpreter begins interpreting the module's code, the
interface objects are available to the runtime code through a special
namespace called the "interface namespace". This namespace is
interposed in the name search order between the module's namespace and
the built-in namespace.
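The search order can be modeled with `collections.ChainMap`, which looks a key up in each mapping in turn. This is only a model of the lookup rule: ordinary dicts stand in for the namespaces and a Python type stands in for an interface object; it is not how the interpreter would implement it.

```python
import builtins
from collections import ChainMap

module_namespace = {"x": 1}
interface_namespace = {"Integer": int, "x": "shadowed"}   # stand-in interface objects
builtin_namespace = vars(builtins)

# Lookup order: module namespace, then interface namespace, then built-ins.
lookup = ChainMap(module_namespace, interface_namespace, builtin_namespace)

print(lookup["x"])        # 1: the module namespace wins
print(lookup["Integer"])  # found in the interface namespace
print(lookup["len"])      # falls through to the built-in namespace
```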

Interface expression language:
==============================

Interface expressions are used to declare that attributes must conform
to certain interfaces. In an interface expression you may:

1. refer to a "dotted name" (a local name or a name in the PyDL of an
imported module).

2. make a union of two or more interfaces:

integer or float or complex

3. parameterize an interface:

Array( Integer, 50 )
Array( length=50, elements=Integer )

Note that the arguments can be either interfaces or simple Python
expressions. A "simple" Python expression is an expression that does
not involve a function call.

4. use a syntactic shortcut:

[Foo] => Sequence( Foo )      # sequence of Foos
{A:B} => Mapping( A, B )      # mapping from A's to B's
(A,B,C) => Record( A, B, C )  # 3-element sequence: an A, followed
                              # by a B, followed by a C

5. Declare un-modifiability:

const [const Array( Integer )]

(the semantics of un-modifiability need to be worked out)
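One way to make the expression forms above concrete is to model each interface as a predicate (object -> bool) and each construct as a function that builds a new predicate. This is a sketch under that assumption only: `Union`, `Sequence`, `Mapping` and `Record` here are ordinary Python functions written for illustration, and `const` is omitted because its semantics are still open.

```python
from collections.abc import Mapping as _Mapping, Sequence as _Sequence


# Interfaces modeled as predicates (an assumption; PyDL interfaces are richer).
def Integer(obj):
    return isinstance(obj, int) and not isinstance(obj, bool)


def Float(obj):
    return isinstance(obj, float)


def Union(*interfaces):                 # Integer or Float
    return lambda obj: any(i(obj) for i in interfaces)


def Sequence(elem):                     # [Foo]
    return lambda obj: (isinstance(obj, _Sequence) and not isinstance(obj, str)
                        and all(elem(x) for x in obj))


def Mapping(key, value):                # {A:B}
    return lambda obj: (isinstance(obj, _Mapping)
                        and all(key(k) and value(v) for k, v in obj.items()))


def Record(*fields):                    # (A,B,C)
    return lambda obj: (isinstance(obj, _Sequence) and len(obj) == len(fields)
                        and all(f(x) for f, x in zip(fields, obj)))


Number = Union(Integer, Float)
print(Sequence(Number)([1, 2.5, 3]))                  # True
print(Mapping(Integer, Sequence(Float))({1: [0.5]}))  # True
print(Record(Integer, Float)((1, 2)))                 # False: 2 is not a Float
```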

Declarations in a PyDL file:
============================

(formal grammar to follow)

 1. Imports

An import statement in an interface file loads another interface file.
The import statement works just like Python's, except that it loads the
PyDL file found with the referenced module, not the module itself. (Of
course, we will make this definition more formal in the future.)

 2. Basic attribute interface declarations:

decl myint as Integer                   # basic
decl intarr as Array( Integer, 50 )     # parameterized
decl intarr2 as Array( size = 40, elements = Integer ) # using keyword syntax

Attribute declarations are not parameterizable. Furthermore, they must
resolve to fully resolved (not parameterizable!) interfaces.

So this is allowed:

class (_X,_Y) spam( A, B ):
    decl someInstanceMember as _X
    decl someOtherMember as Array( _X, 50 )

    ....

These are NOT allowed:

decl someModuleMember(_X) as Array( _X, 50 )
class (_Y) spam( A, B ):
    decl someInstanceMember(_X) as Array( _X, 50 ) 

Because that would allow you to create a "spam" without getting around
to saying what _X is for that spam's someInstanceMember. That strikes
me as overly dynamic for a static type-check system (at least for
version 1).

 3. Callable object interface declarations:

Functions are the most common sort of callable object, but class
instances can also be callable. Callables may be runtime parameterized
and/or interface parameterized. For instance, there might be a function
"Add" that takes two numbers of the same interface and returns a number
of that interface.

decl Add(_X: Number) as def( a: const _X, b: const _X )-> _X

_X is the interface parameter. a and b are the runtime parameters.
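The meaning of the interface parameter can be illustrated at runtime with a sketch. Here `interface_parameterized` is a hypothetical decorator, Python types stand in for interfaces, and the exact-type match is a simplifying assumption: `_X` is resolved from the first argument, then the second argument and the result must match it.

```python
import functools


def interface_parameterized(*candidates):
    """Hypothetical runtime check for def( a: _X, b: _X ) -> _X, where _X must
    be one of `candidates` (Python types standing in for interfaces)."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(a, b):
            # Resolve the interface parameter _X from the first argument's exact type.
            x = next((c for c in candidates if type(a) is c), None)
            if x is None or type(b) is not x:
                raise TypeError("a and b must share one of %r" % (candidates,))
            result = func(a, b)
            if type(result) is not x:
                raise TypeError("result does not conform to %s" % x.__name__)
            return result
        return wrapper
    return decorate


@interface_parameterized(int, float, complex)
def Add(a, b):
    return a + b


print(Add(2, 3))      # 5
print(Add(1.5, 2.0))  # 3.5
```

A call such as `Add(2, 3.0)` raises `TypeError`, because the two runtime arguments would resolve `_X` to different interfaces.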

 4. Class Declarations

A class is a callable object that can be subclassed. Currently the
only way to make one (short of magic) is with a class declaration,
but one could imagine that there might someday be a __subclass__
magic method that would allow any object instance to also stand in
as a class.

Here is the syntax for a class definition:

decl TreeNode(_X: Number) as 
        class( a: _X, Right: TreeNode( _X ) or None,
                    Left: TreeNode( _X ) or None )
                -> ParentClasses, Interfaces

What we are really defining is the constructor. The signature of the
created object can be described in an interface declaration.
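As the text notes, what is really being declared is a constructor. A rough modern-Python analog, assuming `_X: Number` maps to a `TypeVar` constrained to `int` and `float`, is a generic dataclass whose generated `__init__` plays the role of the declared constructor. The runtime check in `__post_init__` is an illustration, not part of the proposal, and the `-> ParentClasses, Interfaces` part is not modeled.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

X = TypeVar("X", int, float)   # stands in for "_X: Number" (assumed mapping)


@dataclass
class TreeNode(Generic[X]):
    a: X
    Right: Optional["TreeNode[X]"] = None
    Left: Optional["TreeNode[X]"] = None

    def __post_init__(self):
        # Illustrative runtime check: _X must be a number, and both
        # children must use the same _X as this node.
        if type(self.a) not in (int, float):
            raise TypeError("a must be an int or float")
        for child in (self.Right, self.Left):
            if child is not None and type(child.a) is not type(self.a):
                raise TypeError("children must use the same _X as their parent")


leaf = TreeNode(1)
root = TreeNode(5, Right=leaf)
```

Constructing `TreeNode(2.5, Left=leaf)` would fail, since the parent resolves `_X` to `float` while the child uses `int`.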

 5. Interface declarations:

interface (_X,_Y) spam( a, b ):
    decl somemember as _X
    decl someOtherMember as _Y
    decl const someClassAttr as [ _X ]

    decl const someFunction as def( a: Integer, b: Float ) -> String

 6. Typedefs:

Typedefs allow interfaces to be renamed and for parameterized
variations of interfaces to be given names.

typedef PositiveInteger as BoundedInt( 0, maxint )
typedef NegativeInteger as BoundedInt( max=-1, min=minint )
typedef NullableInteger as Integer or None
typedef Dictionary(_Y) as {String:_Y}
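If interfaces are modeled as predicates (an assumption of this sketch), a typedef is simply a name bound to an interface expression, and a parameterized typedef is a function of its parameter variables. `BoundedInt`, `Or` and `NoneInterface` are hypothetical helpers, and `sys.maxsize` stands in for `maxint`.

```python
import sys

maxint = sys.maxsize          # stand-in for "maxint"
minint = -sys.maxsize - 1     # stand-in for "minint"


def BoundedInt(min, max):     # keyword names match the typedef examples
    return lambda obj: isinstance(obj, int) and min <= obj <= max


def Integer(obj):
    return isinstance(obj, int)


def NoneInterface(obj):
    return obj is None


def Or(*interfaces):
    return lambda obj: any(i(obj) for i in interfaces)


PositiveInteger = BoundedInt(0, maxint)
NegativeInteger = BoundedInt(max=-1, min=minint)
NullableInteger = Or(Integer, NoneInterface)


def Dictionary(Y):            # typedef Dictionary(_Y) as {String:_Y}
    return lambda obj: isinstance(obj, dict) and all(
        isinstance(k, str) and Y(v) for k, v in obj.items())
```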

The Undefined Object:
=====================

The Undefined object is used as the value of unassigned attributes and
the return value of functions that do not return a value. It may not
be bound to a name. 

a = Undefined   # raises UndefinedValueError
a = b           # raises UndefinedValueError if b has not been assigned

UndefinedValueError can be thought of as a subclass of NameError.
Undefined is needed because it is now possible to declare names at
compile time but never get around to assigning to them. In ordinary
Python this is not possible.

The only useful thing you can do with Undefined is check whether an
object "is" Undefined:

if a is not Undefined:
    doSomethingWithA( a )
else:
    doSomethingElse()

This is equivalent to:

try:
    doSomethingWithA( a )
except NameError:
    doSomethingElse()

It is debatable whether we still need NameError for anything other
than backwards compatibility. We could say that any referenced
variable is automatically initialized to "Undefined". Undefined is
sufficiently restrictive that this will not lead to buggy programs.

Undefined also corrects a long-standing safety issue with functions.
Now, functions that do not explicitly return a value return Undefined
instead of None. That means that this is no longer possible:

a = mylist.sort()   # sort() sorts in place and returns no value

With Undefined, this raises UndefinedValueError because it is not
possible to assign the Undefined value. Before Undefined, the code did
not blow up, but it also did not do the "right thing": it assigned None
to "a", which was seldom what was intended.
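The effect on `a = mylist.sort()` can be simulated in today's Python with a sketch. `returns_undefined` and `bind` are hypothetical helpers, and `Undefined` is a minimal stand-in object. One caveat of the simulation: ordinary Python cannot tell an explicit `return None` from falling off the end of a function, so the decorator treats both the same way.

```python
import functools


class UndefinedValueError(NameError):
    pass


Undefined = object()   # minimal stand-in for the Undefined singleton


def returns_undefined(func):
    """Hypothetical decorator: a None result becomes Undefined."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        return Undefined if result is None else result
    return wrapper


def bind(namespace, name, value):
    """Checked assignment: refuses to bind Undefined."""
    if value is Undefined:
        raise UndefinedValueError("cannot assign Undefined to %r" % name)
    namespace[name] = value


sort_in_place = returns_undefined(list.sort)
mylist = [3, 1, 2]
env = {}
try:
    bind(env, "a", sort_in_place(mylist))   # models "a = mylist.sort()"
except UndefinedValueError as exc:
    print("caught:", exc)
print(mylist)   # [1, 2, 3]: the sort itself still happened
```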

New Runtime Functions:
======================

conforms( x: Any, y: Interface ) -> Any or Undefined

This function can be used in various ways. The most basic way to use
it is as a test:

if conforms( j, Integer ) is not Undefined:
    anint = conforms( j, Integer )

Because of the behavior of Undefined, it can also be used as an
assertion:

j = conforms( j, Integer )

which is equivalent to:

if not isinstance( j, Integer ):
    raise UndefinedValueError
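Both uses of `conforms` can be sketched in today's Python, assuming a Python type stands in for the interface and `isinstance` stands in for conformance. The `checked` helper is hypothetical and models an assignment that refuses Undefined. Because `conforms` returns the object itself, a conforming but falsy value such as `0` is why the test form compares against Undefined with `is not` rather than relying on truthiness.

```python
class UndefinedValueError(NameError):
    pass


Undefined = object()   # stand-in for the Undefined singleton


def conforms(x, interface):
    """Return x if it conforms to `interface`, else Undefined.

    Here `interface` is a Python type and conformance is isinstance(),
    an assumption of this sketch."""
    return x if isinstance(x, interface) else Undefined


def checked(value):
    """Models assignment, which refuses to bind Undefined."""
    if value is Undefined:
        raise UndefinedValueError("assignment of Undefined")
    return value


j = 7
if conforms(j, int) is not Undefined:   # used as a test
    anint = conforms(j, int)

j = checked(conforms(j, int))            # used as an assertion: passes

try:
    s = checked(conforms("seven", int))  # used as an assertion: fails
except UndefinedValueError:
    print("'seven' does not conform to int")
```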

Every interface object (remember, interfaces are just Python objects!)
has the following method:

__conforms__ : def (obj: Any ) -> boolean

This method can be used at runtime to determine whether an object
conforms to the interface. It always checks the object's signature and
may also check the actual values of particular attributes.
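A sketch of such an interface object, assuming "signature" means a set of required attribute names and value checks are optional per-attribute predicates. The `Interface` class and its constructor arguments are hypothetical; only the `__conforms__` name comes from this proposal.

```python
class Interface:
    """Hypothetical interface object: required attribute names plus
    optional value checks on particular attributes."""

    def __init__(self, name, attributes, value_checks=None):
        self.name = name
        self.attributes = tuple(attributes)
        self.value_checks = dict(value_checks or {})   # attribute name -> predicate

    def __conforms__(self, obj):
        # Signature check: every required attribute must be present.
        if not all(hasattr(obj, attr) for attr in self.attributes):
            return False
        # Value checks: only run against attributes the object actually has.
        return all(hasattr(obj, attr) and check(getattr(obj, attr))
                   for attr, check in self.value_checks.items())


class P:
    def __init__(self, x, y):
        self.x, self.y = x, y


Point = Interface("Point", ["x", "y"], {"x": lambda v: isinstance(v, (int, float))})
print(Point.__conforms__(P(1, 2)))     # True
print(Point.__conforms__(P("1", 2)))   # False: value check on x fails
print(Point.__conforms__(object()))    # False: signature check fails
```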

There is also a global function with this signature:

class_conforms : def ( obj: Class, Obj: Interface ) -> boolean

This function can be used either at compile time (e.g. by an
implementation of an interface checker) or runtime to check that a
class will generate objects that have the right signature to conform
to the interface.
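A class-level check has to work without creating an instance. One approximation, sketched here under that constraint, treats a required name as provided if it is a class attribute (such as a method) or is annotated in the class or one of its bases. A plain list of names stands in for the interface, and attributes created only inside `__init__` are invisible to this approach.

```python
import typing


def class_conforms(cls, required):
    """Check, without instantiating `cls`, that its instances will provide
    every name in `required` (a list of names standing in for an interface)."""
    hinted = typing.get_type_hints(cls)   # annotations from cls and its bases
    return all(hasattr(cls, name) or name in hinted for name in required)


class Base:
    x: int

    def area(self):
        return 0


class Square(Base):
    side: float


print(class_conforms(Square, ["x", "side", "area"]))  # True
print(class_conforms(Square, ["perimeter"]))          # False
```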

(the rest of the interface reflection API will be worked out later)

Experimental syntax:
====================

There is a backwards-compatible syntax for embedding declarations in a
Python 1.5.x file:

"decl","myint as Integer"
"typedef","PositiveInteger as BoundedInt( 0, maxint )"

There will be a tool that extracts these declarations from a Python
file to generate a .gpydl (for "generated PyDL") file. These files are
used alongside hand-crafted PyDL files. The "effective interface" of a
module is evaluated by combining the declarations from its .pydl and
.gpydl files as if they were concatenated together (more or less; exact
details to follow). The two files must not contradict each other, just
as declarations within a single file must not contradict each other.

Over time the .gpydl generator will get more intelligent and may
deduce type information based on code outside of explicit declarations
(for instance function and class definitions, assignment statements
and so forth).

Summary of Major Runtime Implications:
======================================

All of the named interfaces defined in a PyDL file are available in the
"interfaces" dictionary that is searched between the module dictionary
and the built-in dictionary.

The runtime should not allow an assignment or function call to violate
the declarations in the PyDL file. In an "optimized speed mode" those
checks would be disabled.
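The assignment half of this requirement can be sketched as a module-like object that validates each assignment against its declarations, with a flag that turns the checks off. `CheckedModule` and its `optimized` parameter are hypothetical, Python types stand in for interfaces, and function-call checks are not modeled. In real Python, the `-O` switch (which sets `__debug__` to `False`) would be one plausible trigger for such a mode.

```python
class CheckedModule:
    """Hypothetical module object that enforces declarations on assignment.

    Declarations map names to Python types (standing in for interfaces).
    With optimized=True the checks are skipped, mirroring the proposed
    "optimized speed mode"."""

    def __init__(self, declarations, optimized=False):
        object.__setattr__(self, "_decls", dict(declarations))
        object.__setattr__(self, "_optimized", optimized)

    def __setattr__(self, name, value):
        expected = self._decls.get(name)
        if not self._optimized and expected is not None and not isinstance(value, expected):
            raise TypeError("%s is declared as %s, got %r" % (name, expected.__name__, value))
        object.__setattr__(self, name, value)


m = CheckedModule({"myint": int})
m.myint = 5
try:
    m.myint = "five"   # violates "decl myint as Integer"
except TypeError as exc:
    print(exc)

fast = CheckedModule({"myint": int}, optimized=True)
fast.myint = "five"    # not checked in optimized mode
```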

Several new object interfaces and functions are needed.

The new "Undefined" object is needed and assignments need to check for
"Undefined".