[Types-sig] feedback: PyDL RFC 0.4

Greg Stein gstein@lyra.org
Sun, 2 Jan 2000 18:07:09 -0800 (PST)


Time for my swing at this... :-)

On Thu, 30 Dec 1999, Paul Prescod wrote:
>...
> Interfaces are either complete or incomplete. An incomplete interface
> takes parameters and a complete interface does not. It is not possible
> to create Python objects that conform to incomplete interfaces. They
> are just a reuse mechanism analogous to functions in Python.  An
> example of an incomplete interface would be "Sequence". It is
> incomplete because we need to define the interface of the contents of
> the sequence.  

Wouldn't these be called "abstract interfaces" or "parameterized
interfaces"? That seems to be a more standard terminology.

> In an interface expression the programmer can provide parameters to
> generate a new interface. 

Maybe abstract vs concrete interfaces?

> Typedefs allow us to give names to complete or incomplete interfaces
> described by interface expressions. Typedefs are an interface
> expression re-use mechanism.

typedefs are also used to assign names to things like "Int or String".

I don't see "Int" as an interface (even though it probably is in *theory*,
it doesn't seem that way in layman usage).

>...
> The Python compiler invokes the static interface interpreter and
> optionally the interface checker on a Python file and its associated
> PyDL file.  Typically a PyDL file is associated with a Python file
> through placement in the same path with the same base name and a
> ".pydl"  or ".gpydl" extension. If both are available, the module's
> interface is created by combining the declarations in the ".pydl" and
> ".gpydl" files.

These were to be '.pyi' or '.pi' files.

And I still don't understand the need to specify that *two* files exist.
Why are we going to look for two files? Isn't one sufficient?

>...
> Once it interprets the Python code, the interface objects are
> available to the runtime code through a special namespace called the
> "interface namespace". There is one such namespace per module. It is
> accessible from the module's namespace via the name "__interfaces__". 

I think interfaces have names just like any other object. The interfaces
are found in whatever context the definition appeared in: a module, a
class, or a local namespace.

---- a.py ----
interface foo1:
  ...

class Bar:
  interface foo2:
    ...
  ...

def Baz():
  interface foo3:
    ...
  ...
--------------

In the above example, we have three interface objects. One is available
via the name "foo1" in the module-level namespace. One is available as
"Bar.foo2" (via the class' namespace, and the class is in the module
namespace). The third, foo3, is only available within the function Baz().

I do not believe there is a need to place the interfaces into a distinct
namespace. I'd be happy to hear a reason for one (besides forward-refs,
which can be handled by an incomplete interface definition).

> This namespace is interposed in the name search order between the
> module's namespace and the built-in namespace.

Again: I don't think we want to search this namespace.

> Built-in Interfaces:
> ====================
> 
> Any
>     Number
>         Integral
>             Int
>             Long
>         Float
>         Complex
>     Sequence
>         String
>         Record
>     Mapping
>     Modules
>     Callable
>         Class
>         Function
>         Methods
>             UnboundMethods
>             BoundMethods    
>     Null
>     File

What does "builtin" mean? That these interfaces are magically predefined
somewhere and available anywhere?

Note: you probably want to remove the plural from "Modules", "Methods",
and *Methods.

What is the "Null" interface? Is that supposed to be None's interface? I
don't believe that we need a name for None's interface, do we? And why
introduce a name like "Null"? That doesn't seem very descriptive of the
interface; something like NoneInterface might be better.

> Certain interfaces may have only one implementation. These "primitive"
> types 
> are Int, Long, Float, String, UnboundMethods, BoundMethods, Module,
> Function 
> and Null. Over time this list may get shorter as the Python
> implementation is generalized to work mostly by interfaces.

I don't understand what you're saying here. This paragraph doesn't seem to
be relevant.

> Note: In rare cases it may be necessary to create new primitive
> types with only a single implementation (such as "window handle" or
> "file handle"). This is the case when the object's actual bit-pattern
> is more important than its interface.

Huh?

> Note: The Python interface graph may not always be a tree. For
> instance there might someday be a type that is both a mapping and a
> sequence.

In the above statement, you're mixing up implementations (which can use
disjoint interfaces) with the interface hierarchy. Or by "type" are you
referring to a new interface which combines a couple interfaces?

Note that I think it is quite valid to state that interfaces must always
be a tree, although I don't see any reason to avoid multiple-inheritance.

>...
> Interface expression language:
> ==============================

These are normally called "type declarators". I would suggest using
standard terminology here.

>...
> 1. refer to an interface by name. The name can either be simple or
> it may be of the form "module.interfacename" where "interfacename"
> is a name in one of two PyDL files for the named module.

Just use the "dotted_name" construct here -- that is well-defined by the
Python grammar already. It also provides for things like
"os.path.SomeInterface".

Note that interfaces do *not* have to occur in a PyDL module. Leave the
spec open for a combined syntax -- we shouldn't be required to declare all
interfaces in separate files.

>...
> 2. make a union of two or more interfaces:
>...
> Two union expressions X and Y are equivalent if their lists are the
> same length and each element in X has an equivalent in Y and vice
> versa.

IntOrString = typedef Int or String
IntOrStringOrTuple = typedef Int or String or Tuple
IST2 = typedef IntOrString or Tuple

assert IntOrStringOrTuple == IST2

In other words, the lengths do not have to be equal. A precondition is
that all union typedecls must be "flattened" to remove other unions. The
resulting, flattened list must then follow your equivalency algorithm.
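
A rough sketch of that flattening rule in present-day Python. The class
names Named and Union are made up for illustration and are not part of any
proposal; members are compared as a set, which assumes that order and
duplicates do not matter.

```python
class Named:
    """A leaf type declarator identified only by its name (hypothetical)."""
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Named) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


class Union:
    """A union type declarator; nested unions are flattened on construction."""
    def __init__(self, *members):
        flat = set()
        for member in members:
            if isinstance(member, Union):
                flat |= member.members   # absorb the inner union's members
            else:
                flat.add(member)
        self.members = frozenset(flat)

    def __eq__(self, other):
        return isinstance(other, Union) and self.members == other.members

    def __hash__(self):
        return hash(self.members)


Int, String, Tuple = Named("Int"), Named("String"), Named("Tuple")
IntOrString = Union(Int, String)
IntOrStringOrTuple = Union(Int, String, Tuple)
IST2 = Union(IntOrString, Tuple)

assert IntOrStringOrTuple == IST2
```

Flattening in the constructor means the equality test never has to look
inside a nested union.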

> 3. parameterize an interface:
> 
> Array( Int, 50 )
> Array( length=50, elements=Int )
> 
> Note that the arguments can be either interface expressions or simple
> Python expressions. A "simple" Python expression is an expression that
> does not involve a function call or variable reference.

I disagree with the notion of expressions for the parameter values. I
think our only form of parameterization is with typedecl objects. The type
checker is only going to be dealing with type information -- expression
values as part of an interface don't make sense at compile time.

I think each parameter will be a type declarator.

>...
> 4. use a syntactic shortcut:
> 
> [Foo] => Sequence( Foo ) # sequence of Foo's
> {String:Int} => Mapping( String, Int ) # Mapping from A's to B's
> (A,B,C) => Record( A, B, C ) # 3-element sequence of interface a,
> followed
>                              # by b followed by c

Case is significant; your example and comment do not match.

>...
> 5. generate a callable interface:
> 
> def( Arg1 as Type1, Arg2 as Type2 ) -> ReturnType

Colons, please. "as" has the wrong semantic.

> The argument name may be elided:
> 
> def( Int, String ) -> None
> 
> Note: this is provided for compatibility with libraries and tools that
> may not support named arguments.

I agree. The return type should also be optional. Note that we can't allow
just a name (and no type), as that would be ambiguous with just a type
name.

>...
> It is possible to declare variable length argument lists. They must
> always be declared as sequences but the element interface may vary.
> 
> def( Arg1 as String, * as [Int] ) -> Int 
>             # callable taking a String, and some un-named Int
>             # arguments
> 
> Finally, it is possible to declare keyword argument lists. They must
> always be declared as mappings from string to some interface.

I think that I agree with the "string to <foo>" argument, but it is
interesting to note:

>>> def foo(**kw):
...   print kw
...
>>> apply(foo,(),{0:1})
{0: 1}
>>>

:-)
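
For comparison, a small sketch of how the same call behaves on current
CPython 3, where apply() is gone and the ** call syntax is used instead.
The helper call_with is only there for illustration. As far as I know,
non-string keys are now rejected with a TypeError, so the transcript above
reflects the interpreter of the time.

```python
def foo(**kw):
    return kw


def call_with(mapping):
    """Call foo(**mapping) and report a TypeError instead of propagating it."""
    try:
        return foo(**mapping)
    except TypeError as exc:
        return "TypeError: %s" % exc


string_keys = call_with({"a": 1})   # string keys are accepted
int_keys = call_with({0: 1})        # non-string keys are rejected

print(string_keys)
print(int_keys)
```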

>...
> Note that at this point in time, every Python callable returns
> something, even if it is None. The return value can be named,
> merely as documentation:
> 
> def( Arg1 as Int , ** as {String: Int}) -> ReturnCode as Int

Ack! ... no, I do not think we should allow names in there. Return values
are never named and would never be used. Parameters actually have names,
which the values are bound to. A return value name also introduces a minor
problem in the grammar (is the name a name for the return value or a type
name?).

>...
>  2. Basic attribute interface declarations:
> 
> decl myint as Int                   # basic 
> decl intarr as Array( Int, 50 )     # parameterized
> decl intarr2 as Array( size = 40, elements = Int ) # using keyword
> syntax

"as" does make sense in this context, but I'd use colons for consistency.

> Attribute declarations are not parameterizable. Furthermore, they must
> resolve to complete interfaces.

Agreed.

> So this is allowed:
> 
> class (_X,_Y) spam( A, B ):
>     decl someInstanceMember as _X
>     decl someOtherMember as Array( _X, 50 )
> 
>     ....

You haven't introduced this syntax before. Is this a class definition? I
presume this is also intended to create a parameterizable typedecl object
for the class? e.g. where we could do:

def somefunc(x: spam(Int,String)):
  ...

> These are NOT allowed:
> 
> decl someModuleMember(_X) as Array( _X, 50 )

Reason: modules are not parameterizable.

However: I think modules should be able to conform to an interface. And
since an interface can be parameterized, then this means that a module can
be parameterized. This is analogous to parameterizing a class.

> class (_Y) spam( A, B ):
>     decl someInstanceMember(_X) as Array( _X, 50 ) 
> 
> Because that would allow you to create a "spam" without getting around
> to saying what _X is for that spam's someInstanceMember. That would
> disallow static type checking.

Agreed. The _X must occur in the class declaration statement.

>...
> It is possible to allow _X to vary to some extent but still require it
> to always be a Number:
> 
> decl Add(_X as Number) as def( a as _X, b as _X )-> _X

Note that this implies the concept of hierarchy among the interfaces.

Either that, or you need to define a way to show that _X implies the
Number interface because it has the same members and each member conforms
to the equivalent Number member. Note that you will then have to define a
rule for whether "decl x as Int" is the "same" as "decl x as Number". For
conformance, is the first too specific, or is it just a more concrete form
of the latter? (but still allowed)

>...
>  4. Class Declarations
> 
> A class is a callable object that can be subclassed.  Currently the
> only way to make those (short of magic) is with a class declaration,

Proper terminology is "... with a class definition, ..."

> but one could imagine that there might someday be an __subclass__
> magic method that would allow any old object instance to also stand in
> as a class.

eh? what is this doing in here?

> The syntax for a class definition is identical to that for a function
> with the keyword "def" replaced by "class".  What we are really
> defining is the constructor. The signature of the created object can
> be described in an interface declaration.

Ick. We don't need anything special for this. The constructor is given by
the __init__ that occurs in the interface.

> decl TreeNode(_X) as class( 
>             a as _X, 
>             Right as TreeNode( _X ) or None,
>             Left as TreeNode( _X ) or None )
>                 -> ParentClasses, Interfaces

This would be:

  class (_X) TreeNode(ParentClasses):
    __interfaces__ = Interfaces
    def __init__(self, a: _X,
                 Right: TreeNode(_X) or None,
                 Left: TreeNode(_X) or None):
      ...

If you're just trying to create the notion of a factory, then "def" is
appropriate:

  decl TreeNode(_X): def(a: _X,
                         Right: TreeNode(_X) or None,
                         Left: TreeNode(_X) or None)    \
                       -> (ParentClasses or Interfaces)

  IntTree = typedef TreeNode(Int)

Note that parens are needed on the return type so that the "or" binds
properly.

>...
>  6. Typedefs:
> 
> Typedefs allow interfaces to be renamed and for parameterized
> variations of interfaces to be given names.
> 
> typedef PositiveInt as BoundedInt( 0, maxint )
> typedef NegativeInt as BoundedInt( max=-1, min=minint )
> typedef NullableInt as Int or None
> typedef Dictionary(_Y) as {String:_Y}

These should be assignments and use a unary operator. The operator is much
more flexible:

  print_typedecl_object(typedef Int or String)

Can't do that with a typedef or decl *statement*.

Also note that your BoundedInt example is a *runtime* parameterization.
The type checker can't do anything about:

  decl x: PositiveInt
  x = -1

But we *can* check something like this:

  def foo(x: NegativeInt):
    ...
  decl y: PositiveInt
  y = 5
  foo(y)

But this latter case is more along the lines of naming a particular type
of Int. The syntax could very well be something like:

  decl PositiveInt: subtype Int
  decl NegativeInt: subtype Int

The type-checker would know that PositiveInt is related somehow to Int
(and it would have to issue warnings when mixed). It would also view
PositiveInt and NegativeInt as different (thereby creating the capability
for the warning in the foo(y) example above).
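
A minimal sketch of how a checker might treat such named subtypes. TypeDecl
and check_pass are hypothetical names, and the three outcomes ("ok", "warn",
"mismatch") are only one reading of the rule described above.

```python
class TypeDecl:
    """A named type with an optional parent it was declared a subtype of."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def ancestors(self):
        """Yield this type and then each parent up the chain."""
        node = self
        while node is not None:
            yield node
            node = node.parent


def check_pass(arg_type, param_type):
    """Classify passing a value of arg_type where param_type is declared."""
    if arg_type is param_type:
        return "ok"
    related = (param_type in arg_type.ancestors()
               or arg_type in param_type.ancestors())
    if related:
        return "warn"       # e.g. PositiveInt mixed with plain Int
    return "mismatch"       # e.g. PositiveInt passed where NegativeInt is wanted


Int = TypeDecl("Int")
PositiveInt = TypeDecl("PositiveInt", parent=Int)
NegativeInt = TypeDecl("NegativeInt", parent=Int)

print(check_pass(PositiveInt, NegativeInt))
```

The point is only that the two names are distinct to the checker even
though both are Ints at runtime.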

Anyhow... as I mentioned above, we should only be allowing typedecl
parameters. We can't type-check value-based parameters.

If you want to introduce a type name for a runtime type-enforcement (a
valid concept! such as your PositiveInt thing), then we should allow full
expressions and other kinds of fun in the parameter expressions (since the
runtime type should be creatable with anything it wants; we've already
given up all hope on it). But then we get into problems trying to
distinguish between a type declarator and an expression. For example:

  MyType = typedef ParamType(0, Int or String)

In this example, the first parameter is an expression, but the second
should be a type declarator. Figuring out which is which is tricky for the
parser.

As a consequence: I would recommend *only* allowing for type declarators
and skipping the notion of type-checking with runtime types.

Introducing the "subtype" thing that I blue-skied is possible, but I'd
punt on that, too. It seems a bit too specialized and probably applicable
with just a few types.

> New Module Syntax:
> ======================
> In a future version of Python, declarations will be allowed in Python
> code and will have the same meanings. They will be extracted to a
> generated PyDL file and evaluated there (along with hand-written
> declarations in the PyDL file).

I disagree that they will always be extracted into a separate PyDL file.
As an optimization: sure, we could do this. Effectively like caching a
module's bytecodes in a .pyc file. But I don't think you should codify
that here.

>...
>     "typesafe":
>     ===========
> In addition to decl and typedecl the keyword "typesafe" can be used to
> indicate that a function or method uses types in such a way that each
> operation can be checked at compile time and demonstrated not to call
> any function or operation with the wrong types. 

What about the problem of non-existence? How "safe" is "typesafe"? And how
is this different from regular type checking?

>...
> An interface checker's job is to ensure that methods that claim to be
> typesafe actually are. It must report and refuse to compile modules
> that misuse the keyword and may not refuse to compile modules that do
> not.

That last sentence is awkward. Can you rephrase/split/etc?

> The interface checker may optionally warn the programmer about
> other suspect constructs in Python code.
> 
> Note: typesafe is the only change to class definitions or module
> definitions syntax.

Class definitions also have the parameterization syntax change:

  class (_X) Foo(Super):
    decl node: _X
    ...

Class and modules should also have a syntax for specifying the
interface(s) they conform to. I don't think this requires a syntax change,
though, as I would recommend assigning the interfaces to an __interfaces__
attribute.

[ note: JimF's Scarecrow proposal mentions __interfaces__ but seems to
  actually use __implements__ ]

>     "as"
>     ====
> The "as" operator takes an expression and an interface expression and
> verifies at runtime that the expression evaluates to an object that
> conforms to the interface described by the expression.

"as" is the wrong semantic (it implies you want to use/coerce the value to
a specific type, which is impossible). Use "!" or "isa".

> 
> It returns the expression's value if it succeeds and raises
> TypeAssertionError (a subtype of AssertionError) otherwise.
> 
> foostr  = foo as [String] # verifies that foo is a string and
>                           # re-assigns it.

Don't you mean "list of string", or should you drop the brackets?

>...
>     Interface objects
>     =================
> 
> Every interface object (remember, interfaces are just Python objects!)
> has the following method :
> 
> __conforms__ : def (obj: Any ) -> boolean

Just call it "conforms". There is no need to "hide" this method since the
interface does not expose interface members as its *own* members.

> This method can be used at runtime to determine whether an object
> conforms to the interface. It would check the signature for sure but
> might also check the actual values of particular attributes.

I think that you would want a version that just checks an object's
__interfaces__ attribute (quick), and a different method that does an
exhaustive check of the object's apparent interface against the specified
interface.
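
A sketch of the two checks, assuming objects advertise conformance through
an __interfaces__ attribute. The Interface class, the method name
conforms_deep, and the example classes are invented for illustration. The
exhaustive check here only looks for member names, whereas a real one would
presumably compare signatures too.

```python
class Interface:
    """An interface object listing the member names it requires."""
    def __init__(self, name, members):
        self.name = name
        self.members = tuple(members)

    def conforms(self, obj):
        """Quick check: trust whatever the object declares."""
        return self in getattr(obj, "__interfaces__", ())

    def conforms_deep(self, obj):
        """Exhaustive check: every required member must actually exist."""
        return all(hasattr(obj, member) for member in self.members)


Readable = Interface("Readable", ["read", "close"])


class Declared:
    __interfaces__ = (Readable,)
    def read(self):
        return ""
    def close(self):
        pass


class Undeclared:            # has the members but never says so
    def read(self):
        return ""
    def close(self):
        pass


class Liar:                  # claims the interface without the members
    __interfaces__ = (Readable,)


print(Readable.conforms(Undeclared()), Readable.conforms_deep(Undeclared()))
print(Readable.conforms(Liar()), Readable.conforms_deep(Liar()))
```

The two answers can disagree in both directions, which is the reason to
keep them as separate methods.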

>...
> Experimental syntax:
> ====================
> 
> There is a backwards compatible syntax for embedding declarations in a
> Python 1.5x file:
> 
> "decl","myint as Integer"

Just use a single string. The parse tree actually gets even uglier if you
put that comma in there :-). We can pull the "decl" out just as easily if
it is the first part of a "decl myint: Integer".

> "typedef","PositiveInteger as BoundedInt( 0, maxint )"

Since I think this would be a unary operator, I'd recommend a transitional
syntax of:

  typedef("TreeNode(Int)")

i.e. we parse the string, but it also calls a runtime function to
construct the typedecl object.

The issue here is that we want to inject certain names into the namespace.
"typedef" as a "statement string" will be ignored by the interpreter. As a
function with a result, which then gets assigned, we get the right
semantic in both compile- and run- time cases.
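
A rough sketch of what the runtime half of such a function could look like
on a current Python (3.8 or later), borrowing the standard ast module to
parse the string. The nested-tuple result format is an assumption made for
illustration; nothing here is a proposed typedecl object layout.

```python
import ast


def typedef(source):
    """Parse a type-declarator string into nested tuples (hypothetical format)."""
    tree = ast.parse(source.strip(), mode="eval")
    return _build(tree.body)


def _build(node):
    if isinstance(node, ast.Name):
        return ("name", node.id)
    if isinstance(node, ast.Constant) and node.value is None:
        return ("name", "None")
    if isinstance(node, ast.BoolOp) and isinstance(node.op, ast.Or):
        # "A or B or C" arrives as one BoolOp with three values
        return ("union",) + tuple(_build(v) for v in node.values)
    if isinstance(node, ast.Call) and not node.keywords:
        return ("param", _build(node.func)) + tuple(_build(a) for a in node.args)
    # anything else (numbers, arithmetic, ...) is not a type declarator
    raise SyntaxError("not a type declarator: %s" % ast.dump(node))


IntTree = typedef("TreeNode(Int)")
print(IntTree)
```

Because the result is assigned to an ordinary name, the interpreter and a
compile-time tool that reads the same string would both see IntTree.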

> "typesafe"
> def ...( ... ): ...
> 
> "typesafe module"

No problem.

We do have an issue with adding parameters to a class definition.

> There will be a tool that extracts these declarations from a Python
> file to generate a .gpydl (for "generated PyDL") file.

Why pull them out? Leave them in the file and use them there. No need to
replicate the stuff to somewhere else (and then try to deal with the
resulting synchronization issues).

>...
> The "as" keyword is replaced in the backwards-compatible syntax with 

You didn't put anything here :-)

I'd say the temporary syntax could be another function call:

  assert_type(x, "Int or String")

This would work properly at compile- and run- time.
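
A minimal sketch of the runtime side, assuming a small table that maps
declarator names to concrete Python types. BUILTIN_TYPES is invented for
illustration, only "A or B" strings are handled, and plain TypeError stands
in for whatever exception the spec settles on.

```python
# Assumed mapping from declarator names to runtime types.
BUILTIN_TYPES = {"Int": int, "String": str, "Float": float}


def assert_type(value, decl):
    """Return value if it matches any alternative in decl, else raise TypeError."""
    names = [part.strip() for part in decl.split(" or ")]
    allowed = tuple(BUILTIN_TYPES[name] for name in names)
    if not isinstance(value, allowed):
        raise TypeError("%r does not conform to %r" % (value, decl))
    return value


x = assert_type(3, "Int or String")
print(x)
```

A static checker could read the same string literal out of the call, so
nothing else in the source has to change.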

> Summary of Major Runtime Implications:
> ======================================
> All of the named interfaces defined in a PyDL file are available in
> the "__interfaces__" dictionary that is searched between the module
> dictionary and the built-in dictionary.

__interfaces__ for the module should specify the module's interface
conformance. The actual interface objects would just be other names in the
module's namespace.

> The runtime should not allow an assignment or function call to violate
> the declarations in the PyDL file. In an "optimized speed mode" those
> checks would be disabled. In non-optimized mode, these assignments
> would generate an IncompatibleAssignmentError.

This is a difficult requirement for the runtime. I would suggest moving
this to a V2 requirement.

> The runtime should not allow a read from an unassigned attribute. It
> should raise NotAssignedError if it detects this at runtime instead of
> at compile time.

Huh? We already have a definition for this. It raises NameError or
AttributeError. Please don't redefine this behavior.
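
For reference, the existing behavior in a few lines. The class Node and the
name undefined_name are arbitrary.

```python
class Node:
    pass


def read_missing_attribute():
    """Reading an attribute that was never assigned raises AttributeError."""
    try:
        return Node().value
    except AttributeError:
        return "AttributeError"


def read_unbound_name():
    """Reading a name that was never bound raises NameError."""
    try:
        return undefined_name  # never assigned anywhere
    except NameError:
        return "NameError"


print(read_missing_attribute())
print(read_unbound_name())
```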

>...
>     Idea: The Undefined Object:
>     ===========================

You haven't addressed any of my concerns with this object. Even though
you've listed it under the "future" section, I think you're still going to
have some serious [implementation] problems with this concept.


Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/