[Types-sig] A late entry

Tue, 14 Mar 2000 17:54:52 -0500

Whoops.  For me, this has always been an area of interest, but I
completely missed the fact that people were actually discussing stuff
here now, until the DC PIGgies meeting last night.  Here are my
comments so far on what I've seen on the type sig.  Unfortunately, I
haven't read much of what's been said, so I'm sorry if some things
have been discussed or are not appropriate.  I've tried to at least
read or skim the proposals on the list web page.

Here I'm going to mainly focus on Guido's proposal, since it seems to
be one of the more recent proposals, incorporating select ideas from
other proposals.

Let's start with some notes on terminology.  I've seen the word
"typesafe" thrown around quite a bit in a short period of time.  Let's
try to avoid using the word casually here, because it can lead to
confusion. I find people often don't know what they're refering to
when they use (or hear) the term.  Does it mean that no type errors
can ever happen at run time?  At all?  Or that they can only happen
under specific conditions?  If so, what are those conditions?
Cardelli would probably say that "safe" means that the error won't
cause a crash or go unnoticed at runtime (not everyone agrees with his
definitions).  By his definition, Python is already a type safe
language.  However, I often hear people who use the term to mean
static type safety... i.e., no type error will happen at run time.
There's essentially no practical hope for that in an object oriented
language like python.

I think it might be nice for us to agree to share a common
terminology.  I know there's a bit of dissention w/ Cardelli's
prefered terminology, but it at least provides us with a common
reference in a field where terminology tends to be very muddled.  I
would recommend we all read Section 1 of Cardelli's "Type Systems"
(it's about 9 pages). I have copies of this paper in ps and pdf:

http://www.list.org/~viega/cs655/TypeSystems.ps
http://www.list.org/~viega/cs655/TypeSystems.pdf

Oh, one thing that I should mention is that throughout this document I
assume an inline syntax for type checking.  I far prefer it.

Okay, next I want to talk about the way that "checked" and "unchecked"
modules interact in Guido's proposal.  He says that when an checked
module imports and uses an unchecked module, all the objects in the
unchecked module are assigned the type 'any' for the purposes of the
checking.  I am not 100% sure that's the best idea in the face of type
inference.  The checked module deserves type information that's as
specific as possible, so that better types can be infered for the
checked program.  Of course, in many situations a type inference
engine isn't going to be able to infer much more than "any" when there
are absolutely no type hints in the code.  But if something can be
done, why not?

The flip side of the coin is that after the type checking occurs, the
unchecked code can exhibit bad behavior, breaking infered invariants.
For example, let's say some checked code calls foo.bar(), which is
infered to have the type: integer*integer->integer.  Someone can
dynamically load a module from unchecked code which replaces foo.bar()
with a function of type string->integer.

Options here?  Run-time checks can be added at points where this is
possible. I don't know if I support runtime checks if they can be
avoided. Another option is to say that "guarantees" made by the type
checker only hold if the whole program is checked.  I'm comfortable
with that (though I am still probably slightly in favor of the
addition of runtime checks), but I suspect that others will not be.  I
have the same opinion when it comes to using checked modules from
unchecked modules.  Guido proposes adding runtime checks.  I might
prefer not to have them.  Of course, one option is to support both
options...

I also might disagree with always performing type checking at load
time (when an unchecked module imports a checked module).  I see new
type checking features as a static convenience for people who want to
use it.  Are people going to want to pay a runtime hit when linking
against checked code when they don't want to check code themselves?  I
don't think so.

Arguing against myself again, you can also make a case that this sort
of dynamic checking is necessary to be able to live up to the promises
a static checker makes when it "passes" a piece of code.  Guido talks
about dynamically importing a module and checking to see if the module
matches the expected signature.  That's certainly true... here's a
really simple case: let's assume that A imports B, and that A+B were
fully checked statically, but someone replaced B with C after checking
and before runtime.  Is it worth performing those extra checks?  The
static checks have done the best they could, and now, if there are
gross incompatabilities, we're hopefully going to see an error
dynamically whether we added extra checks or not (you can certainly
construct cases where that's not true).  I'd probably say it is worth
performing those static checks in some less frequent, highly dynamic
situations, such as when you've got a completely checked application,
then you import a module that was essentially typed to stdin.

I think the right answer for Python depends on what the goals are.
I'm looking for something that will help me find bugs statically that
traditionally only cropped up at run time, when possible.  Getting
contraversial, I don't know if I really care about supporting those
dynamic checks that are only there to support cases in which the
application is used in ways that subverts the assumptions I made when
performing the type checking, such as what modules I'd be using in my
application.  It should be possible to turn all that stuff off, at the
very least, and just let everything go while completely ignoring type
information.  Remember, dynamic features in the type system can end up
slowing down a language...

Okay, going to the GCD example...  Do we really need to use the "decl"
keyword alone to indicate that a module has been checked?  I'm fine
with gcd(3.5,3) throwing a dynamic error if the module is not checked
(I'd also be fine with it trying to run here, honestly).  But should
it try to compute the result if I remove the "decl" keyword from the
file, but still specify types of the arguments to gcd??

Why does the word "decl" indicate the module is checked?  Just because
the programmer added the keyword doesn't mean the types he or she
assigned are even internally consistant.  Is the interpreter going to
recheck on startup every single time?  It seems like a lot of overhead
to me.  Plus, if the keyword isn't there, but there's some type
information, shouldn't we check the types?

If the type checker is a separate static program, it seems to me that
things are more natural.  We try to check the inputs to the program.
If those inputs use other modules, we go and check them, too.  If the
checking fails (say it's unchecked), then we warn, and can either fail
or (optionally?) make assumptions (e.g., assign any everywhere) and
continue.

I'm worried a bit about a type checked library breaking code that
doesn't use type checking.  For example, let's say that Guido's GCD
example is in a std library right now, and type checking gets added.
What if code passes in floats?  The unchecked code causes a runtime
exception in Guido's proposal.  Well, the code might not always be
wrong!  For example, let's go back to GCD.  While GCD would get typed
as integer*integer->integer, you can get perfectly valid results if
you pass in floats if your floats always happen to be integers.  Maybe
my code always calls GCD as so:

gcd(12.0,28.0)

Why should this code break?  Thats one more reason why I think that
checking maybe should only be performed "on demand" (dynamic checks
are okay if they are explicitly requested).

Next, let's talk briefly about what type inferencing can do.  In
particular, consider Guido's example where he assumes that the type
inference algorithm isn't sophisticated enough to handle complex
types, such as lists.  Let me say that lists are pretty easy to handle
when doing type inferencing.  Object types can get a tad bit tricky,
but everything should be doable.  When I have time, I'll ramble a bit
about what I see as the best approach to implementing a type inference
engine for Python based on well-known algorithms.

There's a problem though.  No one has ever been successful at coming
up with an OO language that successfully integrates all 3 of the
following:

1) Type inferencing
2) Subtyping
3) principal types

Currently, you can choose any two (See Jens Palsburg's brief note
"Type Inference For Objects", which I have at
http://www.list.org/~viega/cs655/ObjInference.pdf.  BTW, a principal
type summarizes all possible types of a given piece of code (variable,
function, etc).  I'd like to capture all 3.  Basically, 'a->'a is the
principal type of the "identity" function, even though there are
plenty of valid specific types that will work in a particular context.

I don't know if we can solve this problem in the general case. I don't
think it's possible for Python's needs, but I haven't really given it
too much thought yet.

A couple of brief thoughts at this point... What should the type
checker do with infered types?  Place them directly into the code?
Place them in comments or a doc string so that they have no effect but
you can see what the checker did and double-check its work?

Also, it might be nice to type code when you don't have access to the
source.  Should it be possible to specify the types of features in a
module for which you don't have source (assume it wasn't written with
checking code)?  If so, how?  An interface definition file?  Then do
you try to check the byte code to make sure the given interfaces are
actually valid?

Here's an issue I haven't seen brought up yet. Consider the following
function:

def identity(x):
  return x

What's the type of this function?  There are a couple ways to look at
this problem.  First, we can look at all contexts in which "identity"
is called, and have its type be the union of all of those types.  That
is a very ad hoc method of genericity.  If we call "identity" as such:

identity('foo')

And there are no other instances of this call, then we would assign
this function the type string->string.  If there's also a call:

identity(12)

Then would the type then become (string|integer)->(string|integer)?
That sounds like a bad idea.  At that point, the type system has lost
precision (I have a big problem with OR-ing of types, which has been
proposed... more below).  The signature implies you can pass in a
string and get back an integer, which means the type is too liberal.
Another reason this isn't a great idea is that you'd have to defer the
type until you've seen all calls.  Plus, as you add more code, and
call "identity" in different contexts, the type will grow.  That seems
unweildy, especially in documentation.

I think it would be best to be able to say, "the in type is the out
type, and who cares about the type beyond that".  Parametric
polymorphism to the rescue... the type could be <a>-><a>, where <a>
indicates that x can be of any type.  I think the <a> syntax could be
better (I prefer ML's, which types identity as 'a->'a, but I hear
there's some resistance to it in this community... I'm going to use 'a
from now on because it looks more natural to me).  One problem here is
that it probably isn't going to be possible to infer an accurate
principal polymorphic type for things in all cases (see above).  I'll
have to give the worst cases some thought.

You may have noticed that 'a looks a heck of a lot like the "any"
keyword proposed by others.  It turns out to be pretty much the same
thing, except parametrically polymorphic. One thing you can do is
differentiate between different types (basically buying you
parameterized functions, but with a much better syntax, IMHO.  Eg:

def init_dict(key : 'a, val : 'b):
  global dict : {'a:'b}
  dict[key] = val

The proposed alternative is something like:

def init_dict<a,b>(key: a, val : b):
  ...

There seems to be an implication that functions named init_dict will
need to be instantiated...  That's a kludgy C++-ism... instantiation
is not necessary for parametric polymorphism.

So should the proposed "any" keyword go away?  Unfortunately, no.  The
difference in semantics between "any" and generic types is that the
"any" keyword basically forces the program to forego type checks when
variables involving "any" are involved, which will sometimes be
necessary.

An example might clear up the distinction:

def f(x): # Here, x is infered to be of type 'a, which is currently
          # the principal type SO FAR.
  x.foo() # Whoops, we just had to narrow 'a to any object with a "foo"
          # method.  If x were of type "any", its type would stay the
same.

I really hope that people avoid using "any" *always*, but it does need
to be there, IMHO.

Another problem that falls out at this point is how to write the type
of x when we've seen x call "foo".  Do we look at all classes across
the application for ones with a foo method?  And do we care if that
solutions precluds classes that dynamically add their "foo" method
from applying?  Ick.  I'd prefer to avoid the concrete, and infer
something like:

<< foo: None->'a >>

Which would read: an object with a field foo of type function taking
no arguments and returning anything.

In the following case:

def f(x):
  decl i : integer
  x.foo()
  i = i + x.bar("xxx")
  z = x.blah(1,2)

We would infer the following type:

<<
   foo:  None->'a,
   bar:  string->integer,
   blah: integer*integer->'b
>>

Adding contrained polymorphic types in declarations should be
possible, even though it'd get messy without some sort of typedef
statement:

def add_observer(x : << notify: string->None >> ) -> None:
    global notify_list: << notify : string->None >>
    if not notify_list: notify_list = [x]
    else: notify_list.append(x)

It'd be nice to be able to do:

typedef notifyable << notify: string->None >>
def add_observer(x : notifyable) -> None:
    global notify_list: notifyable
    if not notify_list: notify_list = [x]
    else: notify_list.append(x)

Ok, back to the OR-ing of types.  The result of such a construct is
only going to be lots of ad hoc typecase stuff and types that are not
as precise as they should be (e.g., the
(string|integer)->(string|integer) example above).  For example,
consider the following code:

class a:
  def foo(self : a, x : integer): ...
class b:
  def bar(self : b): ...

def blah(x : a | b):
  x.foo(2)

That code shouldn't pass through the type checker, because there's no
guarantee that x has a method foo.  The only real solution every time
you have an OR type is a typecase statement, which is ad hoc and can
lead to maintainance problems.  I really don't think there should be a
language construct to support what's really bad programming
practice... I think that "any" should be the only place in the system
where types can be that ambiguous.  (To clarify... typecases are
sometimes necessary, but I think the OR-ing of types is a pretty bad
idea).

I do, however, support the AND-ing of types.  There's still a minor
issue here.  Consider the code:

class a:
  def foo(self: a, x: integer) -> integer: ...
class b:
  def bar(self: b) -> None: ...

def blah(x:a&b):
  x.foo()

This should definitely type check.  However, what are the requirements
we should impose on the variable x?  Must it inherit both a and b?  Or
need it only implement the same methods that a and b statically
define?  I prefer the former.  There's also the option of restricting
&'s to interface types only, which I think is fine.  BTW, if there
isn't an interface mechanism in the 1.0 version of the type system,
people will start defining "interfaces" as such:

class IWidget:
  def draw(self: IWidget) -> None: pass
  def getboundingbox(self: IWidget) -> (float*float)*(float*float): pass

That's okay, but you do want the type checker to be able to
distinguish between an abstract method and a concrete method.
Otherwise:

class Scrollbar(IWidget):
  pass

Would automatically be a correct implementation of the IWidget
interface, even though we failed to define the methods listed in that
interface (we should be forced to add them explicitly, even if their
body is just a "pass").  I'd much rather the above give an error.  I
think that special casing classes with no concrete implementations
isn't that good an idea, so an "interface" keyword should be
considered, which would look the same as classes without the method
bodies, and with the restriction that interfaces cannot inherit
classes (though it's desirable for classes to inherit interfaces,
obviously).  I wonder if it would be good to also pull out the
explicit self parameter.

Eg:

interface IWidget:
  def draw() -> None
  def getboundingbox() -> (float*float)*(float*float)

I think it would be good to allow parameter-based method overloading
for people who use the type system.  You'd be allowed to do stuff like:

class Formatter:
  def print(self: Formatter, i : integer)->None: ...
  def print(self: Formatter, s : string)->None:  ...
  def print(self: Formatter, l : ['a])->None: ...

It would be easy for the compiler to turn the above into something like:

class Formatter:
  def $print_integer(self, i)->None: ...
  def $print_string(self, s)->None: ...
  def $print_list_of_generic(self, l)->None: ...
  def print(self, x)->None:
    typecase x:
      case i: integer => self.$print_integer(i)
      case s: string  => self.$print_string(s)
      case l: ['a]    => self.$print_list_of_generic(l)
# The following should be implicit, but I'll list it explicitly...
      default    => raise TypeError

The $'s above are just some magic to prevent collisions... however
this is actually implemented is also not very important.  One
difficulty here is making sure the debugger handles things properly
(i.e., maps stuff back to the original code properly)...

The syntax of typecasing is not too important here. It could be the
casting syntax proposed elsewhere.  I prefer a real typecase statement
based on matching as above.  It's more powerful, and easier to read (I
don't like choosing arbitrary operators).  The problem is what exact
syntax to use so that you can show types, bind to variables and allow
for code blocks, while keeping the syntax as consistant with the rest
of the language and type system as possible.  The colon always
precedes a code block, but we're using the colon to separate a
variable from its type, too.  However, don't like:
case l : ['a] : self.$print_list_of_generic(l)
Perhaps:
case ['a] l : self.$print_list_of_generic(l)

Though that suddenly makes type decls very irregular.  Then there's
the option of not explicitly assigning to a new variable:

case ['a] : self.$print_list_of_generic(x)  # note the x

I think that last one gets my vote, currently.  We can definitely
figure out the type of x within that block.  If it needs to be used
outside the block, then people can copy it into a variable if they
want to preserve the cast:

decl l : ['a]
typecase x:
  case ['a] : l = x

By the way, note that the two "'a"'s in the above code can be
different and still work:

decl l : ['whatever]
typecase x:
  case ['a] : l = x  # This is okay... the types are compatible,
                     # but now there is an implicit equivolence of
                     # the 2 types.

To implement the same kind of matching better functional languages
provide, we'd need something that allowed for assigning to multiple
vars at once:

decl z : integer
typecase a:
  case (x : integer, y : integer) => z = x + y
  case (x : integer,)             => z = x
  case x : integer                => z = x

I'd like this type of matching, but it's got the too many colon syntax
problem, since the => is not pythonesque...

Hmm, what if all types were expressed using :- instead of :?  Yes, :-
is the assignment operator in one or two languages, but not too many
people have used those languages:

decl z :- integer
typecase a:
  case (x :- integer, y :- integer) : z = x + y
  case (x :- integer,)              : z = x
  case x :- integer                 : z = x

That's not quite as bad.

Now, on to variance of arguments.  Contravariance is definitely bad
IMHO.  Yes, it's a simpler model, and lends itself to type safety
better, but if you've ever programmed in Sather, you probably know
that it can get really inconvenient to do really simple
things. Invariance (aka nonvariance or novariance) is the approach
taken by C++ and Java.  It usually does pretty well, and is simple to
implement.  It usually does what you want, and doesn't lead to the
same runtime type problems covariance does.  However, consider a
situation like the following:

class Container: pass
class LinkedList(Container):
   def add(self, l :- LinkedList):
     ...

class BidirectionalLinkedList(LinkedList):
   def add(self, l :- ???):
     ...

(BTW, I'm going to drop the type of self from here on out, and assume
that it is always the type of the class in which the method is
defined.  I don't think there should be a type variable to allow for
talking about the type of self, such as in the __add__ example in
Guido's proposal.  Let covariance of parameters do the work... type
variables can lead to some pretty subtle problems.)

In the above example, what should the type of l be?  In a
contravariant language, as the class gets more specific, the
parameters must get more generic or stay the same.  Therefore,
implementing BidirectionalLinkedList requires an explicit type cast if
we want to enforce the natural restriction that you can only add
another bidirectional linked list on to the end of a bidirectional
linked list...

In a covariant language, "BidirectionalLinkedList" would be the right
answer, and that seems natual.  Choosing to keep your parameter
invariant is actually fine as well, so you can do:

class LinkedList:
  def merge(self, l :- LinkedList):
    ...

class BidirectionalLinkedList(LinkedList):
  def merge(self, l :- LinkedList):
    ...

(We don't care whether the parameter is bidirectional or not)
I'll get back to the problems with this approach in a second.

Invariance is an answer... the parameter would forever have to be
LinkedList in all subclasses, but really requires a typecase on the
parameter, or the clever use of a constrained generic type:

class LinkedList<T -> LinkedList>: # specifies that the parameter must
be
                                   # substitutable with LinkedList.
   def add(self, l :- T):
     ...
   def merge(self, l : LinkedList):
     ...

class BidirectionalLinkedList(LinkedList<BidirectionalLinkedList>):
   def add(self, l):
     ... # l is of type BidirectionalLinkedList here.
   def merge(self, l : LinkedList):
     ...

I think constrained generic types are good(tm) and should be added.
This was the approach I was leaning towards last night at the DC
PIGgies meeting, but I've changed my mind, as, there are some
reasonable problems with using them here.  If we have a lot of
parameters of different types that each need to vary, we have to use a
ton of constrained types.  It can get ugly quickly.  Plus, if we
wanted to add a feature to a base class that required a covariant
parameter, we'd have to add a new constrained parameter, which change
the interface of the class, potentially breaking a lot of code.  I
think this is bad, and perhaps worse than typecasting, which is
another solution when you only have invariance.

So what's wrong with covariance?  Here's a common example of the
problem:

class Person:
  def ShareRoom(self, p :- Person)->None:
    self.room = p.room

class Boy(Person):
  pass

class Girl(Person):
  pass

Hmm, the above isn't quite what we want, as the intent is probably to
keep boys from rooming with girls.  Let's assume we want to allow
invariant parameters.  We'd have to recode our Boy and Girl classes as
such:

class Boy(Person):
  def ShareRoom(self, p :- Boy)->None:
    Person.ShareRoom(self, p)

class Girl(Person):
  def ShareRoom(self, p :- Girl)->None:
    Person.ShareRoom(self, p)

The problem here is that we can now do the following:

decl p :- Person
decl g :- Girl
p = Boy()
g = Girl()
g.room = blah
p.ShareRoom(g)

The last call there appears type correct... a static type checker will
say "looks good!" because Person's ShareRoom accepts any person
object.  At run time, we will dispatch to the Boy's version of
ShareRoom, which narrows the type, and yields a type error (if we are
adding dynamic checks).

One solution is adding anchored types to the language, which is okay,
but the programmer could still write the above code, and it would
still be broken.  Anchored types essentially just gives the programmer
a way to perform the above and get an error on the p.ShareRoom(g) call
statically, but only if he added additional magic to one of his
classes.  Anchored types are cool, but I don't think this is the right
solution for the problem, so I won't cover them right now.

Meyer's prefered approach is to add a rule that says "polymorphic
catcalls are invalid".  What's a catcall?  CAT stands for "Changing
Availability or Type".  In the context of this discussion, a routine
that is a cat is a routine where the type of the arguments of a
function vary in a derived class(*).  A catcall is any call to a cat
method.  A polymorphic catcall is a call to a cat method where the
target object is polymorphic, which happens in at least 2 cases:

  1) The object appears in the LHS of an assignment, where the RHS is
     a subtype of the LHS.

  2) The object is a formal parameter to a method.

There might be a third case... I'll have to go look it up.  I don't
think there is any problem that would make the solution not
appropriate for Python, though.

It should be possible to implement this solution, and even do so
incrementally.  There's another (better, less pessimistic) solution
called the global validity approach.  The problem with it, IIRC, is
that the algorithm basically assumes that type checking goes on in a
"closed world" environment, where you're checking the entire system at
once.  That probably isn't desirable.  I wonder if there haven't been
refinements to this algorithm that allow it to work incrementally.
Therefore, I'd definitely prefer covariance w/ a polymorphic catcall
rule, assuming that the catcall rule can actually work for Python.

BTW, return types should be covariant too.

Syntax for exceptions: If we add exceptions as part of a signature, I
don't think that I see a good reason to use anything other than
something similar to Java's "throws" syntax.  I'd add a comma before
the "throws" for clarity's sake. Here's a right-recursive (LL) grammar
fragment:

throws_clause: "," "raises" throws_list;
throws_list:   identifier more_throws;
more_throws:   "," throws_list
               |;

Example:

class Client:
  def getServerVersion(self) -> string, raises ENotConnected, ETimeout:
      pass

The problem is that "raises" is a bit wordy.  I just don't like using
symbols without natural meanings in these situations... who wants to
be Perl?  *Maybe* a ! would be alright, since there is a very loose
connection:

class Client:
  def getServerVersion(self) -> string ! ENotConnected, ETimeout:
      pass

Or perhaps 2 bangs..., but I am really uncomfortable about the
readability of that syntax for people new to the language.

Whatever, I'm not too enamored by having exceptions as part of a
signature.  Part of the reason is variance of exceptions.  I've seen
cases where people wanted contravariance and it seemed natural.  I
don't think those situations are all that common, but I do also
believe that exceptions in signatures can lead to code that is
massively difficult to maintain as exceptions added to one method end
up needing to propogate all the way up a call chain through a program.
You end up with LONG exception lists.  I need to think more about this
topic, but right now I don't really care whether the programmer
explicitly lists exceptions on a per-function basis.  The type checker
can still determine what exceptions don't get caught in the scope of
what's been checked.  If there is a new build process for static
checking, it could report all uncaught exceptions at that time
(probably optionally), and it could dump them to a supplimental file
when doing incremental checking.  In short, I haven't thought about
this problem enough recently to say which way I prefer, but I lean
towards not making exceptions part of signatures.

Okay, this has been pretty long and rambling, and it's time for me to
stop writing.  I'm sorry if I didn't make total sense.  I have more to
say, but it'll have to wait until another day...

John

(*) Changing availability: The catcall rule also applies if a derived
type also chooses to make a feature unavailable to the outside world
(e.g., by removing the feature).  No one has been proposing a feature
of the system to do this sort of thing (yet... but visibility
mechanisms haven't been discussed much so far as I've seen).  Of
course, you can always muck around with the attribute directly...