From john@viega.org Tue Mar 14 22:54:52 2000
From: john@viega.org (John Viega)
Date: Tue, 14 Mar 2000 17:54:52 -0500
Subject: [Types-sig] A late entry
Message-ID: <38CEC33C.6AC199A1@viega.org>

Whoops. For me, this has always been an area of interest, but I completely missed the fact that people were actually discussing stuff here now, until the DC PIGgies meeting last night. Here are my comments so far on what I've seen on the type sig. Unfortunately, I haven't read much of what's been said, so I'm sorry if some things have been discussed or are not appropriate. I've tried to at least read or skim the proposals on the list web page.

Here I'm going to mainly focus on Guido's proposal, since it seems to be one of the more recent proposals, incorporating select ideas from other proposals.

Let's start with some notes on terminology. I've seen the word "typesafe" thrown around quite a bit in a short period of time. Let's try to avoid using the word casually here, because it can lead to confusion. I find people often don't know what they're referring to when they use (or hear) the term. Does it mean that no type errors can ever happen at run time? At all? Or that they can only happen under specific conditions? If so, what are those conditions? Cardelli would probably say that "safe" means that the error won't cause a crash or go unnoticed at runtime (not everyone agrees with his definitions). By his definition, Python is already a type safe language. However, I often hear people use the term to mean static type safety... i.e., no type error will happen at run time. There's essentially no practical hope for that in an object oriented language like Python.

I think it might be nice for us to agree to share a common terminology. I know there's a bit of dissension w/ Cardelli's preferred terminology, but it at least provides us with a common reference in a field where terminology tends to be very muddled. I would recommend we all read Section 1 of Cardelli's "Type Systems" (it's about 9 pages). I have copies of this paper in ps and pdf:

http://www.list.org/~viega/cs655/TypeSystems.ps
http://www.list.org/~viega/cs655/TypeSystems.pdf

Oh, one thing that I should mention is that throughout this document I assume an inline syntax for type checking. I far prefer it.

Okay, next I want to talk about the way that "checked" and "unchecked" modules interact in Guido's proposal. He says that when a checked module imports and uses an unchecked module, all the objects in the unchecked module are assigned the type 'any' for the purposes of the checking. I am not 100% sure that's the best idea in the face of type inference. The checked module deserves type information that's as specific as possible, so that better types can be inferred for the checked program. Of course, in many situations a type inference engine isn't going to be able to infer much more than "any" when there are absolutely no type hints in the code. But if something can be done, why not?

The flip side of the coin is that after the type checking occurs, the unchecked code can exhibit bad behavior, breaking inferred invariants. For example, let's say some checked code calls foo.bar(), which is inferred to have the type integer*integer->integer. Someone can dynamically load a module from unchecked code which replaces foo.bar() with a function of type string->integer.

Options here? Run-time checks can be added at points where this is possible. I don't know if I support runtime checks if they can be avoided.
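To make the hazard concrete, here's the foo.bar scenario played out in plain Python; I'm faking the unchecked module with a throwaway module object, and all the names are made up:

import types                     # a throwaway module object stands in
foo = types.ModuleType("foo")    # for the unchecked module "foo"

def bar(x, y):
    return x * y                 # every use suggests integer*integer->integer
foo.bar = bar

print(foo.bar(6, 7))             # checked call site: fine, prints 42

foo.bar = lambda s: len(s)       # unchecked code rebinds it: string->integer

try:
    foo.bar(6, 7)                # the very call the checker blessed...
except TypeError:
    print("runtime type error: the inferred invariant no longer holds")

The static pass was perfectly sound when it ran; only a check at run time, at the rebinding or at the call site, can catch the breakage afterwards.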
Another option is to say that "guarantees" made by the type checker only hold if the whole program is checked. I'm comfortable with that (though I am still probably slightly in favor of the addition of runtime checks), but I suspect that others will not be. I have the same opinion when it comes to using checked modules from unchecked modules. Guido proposes adding runtime checks. I might prefer not to have them. Of course, one option is to support both options...

I also might disagree with always performing type checking at load time (when an unchecked module imports a checked module). I see new type checking features as a static convenience for people who want to use them. Are people going to want to pay a runtime hit when linking against checked code when they don't want to check code themselves? I don't think so.

Arguing against myself again, you can also make a case that this sort of dynamic checking is necessary to be able to live up to the promises a static checker makes when it "passes" a piece of code. Guido talks about dynamically importing a module and checking to see if the module matches the expected signature. That's certainly true... here's a really simple case: let's assume that A imports B, and that A+B were fully checked statically, but someone replaced B with C after checking and before runtime. Is it worth performing those extra checks? The static checks have done the best they could, and now, if there are gross incompatibilities, we're hopefully going to see an error dynamically whether we added extra checks or not (you can certainly construct cases where that's not true). I'd probably say it is worth performing those extra checks in some less frequent, highly dynamic situations, such as when you've got a completely checked application, then you import a module that was essentially typed in at stdin.

I think the right answer for Python depends on what the goals are. I'm looking for something that will help me find bugs statically that traditionally only cropped up at run time, when possible. Getting controversial, I don't know if I really care about supporting those dynamic checks that are only there to support cases in which the application is used in ways that subvert the assumptions I made when performing the type checking, such as what modules I'd be using in my application. It should be possible to turn all that stuff off, at the very least, and just let everything go while completely ignoring type information. Remember, dynamic features in the type system can end up slowing down a language...

Okay, going to the GCD example... Do we really need to use the "decl" keyword alone to indicate that a module has been checked? I'm fine with gcd(3.5,3) throwing a dynamic error if the module is not checked (I'd also be fine with it trying to run here, honestly). But should it try to compute the result if I remove the "decl" keyword from the file, but still specify types of the arguments to gcd??

Why does the word "decl" indicate the module is checked? Just because the programmer added the keyword doesn't mean the types he or she assigned are even internally consistent. Is the interpreter going to recheck on startup every single time? It seems like a lot of overhead to me. Plus, if the keyword isn't there, but there's some type information, shouldn't we check the types?

If the type checker is a separate static program, it seems to me that things are more natural. We try to check the inputs to the program. If those inputs use other modules, we go and check them, too.
If the checking fails (say it's unchecked), then we warn, and can either fail or (optionally?) make assumptions (e.g., assign any everywhere) and continue.

I'm worried a bit about a type checked library breaking code that doesn't use type checking. For example, let's say that Guido's GCD example is in a std library right now, and type checking gets added. What if code passes in floats? The unchecked code causes a runtime exception in Guido's proposal. Well, the code might not always be wrong! For example, let's go back to GCD. While GCD would get typed as integer*integer->integer, you can get perfectly valid results if you pass in floats, as long as your floats always happen to be integers. Maybe my code always calls GCD as so:

gcd(12.0,28.0)

Why should this code break? That's one more reason why I think that checking maybe should only be performed "on demand" (dynamic checks are okay if they are explicitly requested).

Next, let's talk briefly about what type inferencing can do. In particular, consider Guido's example where he assumes that the type inference algorithm isn't sophisticated enough to handle complex types, such as lists. Let me say that lists are pretty easy to handle when doing type inferencing. Object types can get a tad bit tricky, but everything should be doable. When I have time, I'll ramble a bit about what I see as the best approach to implementing a type inference engine for Python based on well-known algorithms.

There's a problem though. No one has ever been successful at coming up with an OO language that successfully integrates all 3 of the following:

1) Type inferencing
2) Subtyping
3) Principal types

Currently, you can choose any two (see Jens Palsberg's brief note "Type Inference For Objects", which I have at http://www.list.org/~viega/cs655/ObjInference.pdf). BTW, a principal type summarizes all possible types of a given piece of code (variable, function, etc). I'd like to capture all 3. Basically, 'a->'a is the principal type of the "identity" function, even though there are plenty of valid specific types that will work in a particular context.

I don't know if we can solve this problem in the general case. I don't think it's possible for Python's needs, but I haven't really given it too much thought yet.

A couple of brief thoughts at this point... What should the type checker do with inferred types? Place them directly into the code? Place them in comments or a doc string so that they have no effect but you can see what the checker did and double-check its work?

Also, it might be nice to type code when you don't have access to the source. Should it be possible to specify the types of features in a module for which you don't have source (assume it wasn't written with checking code)? If so, how? An interface definition file? Then do you try to check the byte code to make sure the given interfaces are actually valid?

Here's an issue I haven't seen brought up yet. Consider the following function:

def identity(x):
    return x

What's the type of this function? There are a couple ways to look at this problem. First, we can look at all contexts in which "identity" is called, and have its type be the union of all of those types. That is a very ad hoc method of genericity. If we call "identity" as such:

identity('foo')

And there are no other instances of this call, then we would assign this function the type string->string. If there's also a call:

identity(12)

Then would the type then become (string|integer)->(string|integer)? That sounds like a bad idea.
At that point, the type system has lost precision (I have a big problem with OR-ing of types, which has been proposed... more below). The signature implies you can pass in a string and get back an integer, which means the type is too liberal. Another reason this isn't a great idea is that you'd have to defer the type until you've seen all calls. Plus, as you add more code, and call "identity" in different contexts, the type will grow. That seems unwieldy, especially in documentation. I think it would be best to be able to say, "the in type is the out type, and who cares about the type beyond that".

Parametric polymorphism to the rescue... the type could be <x> -> <x>, where <x> indicates that x can be of any type. I think the syntax could be better (I prefer ML's, which types identity as 'a->'a, but I hear there's some resistance to it in this community... I'm going to use 'a from now on because it looks more natural to me). One problem here is that it probably isn't going to be possible to infer an accurate principal polymorphic type for things in all cases (see above). I'll have to give the worst cases some thought.

You may have noticed that 'a looks a heck of a lot like the "any" keyword proposed by others. It turns out to be pretty much the same thing, except parametrically polymorphic. One thing you can do is differentiate between different types (basically buying you parameterized functions, but with a much better syntax, IMHO). Eg:

def init_dict(key : 'a, val : 'b):
    global dict : {'a:'b}
    dict[key] = val

The proposed alternative is something like:

def init_dict(key: a, val : b):
    ...

There seems to be an implication that functions named init_dict will need to be instantiated... That's a kludgy C++-ism... instantiation is not necessary for parametric polymorphism.

So should the proposed "any" keyword go away? Unfortunately, no. The difference in semantics between "any" and generic types is that the "any" keyword basically forces the program to forego type checks when variables involving "any" are involved, which will sometimes be necessary.

An example might clear up the distinction:

def f(x):     # Here, x is inferred to be of type 'a, which is currently
              # the principal type SO FAR.
    x.foo()   # Whoops, we just had to narrow 'a to any object with a "foo"
              # method. If x were of type "any", its type would stay the same.

I really hope that people avoid using "any" *always*, but it does need to be there, IMHO.

Another problem that falls out at this point is how to write the type of x when we've seen x call "foo". Do we look at all classes across the application for ones with a foo method? And do we care if that solution precludes classes that dynamically add their "foo" method from applying? Ick. I'd prefer to avoid the concrete, and infer something like:

<< foo: None->'a >>

Which would read: an object with a field foo of type function taking no arguments and returning anything.
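If it helps to picture the constraints being collected, here's a crude dynamic trace in plain Python. This is *not* the static algorithm, and Probe is a made-up helper; a real inferencer derives the same record type without running anything:

class Probe:
    # Records each attribute fetched from it, plus the argument types of
    # the resulting call -- a toy, run-time version of constraint gathering.
    def __init__(self):
        self.seen = {}
    def __getattr__(self, name):
        def record(*args):
            self.seen[name] = tuple([type(a).__name__ for a in args])
            return 0    # dummy result; a real engine would track this as 'a
        return record

def f(x):
    x.draw()
    x.resize(1, 2)

p = Probe()
f(p)
print(p.seen)   # {'draw': (), 'resize': ('int', 'int')}

From a trace like that, the record type to write down is << draw: None->'a, resize: integer*integer->'b >>.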
In the following case:

def f(x):
    decl i : integer
    x.foo()
    i = i + x.bar("xxx")
    z = x.blah(1,2)

We would infer the following type:

<<
  foo: None->'a,
  bar: string->integer,
  blah: integer*integer->'b
>>

Adding constrained polymorphic types in declarations should be possible, even though it'd get messy without some sort of typedef statement:

def add_observer(x : << notify: string->None >> ) -> None:
    global notify_list: << notify : string->None >>
    if not notify_list: notify_list = [x]
    else: notify_list.append(x)

It'd be nice to be able to do:

typedef notifyable << notify: string->None >>

def add_observer(x : notifyable) -> None:
    global notify_list: notifyable
    if not notify_list: notify_list = [x]
    else: notify_list.append(x)

Ok, back to the OR-ing of types. The result of such a construct is only going to be lots of ad hoc typecase stuff and types that are not as precise as they should be (e.g., the (string|integer)->(string|integer) example above). For example, consider the following code:

class a:
    def foo(self : a, x : integer): ...
class b:
    def bar(self : b): ...

def blah(x : a | b):
    x.foo(2)

That code shouldn't pass through the type checker, because there's no guarantee that x has a method foo. The only real solution every time you have an OR type is a typecase statement, which is ad hoc and can lead to maintenance problems. I really don't think there should be a language construct to support what's really bad programming practice... I think that "any" should be the only place in the system where types can be that ambiguous. (To clarify... typecases are sometimes necessary, but I think the OR-ing of types is a pretty bad idea.)

I do, however, support the AND-ing of types. There's still a minor issue here. Consider the code:

class a:
    def foo(self: a, x: integer) -> integer: ...
class b:
    def bar(self: b) -> None: ...

def blah(x:a&b):
    x.foo(2)

This should definitely type check. However, what are the requirements we should impose on the variable x? Must it inherit both a and b? Or need it only implement the same methods that a and b statically define? I prefer the former. There's also the option of restricting &'s to interface types only, which I think is fine.

BTW, if there isn't an interface mechanism in the 1.0 version of the type system, people will start defining "interfaces" as such:

class IWidget:
    def draw(self: IWidget) -> None: pass
    def getboundingbox(self: IWidget) -> (float*float)*(float*float): pass

That's okay, but you do want the type checker to be able to distinguish between an abstract method and a concrete method. Otherwise:

class Scrollbar(IWidget):
    pass

Would automatically be a correct implementation of the IWidget interface, even though we failed to define the methods listed in that interface (we should be forced to add them explicitly, even if their body is just a "pass"). I'd much rather the above give an error. I think that special casing classes with no concrete implementations isn't that good an idea, so an "interface" keyword should be considered, which would look the same as classes without the method bodies, and with the restriction that interfaces cannot inherit classes (though it's desirable for classes to inherit interfaces, obviously). I wonder if it would be good to also pull out the explicit self parameter. Eg:

interface IWidget:
    def draw() -> None
    def getboundingbox() -> (float*float)*(float*float)

I think it would be good to allow parameter-based method overloading for people who use the type system.
You'd be allowed to do stuff like:

class Formatter:
    def print(self: Formatter, i : integer)->None: ...
    def print(self: Formatter, s : string)->None: ...
    def print(self: Formatter, l : ['a])->None: ...

It would be easy for the compiler to turn the above into something like:

class Formatter:
    def $print_integer(self, i)->None: ...
    def $print_string(self, s)->None: ...
    def $print_list_of_generic(self, l)->None: ...
    def print(self, x)->None:
        typecase x:
            case i: integer => self.$print_integer(i)
            case s: string => self.$print_string(s)
            case l: ['a] => self.$print_list_of_generic(l)
            # The following should be implicit, but I'll list it explicitly...
            default => raise TypeError

The $'s above are just some magic to prevent collisions... however this is actually implemented is also not very important. One difficulty here is making sure the debugger handles things properly (i.e., maps stuff back to the original code properly)...

The syntax of typecasing is not too important here. It could be the casting syntax proposed elsewhere. I prefer a real typecase statement based on matching as above. It's more powerful, and easier to read (I don't like choosing arbitrary operators). The problem is what exact syntax to use so that you can show types, bind to variables and allow for code blocks, while keeping the syntax as consistent with the rest of the language and type system as possible. The colon always precedes a code block, but we're using the colon to separate a variable from its type, too. However, I don't like:

case l : ['a] : self.$print_list_of_generic(l)

Perhaps:

case ['a] l : self.$print_list_of_generic(l)

Though that suddenly makes type decls very irregular. Then there's the option of not explicitly assigning to a new variable:

case ['a] : self.$print_list_of_generic(x)   # note the x

I think that last one gets my vote, currently. We can definitely figure out the type of x within that block. If it needs to be used outside the block, then people can copy it into a variable if they want to preserve the cast:

decl l : ['a]
typecase x:
    case ['a] : l = x

By the way, note that the two "'a"'s in the above code can be different and still work:

decl l : ['whatever]
typecase x:
    case ['a] : l = x   # This is okay... the types are compatible,
                        # but now there is an implicit equivalence of
                        # the 2 types.

To implement the same kind of matching better functional languages provide, we'd need something that allowed for assigning to multiple vars at once:

decl z : integer
typecase a:
    case (x : integer, y : integer) => z = x + y
    case (x : integer,) => z = x
    case x : integer => z = x

I'd like this type of matching, but it's got the too-many-colons syntax problem, since the => is not pythonesque... Hmm, what if all types were expressed using :- instead of :? Yes, :- is the assignment operator in one or two languages, but not too many people have used those languages:

decl z :- integer
typecase a:
    case (x :- integer, y :- integer) : z = x + y
    case (x :- integer,) : z = x
    case x :- integer : z = x

That's not quite as bad.

Now, on to variance of arguments. Contravariance is definitely bad IMHO. Yes, it's a simpler model, and lends itself to type safety better, but if you've ever programmed in Sather, you probably know that it can get really inconvenient to do really simple things. Invariance (aka nonvariance or novariance) is the approach taken by C++ and Java. It usually does pretty well, and is simple to implement. It usually does what you want, and doesn't lead to the same runtime type problems covariance does.
However, consider a situation like the following:

class Container: pass

class LinkedList(Container):
    def add(self, l :- LinkedList):
        ...

class BidirectionalLinkedList(LinkedList):
    def add(self, l :- ???):
        ...

(BTW, I'm going to drop the type of self from here on out, and assume that it is always the type of the class in which the method is defined. I don't think there should be a type variable to allow for talking about the type of self, such as in the __add__ example in Guido's proposal. Let covariance of parameters do the work... type variables can lead to some pretty subtle problems.)

In the above example, what should the type of l be? In a contravariant language, as the class gets more specific, the parameters must get more generic or stay the same. Therefore, implementing BidirectionalLinkedList requires an explicit type cast if we want to enforce the natural restriction that you can only add another bidirectional linked list on to the end of a bidirectional linked list...

In a covariant language, "BidirectionalLinkedList" would be the right answer, and that seems natural. Choosing to keep your parameter invariant is actually fine as well, so you can do:

class LinkedList:
    def merge(self, l :- LinkedList):
        ...

class BidirectionalLinkedList(LinkedList):
    def merge(self, l :- LinkedList):
        ...

(We don't care whether the parameter is bidirectional or not.) I'll get back to the problems with this approach in a second.

Invariance is an answer... the parameter would forever have to be LinkedList in all subclasses, but really requires a typecase on the parameter, or the clever use of a constrained generic type:

class LinkedList<T -> LinkedList>:   # specifies that the parameter must
                                     # be substitutable with LinkedList.
    def add(self, l :- T):
        ...
    def merge(self, l : LinkedList):
        ...

class BidirectionalLinkedList(LinkedList):
    def add(self, l):
        ...   # l is of type BidirectionalLinkedList here.
    def merge(self, l : LinkedList):
        ...

I think constrained generic types are good(tm) and should be added. This was the approach I was leaning towards last night at the DC PIGgies meeting, but I've changed my mind, as there are some reasonable problems with using them here. If we have a lot of parameters of different types that each need to vary, we have to use a ton of constrained types. It can get ugly quickly. Plus, if we wanted to add a feature to a base class that required a covariant parameter, we'd have to add a new constrained parameter, which changes the interface of the class, potentially breaking a lot of code. I think this is bad, and perhaps worse than typecasting, which is another solution when you only have invariance.

So what's wrong with covariance? Here's a common example of the problem:

class Person:
    def ShareRoom(self, p :- Person)->None:
        self.room = p.room

class Boy(Person):
    pass

class Girl(Person):
    pass

Hmm, the above isn't quite what we want, as the intent is probably to keep boys from rooming with girls. Let's assume we want to allow covariant parameters. We'd have to recode our Boy and Girl classes as such:

class Boy(Person):
    def ShareRoom(self, p :- Boy)->None:
        Person.ShareRoom(self, p)

class Girl(Person):
    def ShareRoom(self, p :- Girl)->None:
        Person.ShareRoom(self, p)

The problem here is that we can now do the following:

decl p :- Person
decl g :- Girl
p = Boy()
g = Girl()
g.room = blah
p.ShareRoom(g)

The last call there appears type correct... a static type checker will say "looks good!" because Person's ShareRoom accepts any Person object.
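Acted out in plain dynamic Python, with an isinstance test standing in for the check a covariant runtime would generate (the isinstance calls are my illustration, not proposed syntax):

class Person:
    def ShareRoom(self, p):
        self.room = p.room

class Boy(Person):
    def ShareRoom(self, p):
        if not isinstance(p, Boy):            # the covariantly narrowed parameter
            raise TypeError("Boy.ShareRoom expects a Boy")
        Person.ShareRoom(self, p)

class Girl(Person):
    def ShareRoom(self, p):
        if not isinstance(p, Girl):
            raise TypeError("Girl.ShareRoom expects a Girl")
        Person.ShareRoom(self, p)

p = Boy()            # statically, p was declared as a Person
g = Girl()
g.room = "room 5"
try:
    p.ShareRoom(g)   # statically fine: Person.ShareRoom takes any Person
except TypeError:
    print("caught the narrowed-parameter failure at run time")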
At run time, we will dispatch to the Boy's version of ShareRoom, which narrows the type, and yields a type error (if we are adding dynamic checks).

One solution is adding anchored types to the language, which is okay, but the programmer could still write the above code, and it would still be broken. Anchored types essentially just give the programmer a way to perform the above and get an error on the p.ShareRoom(g) call statically, but only if he added additional magic to one of his classes. Anchored types are cool, but I don't think this is the right solution for the problem, so I won't cover them right now.

Meyer's preferred approach is to add a rule that says "polymorphic catcalls are invalid". What's a catcall? CAT stands for "Changing Availability or Type". In the context of this discussion, a routine that is a CAT is one where the type of its arguments varies in a derived class(*). A catcall is any call to a CAT method. A polymorphic catcall is a call to a CAT method where the target object is polymorphic, which happens in at least 2 cases:

1) The object appears in the LHS of an assignment, where the RHS is a subtype of the LHS.
2) The object is a formal parameter to a method.

There might be a third case... I'll have to go look it up. I don't think there is any problem that would make the solution not appropriate for Python, though. It should be possible to implement this solution, and even do so incrementally.

There's another (better, less pessimistic) solution called the global validity approach. The problem with it, IIRC, is that the algorithm basically assumes that type checking goes on in a "closed world" environment, where you're checking the entire system at once. That probably isn't desirable. I wonder if there haven't been refinements to this algorithm that allow it to work incrementally.

Therefore, I'd definitely prefer covariance w/ a polymorphic catcall rule, assuming that the catcall rule can actually work for Python. BTW, return types should be covariant too.

Syntax for exceptions: if we add exceptions as part of a signature, I don't think that I see a good reason to use anything other than something similar to Java's "throws" syntax. I'd add a comma before the "throws" for clarity's sake. Here's a right-recursive (LL) grammar fragment:

throws_clause: "," "raises" throws_list;
throws_list: identifier more_throws;
more_throws: "," throws_list |;

Example:

class Client:
    def getServerVersion(self) -> string, raises ENotConnected, ETimeout:
        pass

The problem is that "raises" is a bit wordy. I just don't like using symbols without natural meanings in these situations... who wants to be Perl? *Maybe* a ! would be alright, since there is a very loose connection:

class Client:
    def getServerVersion(self) -> string ! ENotConnected, ETimeout:
        pass

Or perhaps 2 bangs... but I am really uncomfortable about the readability of that syntax for people new to the language.

Whatever, I'm not too enamored of having exceptions as part of a signature. Part of the reason is variance of exceptions. I've seen cases where people wanted contravariance and it seemed natural. I don't think those situations are all that common, but I do also believe that exceptions in signatures can lead to code that is massively difficult to maintain, as exceptions added to one method end up needing to propagate all the way up a call chain through a program. You end up with LONG exception lists.
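The growth is mechanical. Here it is in miniature, in plain Python (the functions are hypothetical):

def parse(s):
    raise ValueError("bad input")   # a new exception introduced at the bottom

def load(path):
    return parse(path)   # with declared exceptions, load's signature must
                         # now list ValueError...

def main():
    load("config")       # ...and so must main's, and every caller above it

try:
    main()
except ValueError:
    print("propagated up the entire chain")

Every signature on that chain grows by one entry the moment something new can be raised at the bottom.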
I need to think more about this topic, but right now I don't really care whether the programmer explicitly lists exceptions on a per-function basis. The type checker can still determine what exceptions don't get caught in the scope of what's been checked. If there is a new build process for static checking, it could report all uncaught exceptions at that time (probably optionally), and it could dump them to a supplemental file when doing incremental checking. In short, I haven't thought about this problem enough recently to say which way I prefer, but I lean towards not making exceptions part of signatures.

Okay, this has been pretty long and rambling, and it's time for me to stop writing. I'm sorry if I didn't make total sense. I have more to say, but it'll have to wait until another day...

John

(*) Changing availability: the catcall rule also applies if a derived type chooses to make a feature unavailable to the outside world (e.g., by removing the feature). No one has been proposing a feature of the system to do this sort of thing (yet... but visibility mechanisms haven't been discussed much so far as I've seen). Of course, you can always muck around with the attribute directly...

From faassen@vet.uu.nl Wed Mar 15 18:21:28 2000
From: faassen@vet.uu.nl (Martijn Faassen)
Date: Wed, 15 Mar 2000 19:21:28 +0100
Subject: [Types-sig] An ignorable wild idea
Message-ID: <20000315192128.A28183@vet.uu.nl>

Hi there,

I had another wild idea the other day that's probably fairly silly, so of course I instantly thought I should share it with you all. If you don't feel like having the discussion move out into a strange direction simply don't reply or don't read this, though of course I wouldn't post this if I didn't want feedback.

What are dynamic types in Python? They're basically attributes of Python objects.

What would static types be? They're attributes of variables.

We change attributes of objects like this:

a.attribute = b

i.e., with the '.' notation and assignment. It's sometimes even possible to change the type of an object (or at least the class, or the methods and data a particular object has).

Now, the idea was to introduce variable attributes. I'll use the operator -> for that, but any operator would do.

a->attribute = b

This would change the attribute of the variable 'a' to whatever's in 'b'. Since a type is an attribute of a variable, we'd set the type of a particular variable like this:

a->type = Integer

By default, all variables have the attribute 'type' set to 'Any'.

Variable attribute access is all analyzed and evaluated during compile time, so one can't put the result of some arbitrary Python expression into a variable attribute at runtime. There needs to be a separate compile-time namespace for variable attribute assignments, with some heavy restrictions. Accessing the variable attribute space during runtime is no problem, though. This could work:

print a->type

and this too:

if a->type == Integer:
    # do whatever
    # though conditional type assignments should probably be disallowed:
    b->type = Integer

though it's doubtful how useful this would be.

Is this idea in fact useful at all, besides the nice parallel with object attributes? I'm not sure. This might come in handy if you're doing generic functions, perhaps:

a->type = foo->type   # the variable a will contain the same type as foo.

And this variable attribute facility may have more uses.
Perhaps it could support docstrings:

a->doc = "Holds temporary value"

And it could be used for introspection, if any variable inside scope is accessible through the -> operator:

def foo(a):
    b->doc = "holds a number"
    b = 5
    return b

print foo->a->type
print foo->b->doc

class Bar:
    def __init__(self):
        pass

    method->type = String
    def method(self):
        b->type = Integer
        b = 15
        return str(b)

print Bar->method->b->type

bar->type = Bar
bar = Bar()

if hasmethod(bar->type, 'method'):
    bar->method()

Where 'hasmethod' could be evaluated at compile-time, so we'd get this:

if 1:
    bar->method()

but this if we removed 'method':

if 0:
    bar->method()

Of course you'd need many restrictions about what can be done with a class and objects at run time, but you need those in any case if you do static type checking.

Anyway, these are mostly idle speculations, based on the idea that variables themselves have attributes. Just wanted to let you all know, if this is news at all. Many questions are left unanswered. :)

Regards,

Martijn

From John@list.org Wed Mar 15 21:00:24 2000
From: John@list.org (John Viega)
Date: Wed, 15 Mar 2000 13:00:24 -0800
Subject: [Types-sig] An ignorable wild idea
In-Reply-To: <20000315192128.A28183@vet.uu.nl>; from Martijn Faassen on Wed, Mar 15, 2000 at 07:21:28PM +0100
References: <20000315192128.A28183@vet.uu.nl>
Message-ID: <20000315130024.A30986@viega.org>

Martijn,

General, off-the-cuff comments: I think it isn't a bad idea to add attributes to variables, but I see that as completely separate from type systems. For the most part, I can interpret your proposal as an alternate syntax for a proposed type system. I personally would prefer the type system to look more different than similar when compared to attributes.

Beyond the syntax issues: you talk about disallowing conditional assignments to type variables:

if some_condition:
    a->type = Integer
else:
    a->type = Float

Yes, this would be bad for static checking. When statically checking this, the only thing I can infer about a's type is Integer|Float. I've argued against OR-ing types previously. However, people aren't going to understand the restriction. Everything about the syntax and the feature seems to point to "dynamic". Plus, types can have lifetimes now, where a is definitely only an Integer for part of the program, and a Float for part of the program. It becomes difficult to reason about things statically when that's possible:

a->type = Integer
... operations on a...
a->type = Float
... more operations on a...

It's not impossible to deal with, but an unnecessary headache. Then, how to specify parameter types, etc.? The syntax wouldn't provide an elegant solution.

I don't think it's the right approach *for types*, but I do think there may be some utility in having slots for variables in other areas for meta-information, including debug information.

BTW, one minor nit with what you said up front: every value in the language has a type, not just variables.

John
From scott@chronis.pobox.com Thu Mar 16 02:26:28 2000
From: scott@chronis.pobox.com (scott)
Date: Wed, 15 Mar 2000 21:26:28 -0500
Subject: [Types-sig] A late entry
In-Reply-To: <38CEC33C.6AC199A1@viega.org>; from john@viega.org on Tue, Mar 14, 2000 at 05:54:52PM -0500
References: <38CEC33C.6AC199A1@viega.org>
Message-ID: <20000315212628.A99258@chronis.pobox.com>

On Tue, Mar 14, 2000 at 05:54:52PM -0500, John Viega wrote:
> Whoops. For me, this has always been an area of interest, but I completely missed the fact that people were actually discussing stuff here now, until the DC PIGgies meeting last night. Here are my comments so far on what I've seen on the type sig. Unfortunately, I haven't read much of what's been said, so I'm sorry if some things have been discussed or are not appropriate. I've tried to at least read or skim the proposals on the list web page.

It's great to see a post from you here. I know you've studied this stuff and can offer valuable insights.
> Here I'm going to mainly focus on Guido's proposal, since it seems to be one of the more recent proposals, incorporating select ideas from other proposals.
>
> Let's start with some notes on terminology. I've seen the word "typesafe" thrown around quite a bit in a short period of time. Let's try to avoid using the word casually here, because it can lead to confusion. I find people often don't know what they're referring to when they use (or hear) the term. Does it mean that no type errors can ever happen at run time? At all? Or that they can only happen under specific conditions? If so, what are those conditions? Cardelli would probably say that "safe" means that the error won't cause a crash or go unnoticed at runtime (not everyone agrees with his definitions). By his definition, Python is already a type safe language. However, I often hear people use the term to mean static type safety... i.e., no type error will happen at run time. There's essentially no practical hope for that in an object oriented language like Python.
>
> I think it might be nice for us to agree to share a common terminology. I know there's a bit of dissension w/ Cardelli's preferred terminology, but it at least provides us with a common reference in a field where terminology tends to be very muddled. I would recommend we all read Section 1 of Cardelli's "Type Systems" (it's about 9 pages). I have copies of this paper in ps and pdf:
>
> http://www.list.org/~viega/cs655/TypeSystems.ps
> http://www.list.org/~viega/cs655/TypeSystems.pdf

neat paper. any other references to throw at us?

> Oh, one thing that I should mention is that throughout this document I assume an inline syntax for type checking. I far prefer it.
>
> Okay, next I want to talk about the way that "checked" and "unchecked" modules interact in Guido's proposal. He says that when a checked module imports and uses an unchecked module, all the objects in the unchecked module are assigned the type 'any' for the purposes of the checking. I am not 100% sure that's the best idea in the face of type inference. The checked module deserves type information that's as specific as possible, so that better types can be inferred for the checked program. Of course, in many situations a type inference engine isn't going to be able to infer much more than "any" when there are absolutely no type hints in the code. But if something can be done, why not?
>
> The flip side of the coin is that after the type checking occurs, the unchecked code can exhibit bad behavior, breaking inferred invariants. For example, let's say some checked code calls foo.bar(), which is inferred to have the type integer*integer->integer. Someone can dynamically load a module from unchecked code which replaces foo.bar() with a function of type string->integer.
>
> Options here? Run-time checks can be added at points where this is possible. I don't know if I support runtime checks if they can be avoided. Another option is to say that "guarantees" made by the type checker only hold if the whole program is checked. I'm comfortable with that (though I am still probably slightly in favor of the addition of runtime checks), but I suspect that others will not be. I have the same opinion when it comes to using checked modules from unchecked modules. Guido proposes adding runtime checks. I might prefer not to have them. Of course, one option is to support both options...
When we talk about adding runtime checks, it's a little unclear to me exactly what is meant. Are you referring to additional runtime checks that the existing dynamic type system in python does not provide? Is leveraging the existing dynamic type system in this way feasible for these sorts of checks? If this is possible, it is one approach I'd prefer -- there's no performance hit, and nothing that isn't checked. The only drawback I can see to using the existing system as a fallback is that it would limit the degree of optimization that is available, but I believe that's OK.

> I also might disagree with always performing type checking at load time (when an unchecked module imports a checked module). I see new type checking features as a static convenience for people who want to use them. Are people going to want to pay a runtime hit when linking against checked code when they don't want to check code themselves? I don't think so.
>
> Arguing against myself again, you can also make a case that this sort of dynamic checking is necessary to be able to live up to the promises a static checker makes when it "passes" a piece of code. Guido talks about dynamically importing a module and checking to see if the module matches the expected signature. That's certainly true... here's a really simple case: let's assume that A imports B, and that A+B were fully checked statically, but someone replaced B with C after checking and before runtime. Is it worth performing those extra checks? The static checks have done the best they could, and now, if there are gross incompatibilities, we're hopefully going to see an error dynamically whether we added extra checks or not (you can certainly construct cases where that's not true). I'd probably say it is worth performing those extra checks in some less frequent, highly dynamic situations, such as when you've got a completely checked application, then you import a module that was essentially typed in at stdin.
>
> I think the right answer for Python depends on what the goals are. I'm looking for something that will help me find bugs statically that traditionally only cropped up at run time, when possible. Getting controversial, I don't know if I really care about supporting those dynamic checks that are only there to support cases in which the application is used in ways that subvert the assumptions I made when performing the type checking, such as what modules I'd be using in my application. It should be possible to turn all that stuff off, at the very least, and just let everything go while completely ignoring type information. Remember, dynamic features in the type system can end up slowing down a language...

I agree very much with all these points about checking modules dynamically.

> Okay, going to the GCD example... Do we really need to use the "decl" keyword alone to indicate that a module has been checked? I'm fine with gcd(3.5,3) throwing a dynamic error if the module is not checked (I'd also be fine with it trying to run here, honestly). But should it try to compute the result if I remove the "decl" keyword from the file, but still specify types of the arguments to gcd??
>
> Why does the word "decl" indicate the module is checked? Just because the programmer added the keyword doesn't mean the types he or she assigned are even internally consistent. Is the interpreter going to recheck on startup every single time? It seems like a lot of overhead to me.
> Plus, if the keyword isn't there, but there's some type information, shouldn't we check the types?
>
> If the type checker is a separate static program, it seems to me that things are more natural. We try to check the inputs to the program. If those inputs use other modules, we go and check them, too. If the checking fails (say it's unchecked), then we warn, and can either fail or (optionally?) make assumptions (e.g., assign any everywhere) and continue.

[ gcd working w/ floats ]

> Next, let's talk briefly about what type inferencing can do. In particular, consider Guido's example where he assumes that the type inference algorithm isn't sophisticated enough to handle complex types, such as lists. Let me say that lists are pretty easy to handle when doing type inferencing. Object types can get a tad bit tricky, but everything should be doable. When I have time, I'll ramble a bit about what I see as the best approach to implementing a type inference engine for Python based on well-known algorithms.

looking forward to the ramblings!

> There's a problem though. No one has ever been successful at coming up with an OO language that successfully integrates all 3 of the following:
>
> 1) Type inferencing
> 2) Subtyping
> 3) Principal types
>
> Currently, you can choose any two (see Jens Palsberg's brief note "Type Inference For Objects", which I have at http://www.list.org/~viega/cs655/ObjInference.pdf). BTW, a principal type summarizes all possible types of a given piece of code (variable, function, etc). I'd like to capture all 3. Basically, 'a->'a is the principal type of the "identity" function, even though there are plenty of valid specific types that will work in a particular context.
>
> I don't know if we can solve this problem in the general case. I don't think it's possible for Python's needs, but I haven't really given it too much thought yet.

It seems like you are referring to inferencing both as a mechanism which can allow the user to denote fewer types and as a means of dealing with the mixing of unchecked code with checked code. With regards to meeting the first goal, a very limited kind of inferencing is possible, where the first assignment to a variable from an expression of a given type has the same effect as declaring the variable as that type in the first place. I think that the former goal of reducing the number of declarations the programmer must make is attainable with a mechanism like this, but the latter goal would require a real inferencing algorithm.

> A couple of brief thoughts at this point... What should the type checker do with inferred types? Place them directly into the code? Place them in comments or a doc string so that they have no effect but you can see what the checker did and double-check its work?

maybe just print out inferred types if requested and the type checker is a separate program?

> Also, it might be nice to type code when you don't have access to the source. Should it be possible to specify the types of features in a module for which you don't have source (assume it wasn't written with checking code)? If so, how? An interface definition file? Then do you try to check the byte code to make sure the given interfaces are actually valid?
>
> Here's an issue I haven't seen brought up yet. Consider the following function:

[ identity function signature ]

> I think it would be best to be able to say, "the in type is the out type, and who cares about the type beyond that".
> Parametric polymorphism to the rescue... the type could be <x> -> <x>, where <x> indicates that x can be of any type. I think the syntax could be better (I prefer ML's, which types identity as 'a->'a, but I hear there's some resistance to it in this community... I'm going to use 'a from now on because it looks more natural to me). One problem here is that it probably isn't going to be possible to infer an accurate principal polymorphic type for things in all cases (see above). I'll have to give the worst cases some thought.

The stab I have taken at writing a checker allows this kind of polymorphism for functions. It uses ~ instead of '. Using a prefix character is something I prefer as well, though I think there are much more important issues to ponder, so I get on with it :)

> You may have noticed that 'a looks a heck of a lot like the "any" keyword proposed by others. It turns out to be pretty much the same thing, except parametrically polymorphic. One thing you can do is differentiate between different types (basically buying you parameterized functions, but with a much better syntax, IMHO). Eg:
>
> def init_dict(key : 'a, val : 'b):
>     global dict : {'a:'b}
>     dict[key] = val
>
> The proposed alternative is something like:
>
> def init_dict(key: a, val : b):
>     ...
>
> There seems to be an implication that functions named init_dict will need to be instantiated... That's a kludgy C++-ism... instantiation is not necessary for parametric polymorphism.

Correct. The stab-at-type-checker I wrote does this, and it would be handy.

> So should the proposed "any" keyword go away? Unfortunately, no. The difference in semantics between "any" and generic types is that the "any" keyword basically forces the program to forego type checks when variables involving "any" are involved, which will sometimes be necessary.
>
> An example might clear up the distinction:
>
> def f(x):     # Here, x is inferred to be of type 'a, which is currently
>               # the principal type SO FAR.
>     x.foo()   # Whoops, we just had to narrow 'a to any object with a "foo"
>               # method. If x were of type "any", its type would stay the same.
>
> I really hope that people avoid using "any" *always*, but it does need to be there, IMHO.
>
> Another problem that falls out at this point is how to write the type of x when we've seen x call "foo". Do we look at all classes across the application for ones with a foo method? And do we care if that solution precludes classes that dynamically add their "foo" method from applying? Ick. I'd prefer to avoid the concrete, and infer something like:
>
> << foo: None->'a >>
>
> Which would read: an object with a field foo of type function taking no arguments and returning anything.

Do you think it's ok to require that the programmer declare something about x in this case? If something like what you suggest is inferred there, it seems like a class of errors might slip through that we might not want to allow to slip.
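For instance (plain Python; Window is a made-up class, and nothing is declared about x):

class Window:
    def foo(self):
        return 1

def f(x):
    return x.fooo()    # a misspelling of foo(); pure inference just concludes
                       # x : << fooo: None->'a >> and raises no complaint

try:
    f(Window())        # the typo only surfaces here, at run time
except AttributeError:
    print("AttributeError: fooo")

Requiring some declaration for x would have caught the misspelling statically.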
> In the following case:
>
> def f(x):
>     decl i : integer
>     x.foo()
>     i = i + x.bar("xxx")
>     z = x.blah(1,2)
>
> We would infer the following type:
>
> <<
>   foo: None->'a,
>   bar: string->integer,
>   blah: integer*integer->'b
> >>
>
> Adding constrained polymorphic types in declarations should be possible, even though it'd get messy without some sort of typedef statement:
>
> def add_observer(x : << notify: string->None >> ) -> None:
>     global notify_list: << notify : string->None >>
>     if not notify_list: notify_list = [x]
>     else: notify_list.append(x)
>
> It'd be nice to be able to do:
>
> typedef notifyable << notify: string->None >>
>
> def add_observer(x : notifyable) -> None:
>     global notify_list: notifyable
>     if not notify_list: notify_list = [x]
>     else: notify_list.append(x)

Constrained polymorphism seems to solve lots of more complex static type issues in clean ways. Do you know of any references on implementations for constrained polymorphism? I'd like to look more closely at how it's done elsewhere before taking a crack at it myself.

> Ok, back to the OR-ing of types. The result of such a construct is only going to be lots of ad hoc typecase stuff and types that are not as precise as they should be (e.g., the (string|integer)->(string|integer) example above). For example, consider the following code:
>
> class a:
>     def foo(self : a, x : integer): ...
> class b:
>     def bar(self : b): ...
>
> def blah(x : a | b):
>     x.foo(2)
>
> That code shouldn't pass through the type checker, because there's no guarantee that x has a method foo. The only real solution every time you have an OR type is a typecase statement, which is ad hoc and can lead to maintenance problems. I really don't think there should be a language construct to support what's really bad programming practice... I think that "any" should be the only place in the system where types can be that ambiguous. (To clarify... typecases are sometimes necessary, but I think the OR-ing of types is a pretty bad idea.)

I like the idea of discouraging OR'ing of types as much as possible. There are two cases where I think it could come in very handy: first, if the static type of None is considered a distinct type rather than something like a null type, then there are lots of things that return or produce type-of(None) OR something else. Examples are dict.get and default arguments. After wrestling with this for a while, I've come to think that the introduction of a null-type in a static type system is more manageable for the programmer. Do you see other ways of dealing with that, or how would you prefer that those things are handled? The other case is recursive data types such as trees, where a node can contain either other nodes or leaves.

> I do, however, support the AND-ing of types. There's still a minor issue here. Consider the code:
>
> class a:
>     def foo(self: a, x: integer) -> integer: ...
> class b:
>     def bar(self: b) -> None: ...
>
> def blah(x:a&b):
>     x.foo(2)
>
> This should definitely type check. However, what are the requirements we should impose on the variable x? Must it inherit both a and b? Or need it only implement the same methods that a and b statically define? I prefer the former. There's also the option of restricting &'s to interface types only, which I think is fine.
> BTW, if there isn't an interface mechanism in the 1.0 version of the type system, people will start defining "interfaces" as such:
>
> class IWidget:
>     def draw(self: IWidget) -> None: pass
>     def getboundingbox(self: IWidget) -> (float*float)*(float*float): pass
>
> That's okay, but you do want the type checker to be able to distinguish between an abstract method and a concrete method. Otherwise:
>
> class Scrollbar(IWidget):
>     pass
>
> Would automatically be a correct implementation of the IWidget interface, even though we failed to define the methods listed in that interface (we should be forced to add them explicitly, even if their body is just a "pass"). I'd much rather the above give an error. I think that special casing classes with no concrete implementations isn't that good an idea, so an "interface" keyword should be considered, which would look the same as classes without the method bodies, and with the restriction that interfaces cannot inherit classes (though it's desirable for classes to inherit interfaces, obviously). I wonder if it would be good to also pull out the explicit self parameter.

I think it's a good idea to pull out the self parameter. It makes interfaces things that are more flexible and not constrained to class methods being the only callable attributes.

> Eg:
>
> interface IWidget:
>     def draw() -> None
>     def getboundingbox() -> (float*float)*(float*float)
>
> I think it would be good to allow parameter-based method overloading for people who use the type system. You'd be allowed to do stuff like:
>
> class Formatter:
>     def print(self: Formatter, i : integer)->None: ...
>     def print(self: Formatter, s : string)->None: ...
>     def print(self: Formatter, l : ['a])->None: ...
>
> It would be easy for the compiler to turn the above into something like:
>
> class Formatter:
>     def $print_integer(self, i)->None: ...
>     def $print_string(self, s)->None: ...
>     def $print_list_of_generic(self, l)->None: ...
>     def print(self, x)->None:
>         typecase x:
>             case i: integer => self.$print_integer(i)
>             case s: string => self.$print_string(s)
>             case l: ['a] => self.$print_list_of_generic(l)
>             # The following should be implicit, but I'll list it explicitly...
>             default => raise TypeError
>
> The $'s above are just some magic to prevent collisions... however this is actually implemented is also not very important. One difficulty here is making sure the debugger handles things properly (i.e., maps stuff back to the original code properly)...

Having multiple signatures for method overloading is a convenient way to express the idea. If such a thing were available to the type checking engine, it would be fairly easy to use the underlying data structures to map the overloading of the arithmetic operators, for example.

> The syntax of typecasing is not too important here. It could be the casting syntax proposed elsewhere. I prefer a real typecase statement based on matching as above. It's more powerful, and easier to read (I don't like choosing arbitrary operators). The problem is what exact syntax to use so that you can show types, bind to variables and allow for code blocks, while keeping the syntax as consistent with the rest of the language and type system as possible. The colon always precedes a code block, but we're using the colon to separate a variable from its type, too.
> However, I don't like: > case l : ['a] : self.$print_list_of_generic(l) > Perhaps: > case ['a] l : self.$print_list_of_generic(l) > > Though that suddenly makes type decls very irregular. Then there's > the option of not explicitly assigning to a new variable: > > case ['a] : self.$print_list_of_generic(x) # note the x > > I think that last one gets my vote, currently. We can definitely > figure out the type of x within that block. If it needs to be used > outside the block, then people can copy it into a variable if they > want to preserve the cast: > > decl l : ['a] > typecase x: > case ['a] : l = x > > By the way, note that the two "'a"'s in the above code can be > different and still work: > > decl l : ['whatever] > typecase x: > case ['a] : l = x # This is okay... the types are compatible, > # but now there is an implicit equivalence of > # the 2 types. > > To implement the same kind of matching that better functional languages > provide, we'd need something that allowed for assigning to multiple > vars at once: > > decl z : integer > typecase a: > case (x : integer, y : integer) => z = x + y > case (x : integer,) => z = x > case x : integer => z = x > > I'd like this type of matching, but it's got the too-many-colons syntax > problem, since the => is not pythonesque... > > Hmm, what if all types were expressed using :- instead of :? Yes, :- > is the assignment operator in one or two languages, but not too many > people have used those languages: > > decl z :- integer > typecase a: > case (x :- integer, y :- integer) : z = x + y > case (x :- integer,) : z = x > case x :- integer : z = x > > That's not quite as bad. With regards to type casing possibly causing the language to slow down by bringing the static type information into runtime, do you think it would be reasonable to allow typecasing only on types that are easily expressible in terms of the existing dynamic type system? It seems to me that this approach would save a lot of work, limit the runtime overhead, and discourage OR'ing all at once. > > > Now, on to variance of arguments. Contravariance is definitely bad > IMHO. Yes, it's a simpler model, and lends itself to type safety > better, but if you've ever programmed in Sather, you probably know > that it can get really inconvenient to do really simple > things. Invariance (aka nonvariance or novariance) is the approach > taken by C++ and Java. It's simple to implement, usually does what > you want, and doesn't lead to the same runtime type problems > covariance does. However, consider a > situation like the following: > > class Container: pass > class LinkedList(Container): > def add(self, l :- LinkedList): > ... > > class BidirectionalLinkedList(LinkedList): > def add(self, l :- ???): > ... > > (BTW, I'm going to drop the type of self from here on out, and assume > that it is always the type of the class in which the method is > defined. I don't think there should be a type variable to allow for > talking about the type of self, such as in the __add__ example in > Guido's proposal. Let covariance of parameters do the work... type > variables can lead to some pretty subtle problems.) > > In the above example, what should the type of l be? In a > contravariant language, as the class gets more specific, the > parameters must get more generic or stay the same.
> Therefore, > implementing BidirectionalLinkedList requires an explicit type cast if > we want to enforce the natural restriction that you can only add > another bidirectional linked list on to the end of a bidirectional > linked list... > > In a covariant language, "BidirectionalLinkedList" would be the right > answer, and that seems natural. Choosing to keep your parameter > invariant is actually fine as well, so you can do: > > class LinkedList: > def merge(self, l :- LinkedList): > ... > > class BidirectionalLinkedList(LinkedList): > def merge(self, l :- LinkedList): > ... > > (We don't care whether the parameter is bidirectional or not) > I'll get back to the problems with this approach in a second. > > Invariance is an answer... the parameter would forever have to be > LinkedList in all subclasses, but really requires a typecase on the > parameter, or the clever use of a constrained generic type: > > class LinkedList< T -> LinkedList >: # specifies that the parameter T must > # be substitutable with LinkedList. > def add(self, l :- T): > ... > def merge(self, l : LinkedList): > ... > > class BidirectionalLinkedList(LinkedList< BidirectionalLinkedList >): > def add(self, l): > ... # l is of type BidirectionalLinkedList here. > def merge(self, l : LinkedList): > ... > > I think constrained generic types are good(tm) and should be added. > This was the approach I was leaning towards last night at the DC > PIGgies meeting, but I've changed my mind, as there are some > reasonable problems with using them here. If we have a lot of > parameters of different types that each need to vary, we have to use a > ton of constrained types. It can get ugly quickly. Plus, if we > wanted to add a feature to a base class that required a covariant > parameter, we'd have to add a new constrained parameter, which changes > the interface of the class, potentially breaking a lot of code. I > think this is bad, and perhaps worse than typecasting, which is > another solution when you only have invariance. Have you read about expressing the above with "mytype"? that is: interface if_LinkedList: def add(self, l :- mytype): ... class LinkedList(if_LinkedList): def add(self, l): ... class BiDirectionalLinkedList(LinkedList): ... the syntax is fairly simple, and 'mytype' just means the type of instances of the class that implements the method. > > So what's wrong with covariance? Here's a common example of the > problem: > > class Person: > def ShareRoom(self, p :- Person)->None: > self.room = p.room > > class Boy(Person): > pass > > class Girl(Person): > pass > > Hmm, the above isn't quite what we want, as the intent is probably to > keep boys from rooming with girls. Let's assume we want to allow > covariant parameters. We'd have to recode our Boy and Girl classes as > such: > > class Boy(Person): > def ShareRoom(self, p :- Boy)->None: > Person.ShareRoom(self, p) > > class Girl(Person): > def ShareRoom(self, p :- Girl)->None: > Person.ShareRoom(self, p) > > The problem here is that we can now do the following: > > decl p :- Person > decl g :- Girl > p = Boy() > g = Girl() > g.room = blah > p.ShareRoom(g) > > The last call there appears type correct... a static type checker will > say "looks good!" because Person's ShareRoom accepts any person > object. At run time, we will dispatch to the Boy's version of > ShareRoom, which narrows the type, and yields a type error (if we are > adding dynamic checks).
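The failure mode is easy to reproduce in plain dynamic python by writing the narrowed check by hand. A sketch (lower-case method names are mine, and the isinstance test stands in for the dynamic check mentioned above):

    class Person:
        def share_room(self, p):
            self.room = p.room

    class Boy(Person):
        def share_room(self, p):
            # the covariantly narrowed parameter, checked dynamically
            if not isinstance(p, Boy):
                raise TypeError("a Boy can only share a room with a Boy")
            Person.share_room(self, p)

    class Girl(Person):
        pass

    g = Girl()
    g.room = "blah"
    p = Boy()             # statically, p could be declared as just a Person
    try:
        p.share_room(g)   # the checker sees Person.ShareRoom(Person): looks good!
    except TypeError:
        print("runtime type error: dispatched to Boy's narrowed share_room")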
> > One solution is adding anchored types to the language, which is okay, > but the programmer could still write the above code, and it would > still be broken. Anchored types essentially just give the programmer > a way to perform the above and get an error on the p.ShareRoom(g) call > statically, but only if he added additional magic to one of his > classes. Anchored types are cool, but I don't think this is the right > solution for the problem, so I won't cover them right now. > > Meyer's preferred approach is to add a rule that says "polymorphic > catcalls are invalid". What's a catcall? CAT stands for "Changing > Availability or Type". In the context of this discussion, a routine > that is a cat is one where the types of its arguments vary in a > derived class(*). A catcall is any call to a cat > method. A polymorphic catcall is a call to a cat method where the > target object is polymorphic, which happens in at least 2 cases: > > 1) The object appears on the LHS of an assignment, where the RHS is > a subtype of the LHS. > > 2) The object is a formal parameter to a method. > > There might be a third case... I'll have to go look it up. I don't > think there is any problem that would make the solution not > appropriate for Python, though. > > It should be possible to implement this solution, and even do so > incrementally. There's another (better, less pessimistic) solution > called the global validity approach. The problem with it, IIRC, is > that the algorithm basically assumes that type checking goes on in a > "closed world" environment, where you're checking the entire system at > once. That probably isn't desirable. I wonder if there haven't been > refinements to this algorithm that allow it to work incrementally. > Therefore, I'd definitely prefer covariance w/ a polymorphic catcall > rule, assuming that the catcall rule can actually work for Python. One possible approach to covariance of method parameters is to check each method of a class against all the possible different types of 'self'. This is what "stick" does, and it finds exactly the cases where there are type errors. It does require more checking than a general rule, and it does add complications to the problem of mixing checked modules and unchecked ones across class inheritance, but it does work. I'd be interested in any feedback on this approach you have... > > BTW, return types should be covariant too. > > Syntax for exceptions: If we add exceptions as part of a signature, I > don't think that I see a good reason to use anything other than > something similar to Java's "throws" syntax. I'd add a comma before > the keyword for clarity's sake. Here's a right-recursive (LL) grammar > fragment: > > throws_clause: "," "raises" throws_list; > throws_list: identifier more_throws; > more_throws: "," throws_list > |; > > Example: > > class Client: > def getServerVersion(self) -> string, raises ENotConnected, ETimeout: > pass > > The problem is that "raises" is a bit wordy. I just don't like using > symbols without natural meanings in these situations... who wants to > be Perl? *Maybe* a ! would be alright, since there is a very loose > connection: > > class Client: > def getServerVersion(self) -> string ! ENotConnected, ETimeout: > pass > > Or perhaps 2 bangs..., but I am really uncomfortable about the > readability of that syntax for people new to the language. > > Whatever, I'm not too enamored of having exceptions as part of a > signature. Part of the reason is variance of exceptions.
> I've seen > cases where people wanted contravariance and it seemed natural. I > don't think those situations are all that common, but I do also > believe that exceptions in signatures can lead to code that is > massively difficult to maintain as exceptions added to one method end > up needing to propagate all the way up a call chain through a program. > You end up with LONG exception lists. I need to think more about this > topic, but right now I don't really care whether the programmer > explicitly lists exceptions on a per-function basis. The type checker > can still determine what exceptions don't get caught in the scope of > what's been checked. If there is a new build process for static > checking, it could report all uncaught exceptions at that time > (probably optionally), and it could dump them to a supplemental file > when doing incremental checking. In short, I haven't thought about > this problem enough recently to say which way I prefer, but I lean > towards not making exceptions part of signatures. > > Okay, this has been pretty long and rambling, and it's time for me to > stop writing. I'm sorry if I didn't make total sense. I have more to > say, but it'll have to wait until another day... I'm waiting :) thanks for taking the time to write all this. scott > > John > > (*) Changing availability: The catcall rule also applies if a derived > type chooses to make a feature unavailable to the outside world > (e.g., by removing the feature). No one has been proposing a feature > of the system to do this sort of thing (yet... but visibility > mechanisms haven't been discussed much so far as I've seen). Of > course, you can always muck around with the attribute directly... > > > > _______________________________________________ > Types-SIG mailing list > Types-SIG@python.org > http://www.python.org/mailman/listinfo/types-sig From John@list.org Thu Mar 16 19:24:19 2000 From: John@list.org (John Viega) Date: Thu, 16 Mar 2000 11:24:19 -0800 Subject: [Types-sig] A late entry In-Reply-To: <20000315212628.A99258@chronis.pobox.com>; from scott on Wed, Mar 15, 2000 at 09:26:28PM -0500 References: <38CEC33C.6AC199A1@viega.org> <20000315212628.A99258@chronis.pobox.com> Message-ID: <20000316112419.D3845@viega.org> On Wed, Mar 15, 2000 at 09:26:28PM -0500, scott wrote: > On Tue, Mar 14, 2000 at 05:54:52PM -0500, John Viega wrote: > > neat paper. any other references to throw at us? Tons, more than you would want to read. I think the most interesting for you would be: Jens Palsberg and Michael Schwartzbach. Object-Oriented Type Systems. John Wiley and Sons, 1994. ISBN 0-471-941288 You can also look at some more of the papers I've given students in the past, some of which are downloadable from: http://www.list.org/~viega/cs655/ In particular, the Unit 6 papers. The Cardelli and Wegner paper is worthwhile. The Milner paper isn't there, and it's pretty dense anyway. You'll be interested in Day et al. (which is a reasonable starting place for constrained genericity). Out of the Unit 8 papers, I'd probably recommend only Agesen's beyond what I've already given you. Abadi and Cardelli's is really dense and not all that interesting/applicable. Castagna's is there to see if students can find the big problems with the work... > When we talk about adding runtime checks, it's a little unclear to me > exactly what is meant. Are you referring to additional runtime checks > that the existing dynamic type system in python does not provide?
> Is > leveraging the existing dynamic type system in this way feasible for > these sorts of checks? If this is possible, it is one approach I'd > prefer -- there's no performance hit, and nothing that isn't checked. > The only drawback I can see to using the existing system as a fallback > is that it would limit the degree of optimization that is available, > but I believe that's OK. Any time you add a dynamic type check that wasn't there before, there is a performance hit. I'm just saying that we should try hard to minimize the number of dynamic checks, period. > It seems like you are referring to inferencing as a mechanism which can > both allow the user to denote fewer types and serve as a means of dealing > with the mixing of unchecked code with checked code. Honestly, without some type annotations, you're not likely to get very far on the latter there. > With regards to > meeting the first goal, a very limited kind of inferencing is > possible, where the first assignment to a variable from an expression > of a given type has the same effect as declaring the variable as that > type in the first place. I think that the former goal of reducing the > number of declarations the programmer must make is attainable with a > mechanism like this, but the latter goal would require a real > inferencing algo. I think it's silly to do a 1/2 assed job with an inference algorithm... I'd rather not have one at all. People don't want to memorize a rule beyond "if the type is ambiguous you must declare it explicitly". Honestly, type inferencing has its problems too. For example, code can infer a more general type than intended, etc. However, minimizing the effort of the programmer definitely seems more pythonesque. > > A couple of brief thoughts at this point... What should the type > checker do with inferred types? Place them directly into the code? > Place them in comments or a doc string so that they have no effect but > you can see what the checker did and double-check its work? > > maybe just print out inferred types if requested and the type checker > is a separate program? I dunno, I always thought it'd be cool if it spit out another copy of the code that's fully annotated, improving documentation and keeping the amount of work down. I actually prefer to see all type information, myself... > > I think it would be best to be able to say, "the in type is the out > type, and who cares about the type beyond that". Parametric > polymorphism to the rescue... the type could be <x> -> <x>, where <x> > indicates that x can be of any type. I think the syntax could be > better (I prefer ML's, which types identity as 'a->'a, but I hear > there's some resistance to it in this community... I'm going to use 'a > from now on because it looks more natural to me). One problem here is > that it probably isn't going to be possible to infer an accurate > principal polymorphic type for things in all cases (see above). I'll > have to give the worst cases some thought. > > The stab I have taken at writing a checker allows this kind of > polymorphism for functions. It uses ~ instead of '. Using a prefix > character is something I prefer as well, though I think there are much > more important issues to ponder, so I get on with it :) Use the tick, as it's widely accepted... assume the emacs mode problems can be fixed :) I take it you're not trying to infer principal types or anything complex yet... > > Another problem that falls out at this point is how to write the type > > of x when we've seen x call "foo".
> > Do we look at all classes across > > the application for ones with a foo method? And do we care if that > > solution precludes classes that dynamically add their "foo" method > > from applying? Ick. I'd prefer to avoid the concrete, and infer > > something like: > > > > << foo: None->'a >> > > > > Which would read: an object with a field foo of type function taking > > no arguments and returning anything. > > Do you think it's ok to require that the programmer declare something > about x in this case? If something like what you suggest is inferred > there, it seems like a class of errors might slip through that we > might not want to allow to slip. That's true with type inferencing, period. I think if we're going to have it, we should stick to regular rules instead of special casing stuff like this. I don't think that it's going to end up being a huge source of problems anyway. > > constrained polymorphism seems to solve lots of more complex static > type issues in clean ways. Do you know of any references on > implementations for constrained polymorphism? I'd like to look more > closely at how it's done elsewhere before taking a crack at it myself. An okay place to start is with the Day et al. paper on the website I gave you above. > I like the idea of discouraging OR'ing of types as much as possible. > There are two cases where I think it could come in very handy: first, > if the static type of None is considered a distinct type rather than > something like a null type, then there are lots of things that return > or produce type-of(None) OR something else. Examples are dict.get and > default arguments. After wrestling with this for a while, I've come > to think that the introduction of a null-type in a static type system > is more manageable for the programmer. Do you see other ways of > dealing with that, or how would you prefer those things be > handled? The other case is recursive data types such as trees, where > a node can contain either other nodes or leaves. I disagree with you here. For your first case, "None type": Many languages have void single-valued types without any need for OR-ing types. Remember, a type specifies the universe of possible values. For all object types, None is a valid value in that set of values. The void type is the set that only contains the value None. The void type is thus a subtype of all object types, and you get all the benefits of subtyping polymorphism. I don't see any problem of the sort you're talking about here, at all. For your second case, modeling a tree where a node contains either other nodes or leaves: There are far better ways to model the problem. First, most trees don't have nodes without values, but let's ignore that for a minute, and assume otherwise. The natural way to model this problem is with subtyping polymorphism, not with the OR-ing of types: class NodeBase: # Theoretically abstract. def print_tree(self): pass class NonLeafNode(NodeBase): left :- NodeBase right :- NodeBase def print_tree(self): left.print_tree() right.print_tree() typedef printable << print()-> None >> class LeafNode< T -> printable >(NodeBase): value :- T def print_tree(self): value.print() "T -> printable" should read something like "any type T that is printable" (constrained genericity). I still assert that OR-ing types should *not* be in a python type system. You're basically saying, "here are things that should require a runtime cast, but we're going to completely ignore that statically and dynamically". When are dynamic checks necessary?
Generally, you're trying to do something that can be written as an assignment. Since types are essentially sets, the LHS has to be a subset of the RHS in order for us to make the determination that an assignment will always yield an object of a legal type. If the LHS and RHS are disjoint (well, if None is the only shared value w/ object types), that should never be possible. If there is some overlap, then a dynamic cast is required. I'd *really* like to see it be the case that the only times that "foo" is in the same set of values as 12 is when parametric polymorphism is involved, and with the any type. I am aware that not allowing OR'd types makes things a bit harder for legacy code that people want to change to use the type system (such as the standard library). The one place where it's a big issue is with heterogeneous lists. However, I think it really reduces the power of a type system to allow a list to be typed (string|integer|file), etc. > > > > decl z :- integer > > typecase a: > > case (x :- integer, y :- integer) : z = x + y > > case (x :- integer,) : z = x > > case x :- integer : z = x > > > > That's not quite as bad. > > With regards to type casing possibly causing the language to slow down > by bringing the static type information into runtime, do you think it > would be reasonable to allow typecasing only on types that are easily > expressible in terms of the existing dynamic type system? It seems to > me that this approach would save a lot of work, limit the runtime > overhead, and discourage OR'ing all at once. Well, any time you have to dynamically cast there's going to be a performance hit. I'm not really worried about matching, though. You can do it fairly efficiently, plus it satisfies the principle of localized cost... the feature costs the programmer nothing unless he uses it. Also, I think that our most important goal here is to design the right type system for python, not to pick one that is very easy to implement... > > Have you read about expressing the above with "mytype"? that is: > > interface if_LinkedList: > def add(self, l :- mytype): > ... > > class LinkedList(if_LinkedList): > def add(self, l): > ... > > class BiDirectionalLinkedList(LinkedList): > ... > > the syntax is fairly simple, and 'mytype' just means the type of > instances of the class that implements the method. Well, first of all, let me say that covariance provides a much more elegant solution to the problem. Type variables are not as easy for the average programmer to understand. If type variables are done right, then they'd basically duplicate language features like genericity. No one wants to have non-orthogonal constructs, so we'd probably remove genericity, which would result in a type system that's more difficult to explain and use, plus not nearly as well understood. Another problem with using type variables to solve this problem is that it requires the programmer to anticipate how people will want to use their classes upfront. If you don't happen to use a type variable the first time you specify a parameter, derived classes cannot change the variance of the parameters without going back and modifying the original code. Plus, you guys haven't talked about any type variables except "mytype", which is not powerful enough to handle uses of covariance where the argument to a method is of some type other than the type for which the method is a member. > > It should be possible to implement this solution, and even do so
> > incrementally. There's another (better, less pessimistic) solution > called the global validity approach. The problem with it, IIRC, is > that the algorithm basically assumes that type checking goes on in a > "closed world" environment, where you're checking the entire system at > once. That probably isn't desirable. I wonder if there haven't been > refinements to this algorithm that allow it to work incrementally. > Therefore, I'd definitely prefer covariance w/ a polymorphic catcall > rule, assuming that the catcall rule can actually work for Python. > > One possible approach to covariance of method parameters is to check > each method of a class against all the possible different types of > 'self'. This is what "stick" does, and it finds exactly the cases > where there are type errors. It does require more checking than a > general rule, and it does add complications to the problem of mixing > checked modules and unchecked ones across class inheritance, but it > does work. I'd be interested in any feedback on this approach you > have... To get it right, you would essentially be doing the same thing that the global validity approach does. In particular, you have the exact same problems in that a "closed world" assumption is required. Incremental checking is far more useful, and I think that the polymorphic catcall rule is simple enough (though not if you call it "polymorphic catcall" when you report an error to the end user!). John From scott@chronis.pobox.com Thu Mar 16 21:06:12 2000 From: scott@chronis.pobox.com (scott) Date: Thu, 16 Mar 2000 16:06:12 -0500 Subject: [Types-sig] A late entry In-Reply-To: <20000316112419.D3845@viega.org>; from John@list.org on Thu, Mar 16, 2000 at 11:24:19AM -0800 References: <38CEC33C.6AC199A1@viega.org> <20000315212628.A99258@chronis.pobox.com> <20000316112419.D3845@viega.org> Message-ID: <20000316160612.A11488@chronis.pobox.com> On Thu, Mar 16, 2000 at 11:24:19AM -0800, John Viega wrote: > On Wed, Mar 15, 2000 at 09:26:28PM -0500, scott wrote: > > On Tue, Mar 14, 2000 at 05:54:52PM -0500, John Viega wrote: > > > > neat paper. any other references to throw at us? > > Tons, more than you would want to read. I think the most interesting > for you would be: > > Jens Palsberg and Michael Schwartzbach. Object-Oriented Type Systems. > John Wiley and Sons, 1994. ISBN 0-471-941288 > > You can also look at some more of the papers I've given students in > the past, some of which are downloadable from: > http://www.list.org/~viega/cs655/ > > In particular, the Unit 6 papers. The Cardelli and Wegner > paper is worthwhile. The Milner paper isn't there, and it's pretty > dense anyway. You'll be interested in Day et al. (which is a > reasonable starting place for constrained genericity). > > Out of the Unit 8 papers, I'd probably recommend only Agesen's beyond > what I've already given you. Abadi and Cardelli's is really dense and > not all that interesting/applicable. Castagna's is there to see if > students can find the big problems with the work... I guess I need to set aside some time to read :) > > > When we talk about adding runtime checks, it's a little unclear to me > > exactly what is meant. Are you referring to additional runtime checks > > that the existing dynamic type system in python does not provide? Is > > leveraging the existing dynamic type system in this way feasible for > > these sorts of checks? If this is possible, it is one approach I'd > > prefer -- there's no performance hit, and nothing that isn't checked.
> > The only drawback I can see to using the existing system as a fallback > > is that it would limit the degree of optimization that is available, > > but I believe that's OK. > > Any time you add a dynamic type check that wasn't there before, there > is a performance hit. I'm just saying that we should try hard to > minimize the number of dynamic checks, period. I don't get the feeling I was clear enough about what I'm saying re: dynamic checks, so I'll try again. Minimizing run time checks is something I very much agree with. I also agree that some are likely to be necessary. I think that where those checks are necessary, it seems to make sense to leverage python's existing type system to implement them, because that type system is already in place and there would be no need for python objects as they exist in C or java or whatever to carry around any additional information. For example, if we add a field to the list structure in C Python that contains the set of types contained in the list, then every time a del somelist[x] occurred, the extra information would have to be updated by potentially checking the entire list. If there was a way to make runtime checks work reasonably without this kind of extra weight, it seems worth pursuing to me. One way that seems feasible is to leverage the existing type system in python. For example, if we provided a hierarchical interface to the existing type system, and that hierarchy were mirrored in the static system, then dynamic checks or casts could be limited to those expressible in the hierarchy based on python's existing type system (where you can compare lists and tuples but not lists of ints and lists of strings). I hope that clarifies the idea... > > > It seems like you are referring to inferencing as a mechanism which can > > both allow the user to denote fewer types and serve as a means of dealing > > with the mixing of unchecked code with checked code. > > Honestly, without some type annotations, you're not likely to get very > far on the latter there. Of course :) I didn't mean to imply otherwise at all. > > > With regards to > > meeting the first goal, a very limited kind of inferencing is > > possible, where the first assignment to a variable from an expression > > of a given type has the same effect as declaring the variable as that > > type in the first place. I think that the former goal of reducing the > > number of declarations the programmer must make is attainable with a > > mechanism like this, but the latter goal would require a real > > inferencing algo. > > I think it's silly to do a 1/2 assed job with an inference > algorithm... I'd rather not have one at all. People don't want to > memorize a rule beyond "if the type is ambiguous you must declare it > explicitly". Honestly, type inferencing has its problems too. For > example, code can infer a more general type than intended, etc. > However, minimizing the effort of the programmer definitely seems more > pythonesque. In my own experience, I've never seen a type inferencer that succeeded at only requiring type annotations where they would otherwise be ambiguous; that includes ML. I spent more time guessing what the type inferencer considered ambiguous than anything else. The rule of thumb for where annotations are required would be easier for me to get if it didn't involve potentially vague notions like deciding what is and is not ambiguous. That's just me, though. [...] > > I take it you're not trying to infer principal types or anything > complex yet...
No, deduction would be a more accurate term. > > > > Another problem that falls out at this point is how to write the type > > > of x when we've seen x call "foo". Do we look at all classes across > > > the application for ones with a foo method? And do we care if that > > > solution precludes classes that dynamically add their "foo" method > > > from applying? Ick. I'd prefer to avoid the concrete, and infer > > > something like: > > > > > > << foo: None->'a >> > > > > > > Which would read: an object with a field foo of type function taking > > > no arguments and returning anything. > > > > Do you think it's ok to require that the programmer declare something > > about x in this case? If something like what you suggest is inferred > > there, it seems like a class of errors might slip through that we > > might not want to allow to slip. > > > That's true with type inferencing, period. I think if we're going to > have it, we should stick to regular rules instead of special casing > stuff like this. I don't think that it's going to end up being a huge > source of problems anyway. One of the things expressed earlier was the idea of leaving the implementation of inferencing until after the type system and checker were done. Does that order of events seem reasonable to you? > > I like the idea of discouraging OR'ing of types as much as possible. > > There are two cases where I think it could come in very handy: first, > > if the static type of None is considered a distinct type rather than > > something like a null type, then there are lots of things that return > > or produce type-of(None) OR something else. Examples are dict.get and > > default arguments. After wrestling with this for a while, I've come > > to think that the introduction of a null-type in a static type system > > is more manageable for the programmer. Do you see other ways of > > dealing with that, or how would you prefer those things be > > handled? The other case is recursive data types such as trees, where > > a node can contain either other nodes or leaves. > > I disagree with you here. The way I read what you say below, we're actually agreeing about having a special type for the value None; it seems to work best to me as a valid value in the set of values of any object type. That's what I meant by 'something like a null type' above. By doing this, you lose the ability of a type checker to distinguish when something should be None and when it should not, but this approach makes lots of things easier both for the programmer and the implementation of a static type system. > > For your first case, "None type": Many languages have void > single-valued types without any need for OR-ing types. Remember, a > type specifies the universe of possible values. For all object types, > None is a valid value in that set of values. The void type is the set > that only contains the value None. The void type is thus a subtype of > all object types, and you get all the benefits of subtyping polymorphism. > I don't see any problem of the sort you're talking about here, at all. > > For your second case, modeling a tree where a node contains either > other nodes or leaves: There are far better ways to model the problem. > First, most trees don't have nodes without values, but let's ignore > that for a minute, and assume otherwise. The natural way to model > this problem is with subtyping polymorphism, not with the OR-ing of > types: > > class NodeBase: # Theoretically abstract.
> def print_tree(self): pass > > class NonLeafNode(NodeBase): > left :- NodeBase > right :- NodeBase > def print_tree(self): > left.print_tree() > right.print_tree() > > typedef printable << print()-> None >> > class LeafNode< T -> printable >(NodeBase): > value :- T > def print_tree(self): > value.print() > > "T -> printable" should read something like "any type T that is > printable" (constrained genericity). This seems like a good approach, and if None is treated specially as above, then recursive types such as: typedef IntTree (IntTree, int, IntTree) aren't a problem either (at least in terms of the need for OR). > > I still assert that OR-ing types should *not* be in a python type > system. > >You're basically saying, "here are things that should require > a runtime cast, but we're going to completely ignore that statically > and dynamically". huh? You mean you feel that OR-ing creates this situation? In the worst case, I agree. I also have been searching for ways to eliminate or at least reduce OR-ing myself. It seems essentially bad for static systems. > > > > > When are dynamic checks necessary? Generally, you're trying to do > > something that can be written as an assignment. Since types are > > essentially sets, the LHS has to be a subset of the RHS in order for > > us to make the determination that an assignment will always yield an > > object of a legal type. > > You mean RHS must be a subset of the LHS, right? as in x :- numeric y :- int y = 1 x = y # this is ok, int is subset of numeric > If the LHS and RHS are disjoint (well, if > None is the only shared value w/ object types), that should never be > possible. If there is some overlap, then a dynamic cast is required. > I'd *really* like to see it be the case that the only times that "foo" > is in the same set of values as 12 is when parametric polymorphism is > involved, and with the any type. I don't think many will disagree with you there. > > I am aware that not allowing OR'd types makes things a bit harder for > legacy code that people want to change to use the type system (such as > the standard library). The one place where it's a big issue is with > heterogeneous lists. However, I think it really reduces the power of a > type system to allow a list to be typed (string|integer|file), etc. > > > > > > > > decl z :- integer > > > typecase a: > > > case (x :- integer, y :- integer) : z = x + y > > > case (x :- integer,) : z = x > > > case x :- integer : z = x > > > > > > That's not quite as bad. > > > > With regards to type casing possibly causing the language to slow down > > by bringing the static type information into runtime, do you think it > > would be reasonable to allow typecasing only on types that are easily > > expressible in terms of the existing dynamic type system? It seems to > > me that this approach would save a lot of work, limit the runtime > > overhead, and discourage OR'ing all at once. > > Well, any time you have to dynamically cast there's going to be a > performance hit. I'm not really worried about matching, though. You > can do it fairly efficiently, plus it satisfies the principle of > localized cost... the feature costs the programmer nothing unless he > uses it. This is true so long as extra information isn't carried around and kept up to date at run time in order to make matching more efficient. > Also, I think that our most important goal here is to design > the right type system for python, not to pick one that is very easy to > implement... agreed.
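one way to picture the subset rule is with issubclass standing in for "the set of RHS values is contained in the set of LHS values". A tiny sketch, assuming the numbers module's Number class as a stand-in for the 'numeric' type above (the function name is invented):

    import numbers

    def assignment_ok(lhs_type, rhs_type):
        # an assignment is statically safe when the set of RHS values
        # is a subset of the set of LHS values
        return issubclass(rhs_type, lhs_type)

    print(assignment_ok(numbers.Number, int))  # True:  x :- numeric; x = y with y :- int
    print(assignment_ok(int, numbers.Number))  # False: would require a dynamic cast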
There seems to be an initially steep learning curve to designing type systems. I know it's been that way for me. At some point in the near future I hope to have read and learned enough to take another stab at implementing an optional static type system for python. > > > > > > > Have you read about expressing the above with "mytype"? that is: > > > > interface if_LinkedList: > > def add(self, l :- mytype): > > ... > > > > class LinkedList(if_LinkedList): > > def add(self, l): > > ... > > > > class BiDirectionalLinkedList(LinkedList): > > ... > > > > the syntax is fairly simple, and 'mytype' just means the type of > > instances of the class that implements the method. > > Well, first of all, let me say that covariance provides a much more > elegant solution to the problem. Type variables are not as easy for > the average programmer to understand. If type variables are done > right, then they'd basically duplicate language features like > genericity. No one wants to have non-orthogonal constructs, so we'd > probably remove genericity, which would result in a type system that's > more difficult to explain and use, plus not nearly as well understood. > > Another problem with using type variables to solve this problem is > that it requires the programmer to anticipate how people will want to > use their classes upfront. If you don't happen to use a type variable > the first time you specify a parameter, derived classes cannot change > the variance of the parameters without going back and modifying the > original code. Plus, you guys haven't talked about any type variables > except "mytype", which is not powerful enough to handle uses of > covariance where the argument to a method is of some type other than > the type for which the method is a member. time to take a closer look at constrained parametric polymorphism :) > > > > It should be possible to implement this solution, and even do so > > > incrementally. There's another (better, less pessimistic) solution > > > called the global validity approach. The problem with it, IIRC, is > > > that the algorithm basically assumes that type checking goes on in a > > > "closed world" environment, where you're checking the entire system at > > > once. That probably isn't desirable. I wonder if there haven't been > > > refinements to this algorithm that allow it to work incrementally. > > > Therefore, I'd definitely prefer covariance w/ a polymorphic catcall > > > rule, assuming that the catcall rule can actually work for Python. > > > > One possible approach to covariance of method parameters is to check > > each method of a class against all the possible different types of > > 'self'. This is what "stick" does, and it finds exactly the cases > > where there are type errors. It does require more checking than a > > general rule, and it does add complications to the problem of mixing > > checked modules and unchecked ones across class inheritance, but it > > does work. I'd be interested in any feedback on this approach you > > have... > > To get it right, you would essentially be doing the same thing that > the global validity approach does. In particular, you have the exact > same problems in that a "closed world" assumption is required. > Incremental checking is far more useful, and I think that the > polymorphic catcall rule is simple enough (though not if you call it > "polymorphic catcall" when you report an error to the end user!). I'll look into that more, as well as potential means for making the global validity approach work more incrementally.
thanks, scott From bwarsaw@cnri.reston.va.us Thu Mar 16 21:36:12 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 16 Mar 2000 16:36:12 -0500 (EST) Subject: [Types-sig] A late entry References: <38CEC33C.6AC199A1@viega.org> <20000315212628.A99258@chronis.pobox.com> <20000316112419.D3845@viega.org> Message-ID: <14545.21452.817822.231182@anthem.cnri.reston.va.us> >>>>> "JV" == John Viega writes: JV> Use the tick, as it's widely accepted... assume the emacs mode JV> problems can be fixed :) I'll just pipe in here. :) If you use a tick, you will break python-mode and I predict that it will never be fixed, because what's really happening is that you're breaking some fundamental assumptions that X/Emacs makes about code. Trust me on this. Why do you think Perl added `::' ? Not /just/ to make C++ programmers more comfortable. -Barry From John@list.org Thu Mar 16 22:00:56 2000 From: John@list.org (John Viega) Date: Thu, 16 Mar 2000 14:00:56 -0800 Subject: [Types-sig] A late entry In-Reply-To: <20000316160612.A11488@chronis.pobox.com>; from scott on Thu, Mar 16, 2000 at 04:06:12PM -0500 References: <38CEC33C.6AC199A1@viega.org> <20000315212628.A99258@chronis.pobox.com> <20000316112419.D3845@viega.org> <20000316160612.A11488@chronis.pobox.com> Message-ID: <20000316140056.F3845@viega.org> On Thu, Mar 16, 2000 at 04:06:12PM -0500, scott wrote: > On Thu, Mar 16, 2000 at 11:24:19AM -0800, John Viega wrote: > > I don't get the feeling I was clear enough about what I'm saying re: > dynamic checks, so I'll try again. Minimizing run time checks is > something I very much agree with. I also agree that some are likely > to be necessary. I think that where those checks are necessary, it > seems to make sense to leverage python's existing type system to > implement them, because that type system is already in place and there > would be no need for python objects as they exist in C or java or > whatever to carry around any additional information. I didn't disagree with this. It seems rather obvious to try to avoid duplicating work if it's not necessary. > For example, if we add a field to the list structure in C Python that > contains the set of types contained in the list, then every time a del > somelist[x] occurred, the extra information would have to be updated by > potentially checking the entire list. If there was a way to make > runtime checks work reasonably without this kind of extra weight, it > seems worth pursuing to me. One way that seems feasible is to > leverage the existing type system in python. Ack! Set of types bad! The only problem with disallowing sets of types (OR-ing types) is that legacy code will either need to be rewritten or use "any", where it might be nice to get a bit more accurate than "any". I don't think it's worth supporting the feature for this reason alone, as it has way too many bad consequences, and the good consequences aren't that good. > For example, if we provided a hierarchical interface to the existing > type system, and that hierarchy were mirrored in the static system, > then dynamic checks or casts could be limited to those expressible in > the hierarchy based on python's existing type system (where you can > compare lists and tuples but not lists of ints and lists of strings). Oh, I don't think I agree with you here. I think it's fine to leverage off the existing type system where possible, but you need to be able to add dynamic checks any time you can determine statically that something may or may not need to give a type error at runtime.
If the dynamic checks needed can't be expressed in Python's dynamic type system currently, then the dynamic system will have to be changed. I wouldn't worry too much about this problem, though. If you have a good static type system, the number of dynamic checks that get added will generally be pretty low. The code in question can be added without changes to Python's current type system. > > > > I think it's silly to do a 1/2 assed job with an inference > > algorithm... I'd rather not have one at all. People don't want to > > memorize a rule beyond "if the type is ambiguous you must declare it > > explicitly". Honestly, type inferencing has its problems too. For > > example, code can infer a more general type than intended, etc. > > However, minimizing the effort of the programmer definitely seems more > > pythonesque. > > In my own experience, I've never seen a type inferencer that succeeded > at only requiring type annotations where they would otherwise be > ambiguous; that includes ML. I spent more time guessing what the type > inferencer considered ambiguous than anything else. The rule of > thumb for where annotations are required would be easier for me to get > if it didn't involve potentially vague notions like deciding what is > and is not ambiguous. That's just me, though. ML is unification-based. Depending on what kind of implementation of unification is used, there can be some bizarre corner cases (the correct implementation is harder/less efficient). But generally, if the code you wrote was actually unambiguous, then ML can infer the principal type. In that respect, I do not believe you are correct. However, I will agree that it can take people a while to develop a good mental model of what an inferencer is actually going to deduce. Like I said before, I'm perfectly comfortable with requiring explicit types in order to type check a program. But there seems to be some interest in having the automation, even despite the drawbacks. If that is the case, I'd rather see something powerful and well-designed with few restrictions than something ad hoc with many restrictions. > One of the things expressed earlier was the idea of leaving the > implementation of inferencing until after the type system and checker > were done. Does that order of events seem reasonable to you? Well, if you design things right, your inferencer can leverage some of the infrastructure of your checker quite effectively. It's definitely the preferred ordering. > The way I read what you say below, we're actually agreeing about > having a special type for the value None; it seems to work best to me > as a valid value in the set of values of any object type. That's what > I meant by 'something like a null type' above. By doing this, you > lose the ability of a type checker to distinguish when something > should be None and when it should not, but this approach makes lots of > things easier both for the programmer and the implementation of a > static type system. No, most languages have a rule that variables cannot have the void type as their principal type. This is no reason to allow OR-ing of types. Note how it isn't an issue in pretty much every other language, either. So I still don't see what you are seeing that forces an OR construct. > > For your second case, modeling a tree where a node contains either > > other nodes or leaves: There are far better ways to model the problem. > > First, most trees don't have nodes without values, but let's ignore > > that for a minute, and assume otherwise.
> > The natural way to model > > this problem is with subtyping polymorphism, not with the OR-ing of > > types: > > > > class NodeBase: # Theoretically abstract. > > def print_tree(self): pass > > > > class NonLeafNode(NodeBase): > > left :- NodeBase > > right :- NodeBase > > def print_tree(self): > > left.print_tree() > > right.print_tree() > > > > typedef printable << print()-> None >> > > class LeafNode< T -> printable >(NodeBase): > > value :- T > > def print_tree(self): > > value.print() > > > > "T -> printable" should read something like "any type T that is > > printable" (constrained genericity). > > This seems like a good approach, and if None is treated specially as > above, then recursive types such as: > > typedef IntTree (IntTree, int, IntTree) > > aren't a problem either (at least in terms of the need for OR). I got confused here for a second... this looks too much like an actual tuple use for me. :) Whatever syntax ends up getting used, can we use (x*y*z) to refer to the 3-tuple where arg 0 is of type x, 1 of y and 2 of z? That's a very common notation. > > I still assert that OR-ing types should *not* be in a python type > > system. > > > >You're basically saying, "here are things that should require > > a runtime cast, but we're going to completely ignore that statically > > and dynamically". > > huh? You mean you feel that OR-ing creates this situation? In the > worst case, I agree. I also have been searching for ways to eliminate > or at least reduce OR-ing myself. It seems essentially bad for static > systems. Yes, OR-ing creates that situation. I've definitely been arguing that it makes your static checking less precise, forcing many type errors to be caught at runtime. I think that "any" should be the only shady construct here, personally. It should suffice for supporting code that was untyped as written. There is no need for an OR construct; it can and should be eliminated. I'll be very disappointed if it makes it into the Python type system :) > > > > > When are dynamic checks necessary? Generally, you're trying to do > > something that can be written as an assignment. Since types are > > essentially sets, the LHS has to be a subset of the RHS in order for > > us to make the determination that an assignment will always yield an > > object of a legal type. > > You mean RHS must be a subset of the LHS, right? Of course; my typo. > > Well, any time you have to dynamically cast there's going to be a > > performance hit. I'm not really worried about matching, though. You > > can do it fairly efficiently, plus it satisfies the principle of > > localized cost... the feature costs the programmer nothing unless he > > uses it. > > This is true so long as extra information isn't carried around and > kept up to date at run time in order to make matching more efficient. Well, you're going to want to keep complete type information around for the runtime to use anyway, so that doesn't really matter. > > Well, first of all, let me say that covariance provides a much more > > elegant solution to the problem. Type variables are not as easy for > > the average programmer to understand. If type variables are done > > right, then they'd basically duplicate language features like > > genericity. No one wants to have non-orthogonal constructs, so we'd > > probably remove genericity, which would result in a type system that's > > more difficult to explain and use, plus not nearly as well understood.
> > > > Another problem with using type variables to solve this problem is > > > that it requires the programmer to anticipate how people will want to > > > use their classes upfront. If you don't happen to use a type variable > > > the first time you specify a parameter, derived classes cannot change > > > the variance of the parameters without going back and modifying the > > > original code. Plus, you guys haven't talked about any type variables > > > except "mytype", which is not powerful enough to handle uses of > > > covariance where the argument to a method is of some type other than > > > the type for which the method is a member. > > > > time to take a closer look at constrained parametric polymorphism :) This isn't the best solution, IMHO. I advocate covariance + a poly catcall rule. > > To get it right, you would essentially be doing the same thing that > > the global validity approach does. In particular, you have the exact > > same problems in that a "closed world" assumption is required. > > Incremental checking is far more useful, and I think that the > > polymorphic catcall rule is simple enough (though not if you call it > > "polymorphic catcall" when you report an error to the end user!). > > I'll look into that more, as well as potential means for making the > global validity approach work more incrementally. There's been some work done in this area, but nothing that had actually been implemented last I checked. Why is it necessary? What do you have against a poly catcall rule? John From John@list.org Thu Mar 16 22:12:31 2000 From: John@list.org (John Viega) Date: Thu, 16 Mar 2000 14:12:31 -0800 Subject: [Types-sig] A late entry In-Reply-To: <14545.21452.817822.231182@anthem.cnri.reston.va.us>; from Barry A. Warsaw on Thu, Mar 16, 2000 at 04:36:12PM -0500 References: <38CEC33C.6AC199A1@viega.org> <20000315212628.A99258@chronis.pobox.com> <20000316112419.D3845@viega.org> <14545.21452.817822.231182@anthem.cnri.reston.va.us> Message-ID: <20000316141231.G3845@viega.org> As far as I know, the assumption is that you won't be able to approximate the grammar with regexp-based matching. You already can't do it perfectly, of course. In practice, you just have to change your regular expressions around in contexts where people can specify a type. For example, if you looked at a def line and just said "find all quote pairs and format the crap in between", you'd have to get more complex: "find the next argument, and format pairs of ticks, if a pair is found up to a :-". I haven't ever written a font-lock mode, so I don't know what the interface to emacs primitives for this stuff looks like. And thus I might be wrong; it may turn out to be really difficult. It's definitely possible, though, even if font-lock had to essentially be recreated from scratch with better primitives :) Now that might not be worth the effort, but I'd like to assume that the emacs problems can be fixed and are worth fixing to gain a syntax that's more natural to people who are actually familiar with this stuff. At least, let's please do that for the sake of discussing these concepts in this thread, because I get confused very easily :) John On Thu, Mar 16, 2000 at 04:36:12PM -0500, Barry A. Warsaw wrote: > > >>>>> "JV" == John Viega writes: > > JV> Use the tick, as it's widely accepted... assume the emacs mode > JV> problems can be fixed :) > > I'll just pipe in here.
> :) > > If you use a tick, you will break python-mode and I predict that it > will never be fixed, because what's really happening is that you're > breaking some fundamental assumptions that X/Emacs makes about code. > Trust me on this. Why do you think Perl added `::' ? Not /just/ to > make C++ programmers more comfortable. > > -Barry From scott@chronis.pobox.com Thu Mar 16 22:51:08 2000 From: scott@chronis.pobox.com (scott) Date: Thu, 16 Mar 2000 17:51:08 -0500 Subject: [Types-sig] A late entry In-Reply-To: <20000316140056.F3845@viega.org>; from John@list.org on Thu, Mar 16, 2000 at 02:00:56PM -0800 References: <38CEC33C.6AC199A1@viega.org> <20000315212628.A99258@chronis.pobox.com> <20000316112419.D3845@viega.org> <20000316160612.A11488@chronis.pobox.com> <20000316140056.F3845@viega.org> Message-ID: <20000316175108.C14723@chronis.pobox.com> On Thu, Mar 16, 2000 at 02:00:56PM -0800, John Viega wrote: > On Thu, Mar 16, 2000 at 04:06:12PM -0500, scott wrote: > > On Thu, Mar 16, 2000 at 11:24:19AM -0800, John Viega wrote: > > The way I read what you say below, we're actually agreeing about > > having a special type for the value None; it seems to work best to me > > as a valid value in the set of values of any object type. That's what > > I meant by 'something like a null type' above. By doing this, you > > lose the ability of a type checker to distinguish when something > > should be None and when it should not, but this approach makes lots of > > things easier both for the programmer and the implementation of a > > static type system. > > No, most languages have a rule that variables cannot have the void > type as their principal type. This is no reason to allow OR-ing of > types. Note how it isn't an issue in pretty much every other > language, either. So I still don't see what you are seeing that > forces an OR construct. I'm not saying anything forces an OR construct, or even that one is a good idea. I'm just trying to get the implications straight. One of the implications of treating the type of None as a principal type is that a static type checker will be able to say "hey, x might be None, but you're assuming it's a string!" in code like the following: x = {'foo': 'bar'}.get('baz') x = x + '' That's a good thing, and allowing the type of x to be 'None | string' combined with the necessity of typecasing the result is one way of having a static type system understand this error. But, as you argue, there are lots of tradeoffs to consider with allowing OR's. I personally don't think that it's worth it to introduce OR's for cases like this (I used to, but trying to build a static type system with this construct made me change my mind). The idea does have its proponents, so it's definitely worth mentioning that there is a tradeoff to be considered. > > This seems like a good approach, and if None is treated specially as > > above, then recursive types such as: > > > > typedef IntTree (IntTree, int, IntTree) > > > > aren't a problem either (at least in terms of the need for OR). > > I got confused here for a second... this looks too much like an actual > tuple use for me. :) Whatever syntax ends up getting used, can we use > (x*y*z) to refer to the 3-tuple where arg 0 is of type x, 1 of y and 2 > of z? That's a very common notation. hmm. I don't want to get into syntax wars. I'll use whatever syntax you like for this discussion. The reason I used the above syntax is that it was proposed and used lots in previous discussions. [...]
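to spell out the dict.get case: here is exactly what goes wrong at runtime today, and the guard a checker would in effect be demanding (the guard is just a sketch of one possible fix):

    x = {'foo': 'bar'}.get('baz')   # 'baz' is absent, so x is None
    try:
        x = x + ''                  # "assuming it's a string"
    except TypeError:
        print("runtime type error: x was None, not a string")

    # the guarded version a static checker could insist on:
    x = {'foo': 'bar'}.get('baz')
    if x is None:
        x = ''
    x = x + ''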
From bwarsaw@cnri.reston.va.us Thu Mar 16 22:58:56 2000
From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us)
Date: Thu, 16 Mar 2000 17:58:56 -0500 (EST)
Subject: [Types-sig] A late entry
References: <38CEC33C.6AC199A1@viega.org> <20000315212628.A99258@chronis.pobox.com> <20000316112419.D3845@viega.org> <14545.21452.817822.231182@anthem.cnri.reston.va.us> <20000316141231.G3845@viega.org>
Message-ID: <14545.26416.628396.276382@anthem.cnri.reston.va.us>

font-lock is only one of the problems, and even it is only partially
driven by regexps. There are C primitives that handle things like
parsing over a string, comment, or s-expression. These cannot be
taught that a non-embedded unescaped tick opens a string sometimes but
not other times. Having actually implemented support for dual-comment
styles in a single buffer (i.e. /*...*/ and //...\n), I can tell you
that this stuff is really, really tricky. So somebody (not me :) is
either going to have to rewrite the syntax parsing model and
primitives from scratch, or throw out anything that actually uses the
primitives. Font-locking is one thing, but there may be more subtle
breakages. I have no idea how well cperl-mode handles this stuff, but
they've already been down that road.

You could also argue that Py3K shouldn't have to cater to a 20 year
old technology like Emacs, and you'd probably be right. I'd still
grumble though :) I'd also be interested in seeing what the IDLE
developers think about such syntax changes.

My prediction stands: it'll never get done, even if it were possible.
Meaning, I really don't think it's worth the effort, and I can't
imagine anybody actually spending the time to do it.

>>>>> "JV" == John Viega writes:

JV> Now that might not be worth the effort, but I'd like to assume
JV> that the emacs problems can be fixed and are worth fixing to
JV> gain a syntax that's more natural to people who are actually
JV> familiar with this stuff. At least, let's please do that for
JV> the sake of discussing these concepts in this thread, because
JV> I get confused very easily :)

Sure! Use whatever notation makes sense for the current discussions.
Your first point is more interesting, because I don't think /any/ of
these typing issues will seem natural to the vast majority of Python
hackers. I could be wrong, and besides you know my biases already, so
I'll shut up now :).

-Barry
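[Editorial sketch: the kind of regexp approximation John and Barry are
debating, in Python rather than Emacs Lisp. The pattern and the sample
lines are invented; note that the same pattern happily fires inside an
ordinary string literal, which is exactly the ambiguity Barry
describes.]

    import re

    # Inside a context where types may appear, treat a tick followed
    # by an identifier as a type variable rather than a string open.
    TYPEVAR = re.compile(r"'([A-Za-z_][A-Za-z_0-9]*)")

    print(TYPEVAR.findall("def map(f :- 'a -> 'b, xs :- ['a]) -> ['b]: ..."))
    # -> ['a', 'b', 'a', 'b']

    # ...but the approximation breaks down on real strings:
    print(TYPEVAR.findall("s = 'hello'"))
    # -> ['hello']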
From tim_one@email.msn.com Fri Mar 17 06:24:09 2000
From: tim_one@email.msn.com (Tim Peters)
Date: Fri, 17 Mar 2000 01:24:09 -0500
Subject: [Types-sig] A late entry
In-Reply-To: <14545.26416.628396.276382@anthem.cnri.reston.va.us>
Message-ID: <000301bf8fd9$6798b700$682d153f@tim>

[Barry Warsaw, explaining the problems 'a would create for python-mode.el]
> ...
> You could also argue that Py3K shouldn't have to cater to a 20 year
> old technology like Emacs, and you'd probably be right.

P3K should cater to Python programmers, though!

    'a

simply looks like an unterminated string, regardless of whether pymode
or Python programmers are looking at it. The second most likely bad
interpretation will come from Lispers, viewing it as a symbol. So it's
simply poor notation for Python. 'a' would work, though! Haskell uses
unadorned letters for type parameters (i.e., the ML convention isn't
universal even among its relatives) -- but Haskell doesn't have inline
function declarations.

> ...
> I'd also be interested in seeing what the IDLE developers think about
> such syntax changes.

I expect that context-sensitive literal syntax is a non-starter
regardless of tool (don't forget PythonWorks, and tokenize.py, and
pyclbr.py, and kjlint, and untold mounds of homegrown stuff that also
expects apostrophe to mean string).

unary-plus-is-pretty-much-unused-ly y'rs - tim

PS: [John Viega]
> ...
> As far as I know, the assumption is that you won't be able to
> approximate the grammar with regexp-based matching. You already can't
> do it perfectly, of course.

pymode cannot because of its reliance on the Emacs parsing functions.
But IDLE's regexp-based parsing is believed to be 100% correct(*).
Ditto tokenize.py's.

Don't get hung up on the spelling! As someone else wise once said,
Guido is a master of syntax, and will pick something *he* likes
regardless of what we recommend <0.9 wink>.

(*) For a value of 100 strictly less than 100, but equal to 100 for
the almost-inclusive subset of Python's full grammar Guido doesn't
regret:

    >>> i = 3and 4

is mis-colorized by IDLE, and that's the way Guido wants it (in order,
of course, to discourage it).
From jeremy-home@cnri.reston.va.us Fri Mar 17 16:56:42 2000
From: jeremy-home@cnri.reston.va.us (Jeremy Hylton)
Date: Fri, 17 Mar 2000 11:56:42 -0500 (EST)
Subject: [Types-sig] Re: A late entry
In-Reply-To: <200003171631.LAA23287@ns1.cnri.reston.va.us>
References: <200003171631.LAA23287@ns1.cnri.reston.va.us>
Message-ID: <14546.24663.399881.121026@walden>

> I think it would be good to allow parameter-based method overloading
> for people who use the type system. You'd be allowed to do stuff like:
>
> class Formatter:
>     def print(self: Formatter, i : integer)->None: ...
>     def print(self: Formatter, s : string)->None: ...
>     def print(self: Formatter, l : ['a])->None: ...

I have been uneasy about OR types, too. I think the primary source of
OR-ing is default arguments and various methods that implement
Pythonic method overloading. If we allow method overloading in the
type system -- to describe multiple valid signatures of a single
method object -- we might eliminate many of the problems.

    class Foo:
        decl __init__(self, arg1: int, arg2: int)
        decl __init__(self, arg1: string)
        def __init__(self, arg1=None, arg2=None):
            [...]

I think this is a little simpler than the proposal you made. It merely
provides a mechanism to define simple types for existing Python code.

The other significant source of OR types is treating None as a
distinct type, which requires an OR type anywhere that you want to
pass an object or None. If we also eliminate that, there is little
need for OR types.

-- Jeremy Hylton
From John@list.org Sat Mar 18 01:26:08 2000
From: John@list.org (John Viega)
Date: Fri, 17 Mar 2000 17:26:08 -0800
Subject: [Types-sig] A late entry
In-Reply-To: <20000316175108.C14723@chronis.pobox.com>; from scott on Thu, Mar 16, 2000 at 05:51:08PM -0500
References: <38CEC33C.6AC199A1@viega.org> <20000315212628.A99258@chronis.pobox.com> <20000316112419.D3845@viega.org> <20000316160612.A11488@chronis.pobox.com> <20000316140056.F3845@viega.org> <20000316175108.C14723@chronis.pobox.com>
Message-ID: <20000317172608.A12852@viega.org>

On Thu, Mar 16, 2000 at 05:51:08PM -0500, scott wrote:
> On Thu, Mar 16, 2000 at 02:00:56PM -0800, John Viega wrote:
>
> I'm not saying anything forces an OR construct, or even that one is a
> good idea. I'm just trying to get the implications straight. One of
> the implications of treating the type of None as a principal type is
> that a static type checker will be able to say "hey, x might be None,
> but you're assuming it's a string!" in code like the following:
>
> x = {'foo': 'bar'}.get('baz')
> x = x + ''
>
> That's a good thing, and allowing the type of x to be 'None | string'
> combined with the necessity of typecasing the result is one way of
> having a static type system understand this error.

It's one way. Most languages don't find the problem worth fixing
statically, of course. There's another, more natural way of modeling
the problem that does allow for static checking of this sort of thing.
Basically, for each object type, you have a second type which is
identical, with the exception that it can never hold null. We'd then
be able to give a type warning in your example above. The problem
there is that you have to explicitly typecase (or typecast) every time
you make a call to get() and get back a valid result. No language
bothers here. Most languages just do an analysis, and try to figure
out which uses might perform illegal operations on the void value,
giving errors at only those points. That kind of analysis can be done
statically, and leverages the type system, even though it doesn't have
any obvious manifestations in the syntax.

> > I got confused here for a second... this looks too much like an actual
> > tuple use for me. :) Whatever syntax ends up getting used, can we use
> > (x*y*z) to refer to the 3-tuple where arg 0 is of type x, 1 of y and 2
> > of z? That's a very common notation.
>
> hmm. I don't want to get into syntax wars. I'll use whatever syntax
> you like for this discussion. The reason I used the above syntax is
> that it was proposed and used lots in previous discussions.

I'm not so much concerned about the final syntax... I just got
confused seeing something that looked like a tuple, and not a type :)

> > Well, you're going to want to keep complete type information around
> > for the runtime to use anyway, so that doesn't really matter.
>
> yikes. I was hoping to avoid that, as it implies a major leap in
> difficulty. It also seems like making this info available at runtime
> may imply that a static type system isn't really feasible until
> PY3000, which is a little disappointing. We'll see, I guess :)

Why is that? I don't see a major leap in difficulty myself...

> > There's been some work done in this area, but nothing that had
> > actually been implemented last I checked. Why is it necessary?
>
> I'm just trying to understand the options; I don't have the knowledge
> to rule a whole lot of things out at this point, and I feel the need
> to understand things well enough to make those calls for myself.

Fair enough.

> > What
> > do you have against a poly catcall rule?
>
> Nothing except that I don't know enough about it yet. Just found a
> good description at
> http://www.eiffel.com/doc/manuals/technology/typing/paper/page.html
>
> like I said before, I need to set aside more time to read!

Now that I'm thinking about it, another good thing to read is Meyer's
chapter on types in "Object-Oriented Software Construction" 2nd ed.

John

From John@list.org Sat Mar 18 01:27:08 2000
From: John@list.org (John Viega)
Date: Fri, 17 Mar 2000 17:27:08 -0800
Subject: [Types-sig] A late entry
In-Reply-To: <14545.26416.628396.276382@anthem.cnri.reston.va.us>; from bwarsaw@cnri.reston.va.us on Thu, Mar 16, 2000 at 05:58:56PM -0500
References: <38CEC33C.6AC199A1@viega.org> <20000315212628.A99258@chronis.pobox.com> <20000316112419.D3845@viega.org> <14545.21452.817822.231182@anthem.cnri.reston.va.us> <20000316141231.G3845@viega.org> <14545.26416.628396.276382@anthem.cnri.reston.va.us>
Message-ID: <20000317172708.B12852@viega.org>

On Thu, Mar 16, 2000 at 05:58:56PM -0500, bwarsaw@cnri.reston.va.us wrote:
>
> My prediction stands: it'll never get done, even if it were possible.
> Meaning, I really don't think it's worth the effort, and I can't
> imagine anybody actually spending the time to do it.

You are probably not wrong :)

John
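[Editorial sketch: one conceivable shape for the "complete type
information around for the runtime" that John and scott argue about
above. Everything here -- the decl_sig attribute, its layout, the
check_call helper -- is invented for illustration and is not part of
any proposal.]

    # Stash a declared signature on the function object, so a runtime
    # could consult it where checked and unchecked code meet.
    def gcd(a, b):
        while b:
            a, b = b, a % b
        return a

    gcd.decl_sig = (('int', 'int'), 'int')   # argument types -> result

    def check_call(func, *args):
        # a toy runtime check against the stashed signature
        argtypes, _ = func.decl_sig
        for value, tname in zip(args, argtypes):
            if type(value).__name__ != tname:
                raise TypeError('%r does not have type %s' % (value, tname))
        return func(*args)

    print(check_call(gcd, 12, 18))   # -> 6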
From John@list.org Sat Mar 18 01:55:03 2000
From: John@list.org (John Viega)
Date: Fri, 17 Mar 2000 17:55:03 -0800
Subject: [Types-sig] A late entry
In-Reply-To: <000301bf8fd9$6798b700$682d153f@tim>; from Tim Peters on Fri, Mar 17, 2000 at 01:24:09AM -0500
References: <14545.26416.628396.276382@anthem.cnri.reston.va.us> <000301bf8fd9$6798b700$682d153f@tim>
Message-ID: <20000317175503.C12852@viega.org>

On Fri, Mar 17, 2000 at 01:24:09AM -0500, Tim Peters wrote:
> [Barry Warsaw, explaining the problems 'a would create for python-mode.el]
> > ...
> > You could also argue that Py3K shouldn't have to cater to a 20 year
> > old technology like Emacs, and you'd probably be right.
>
> P3K should cater to Python programmers, though!
>
>     'a
>
> simply looks like an unterminated string, regardless of whether pymode or
> Python programmers are looking at it.

The syntactic context in which types are used is more than different
enough for programmers, IMHO. I think you'd have a much bigger problem
with "=" and "==". My personal goal for the syntax is to choose
something natural to people coming in with some familiarity with the
concepts, while avoiding anything really ugly.

I think it's fair to say the tick isn't very obvious to anyone not
familiar with ML and similar languages. I think the formatting
problems are a fair objection, too. I think <> is probably going to be
a bit more familiar to people, but looks horrible in source. Hey, how
about the backtick? :) Using that would mess up people's emacs
formatting in far fewer cases, if you tell emacs mode that ` is just a
regular old operator :) Seriously, though, it's probably not all that
likely that a syntax will be chosen that's all things to all people,
and that's to be expected.

> The second most likely bad
> interpretation will come from Lispers, viewing it as a symbol. So it's
> simply poor notation for Python. 'a' would work, though!

That's almost like saying that people with a lisp background are going
to think there's a function call every time they see a left
parenthesis. Plus, there are definitely reasons why the tick is common
between types in ML and symbols in lisp. They both represent
abstractions away from concrete values. Granted, there are some
significant semantic differences, though :)

> Haskell uses
> unadorned letters for type parameters (i.e., the ML convention isn't
> universal even among its relatives) -- but Haskell doesn't have inline
> function declarations.

Right... there are good reasons why Haskell doesn't need such
syntactic garbage. You're also right that there isn't a universal
syntax. ML's is pretty widely known as far as it goes, but the FP
community is really small. There are all sorts of syntaxes for type
variables... I know some languages start with "*" and then keep adding
"*"'s as they need more vars in a type.

> > ...
> > I'd also be interested in seeing what the IDLE developers think about
> > such syntax changes.
>
> I expect that context-sensitive literal syntax is a non-starter regardless
> of tool (don't forget PythonWorks, and tokenize.py, and pyclbr.py, and
> kjlint, and untold mounds of homegrown stuff that also expects apostrophe to
> mean string).

Nooo, using a tick in type expressions doesn't do anything to affect
whether that particular piece of syntax is regular, context-free or
context-sensitive. Context-sensitive is definitely wrong; the entire
type syntax I proposed is context-free, and not because of the ticks.
Replace the ' with a `, a +, or a ~, and nothing has changed. The
presence of the :- segregates the type quite distinctly... the tick
itself doesn't even move the syntax into the context-free world...
it's the ability to nest types (e.g., ('x ('y 'z ('x))) ) that brings
the type language into the realm of the non-regular.

> Don't get hung up on the spelling! As someone else wise once said, Guido is
> a master of syntax, and will pick something *he* likes regardless of what we
> recommend <0.9 wink>.

In this particular case, I'm not really attached to the syntax; I'm
just hard-pressed to come up with an alternative that isn't at least
as ugly. Point well taken, however.

John
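[Editorial sketch: John's point that nesting, not the tick, is what
pushes the type language out of the regular (pure-regexp) world. The
toy reader below handles his ('x ('y 'z ('x))) example; the mini-
grammar and the parse function are invented for illustration.]

    # Balanced nesting needs recursion (or a stack), which no single
    # regular expression can supply.
    def parse(tokens, i=0):
        node = []
        while i < len(tokens):
            t = tokens[i]
            if t == '(':
                sub, i = parse(tokens, i + 1)   # recurse on a nested type
                node.append(sub)
            elif t == ')':
                return node, i + 1
            else:
                node.append(t)                  # a type variable like 'x
                i += 1
        return node, i

    tokens = "( 'x ( 'y 'z ( 'x ) ) )".split()
    print(parse(tokens)[0])   # -> [["'x", ["'y", "'z", ["'x"]]]]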
From John@list.org Sun Mar 19 14:30:35 2000
From: John@list.org (John Viega)
Date: Sun, 19 Mar 2000 06:30:35 -0800
Subject: [Types-sig] Re: A late entry
In-Reply-To: <14546.24663.399881.121026@walden>; from Jeremy Hylton on Fri, Mar 17, 2000 at 11:56:42AM -0500
References: <200003171631.LAA23287@ns1.cnri.reston.va.us> <14546.24663.399881.121026@walden>
Message-ID: <20000319063035.A16949@viega.org>

On Fri, Mar 17, 2000 at 11:56:42AM -0500, Jeremy Hylton wrote:
>
> I have been uneasy about OR types, too. I think the primary source of
> OR-ing is default arguments and various methods that implement
> Pythonic method overloading. If we allow method overloading in the
> type system -- to describe multiple valid signatures of a single
> method object -- we might eliminate many of the problems.
>
> class Foo:
>     decl __init__(self, arg1: int, arg2: int)
>     decl __init__(self, arg1: string)
>     def __init__(self, arg1=None, arg2=None):
>         [...]
>
> I think this is a little simpler than the proposal you made. It
> merely provides a mechanism to define simple types for existing Python
> code.

Really? I think it ends up being more complex all around. First,
you're going to end up having the same problems as prototypes, namely,
what to do with argument names. Do you ignore them if they don't match
between the declaration and definition, or treat it as an error? Since
the argument names aren't actually valuable in the declaration, it's a
bit of extra syntax... but removing it doesn't necessarily help.

The second problem I see is that type declarations will sometimes not
accompany the method definition. That will be slightly confusing. Of
course, if you allow multiple definitions, those definitions don't
necessarily have to be in the same place (though you could add some
ML-like syntax to force the issue).

More importantly, if you do things this way, it's much more difficult
to type check, because you can't assume the programmer wrote correct
code, and now you have to enforce potentially complex constraints.
Consider, for example:

    class Foo:
        decl __init__(self, arg1 :- int, arg2 :- float)
        decl __init__(self, arg1 :- float, arg2 :- string)
        def __init__(self, arg1, arg2):
            ...

When we type check the actual definition, not only do we get complex
OR'd types (int|float, float|string), but we have to maintain
constraints between these types (if arg1 is a float then arg2 must be
a string). Plus, it doesn't take very long to construct a program
where you end up performing more dynamic type checks than you would if
you just had to bind a call to a method at run time. I think being
able to accurately type method bodies is too useful to give up.

Getting back to that ML-ish syntax, you could try something like:

    class Foo:
        def __init__:  # No parens
            version __init__(self, arg1 :- int, arg2 :- float):
                ...
            version __init__(self, arg1 :- float, arg2 :- int):
                ...

At least it would be easy to type check, and it keeps all definitions
together, except those overloaded in a derived class.

> The other significant source of OR types is treating None as a
> distinct type, which requires an OR type anywhere that you want to
> pass an object or None. If we also eliminate that, there is little
> need for OR types.

Hopefully I've already argued effectively against this solution in
other messages...

John
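[Editorial sketch: the dynamic checks John warns about. To emulate his
two declared __init__ signatures at run time, the body has to re-test
argument types on every call and enforce the cross-argument constraint
(float arg1 implies string arg2) by hand. The class body is invented
for illustration.]

    class Foo:
        def __init__(self, arg1, arg2):
            if isinstance(arg1, int) and isinstance(arg2, float):
                pass    # the (int, float) signature
            elif isinstance(arg1, float) and isinstance(arg2, str):
                pass    # the (float, string) signature
            else:
                raise TypeError('no declared signature matches')

    Foo(1, 2.0)                   # ok: first signature
    Foo(2.0, 'two')               # ok: second signature
    try:
        Foo(1, 'two')             # violates the cross-argument constraint
    except TypeError as e:
        print(e)                  # -> no declared signature matches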