[Types-sig] RFC 0.1

Greg Stein gstein@lyra.org
Tue, 14 Dec 1999 03:08:59 -0800 (PST)


On Mon, 13 Dec 1999, Paul Prescod wrote:
> I did evaluate your proposal but it seemed to me that it was solving a
> slightly different problem. I think as we compare them we'll find that
> your ideas were more oriented toward runtime safety checking.

True, but I might posit that (due to Python's dynamic nature) you really
aren't going to come up with a good compile-time system. That leaves
runtime.

> Greg Stein wrote:
> > 
> > > > #2. The system must allow authors to make assertions about the sorts
> > > > of values that may be bound to names. These are called binding
> > > > assertions. They should behave as if an assertion statement was
> > > > inserted after every assignment to that name program-wide.
> > 
> > In our writeup, we posit that it is better (and more Pythonic) to bind the
> > assertions to expressions, rather than names. This came about when we
> > looked at how to supply assertions for things like:
> > 
> >   x.y = value
> >   x[i] = value
> >   x[i:j] = value
> 
> I wouldn't supply assertions for assignments at all. You supply
> assertions for the names x, y, i, and j.

I know you wouldn't. I was offering a different tack (and one that seemed
to work better).

In Python, names have no semantics other than an identifier, a scope, and
that they are a reference. We thought it would be nice to retain the
notion that names are just names -- it is the objects and what you're
doing with them that is important.

> > Certainly, function objects would have type information associated with
> > them, but I believe that is different than associating a type with the
> > function's name.
> 
> But if a function takes as its first argument an int, in what sense is
> that type associated with an "expression"? It is associated with a name,
> whatever the name of the first argument is. Plus consider this:

I think I wasn't clear enough here. In the following statement:

  a = b

We suggested that type checking is defined and applied to the value (b),
rather than associating a type with "a" and performing an assertion at
assignment time. The concept of "this variable name can only contain
values of <this> type" is a standard, classical approach. We didn't think
it applied as well to Python (for a number of reasons). If you're doing
type inferencing, then you are actually tracking values -- the types
associated with a name are very artificial/unnecessary during type
inferencing. For example:

  a = [1, 2]
  foo(a)
  a = {1: 2}
  bar(a)
  a = 1.2
  baz(a)

The above code is quite legal in Python (and no, I don't want to hear
arguments that it shouldn't be :-). With a type system that is associated
with expressions/values rather than names, then we can do proper type
inferencing, checking, etc on the above code.

The only thing in our outline that has associated type information is a
function object (note: not a function name). Reflection on the function
can get the information for you (obviously, only useful for runtime
tools; compilers would be using syntactic markers only).

> type-safe
> String
> def foo():
> 	return abc()
> 
> How can I, at compile time, statically know the type of the value
> currently contained in the name abc if I don't restrict it in advance
> like this:
> 
> String
> def abc(): return "abc"

You would. I never said otherwise :-)

But I see it as data (type) flow: "abc happens to refer to an object,
which is typed as a function returning a string", rather than saying "abc
is a function returning a string."

Just as objects have types ("is-a"), I think a function object should
expand a bit and record the types of its params and return value(s).

> Rebinding is fine, as long as it doesn't invalidate the type
> declaration:
> 
> abc = lambda: "def"

So you say :-)  I say rebind all you want. Base your assertions and
type-checks on what it has at whatever lexical point in your program.

For the case of:

  if condition:
      x = 1
  else:
      x = "abc"

I would say that the type of "x" is a set, rather than a particular type.
If you're going to do type-checking/assertions, then any uses of "x"
better be able to accept all types in the set.

> > We also proposed extending isinstance() to allow a callable for the third
> > argument. This allows for arbitrarily complex type checking (e.g. the
> > "list of integers" problem).
> 
> I liked that idea but really didn't see how to port it to a compile time
> static type checker. I'm going out of my way to avoid running arbitrary
> Python code. Static type checking shouldn't be a security hazard.

I believe that Python is too rich in data types and composition of types
to be able to add *syntax* for all type declarations. I think you better
stop and realize that before you get in too deep :-)

In your RFC 0.1, you punted on the complex/composited data types issue too
keep the solution tractable. I posit that you will *never* solve the
problem of coming up with sufficient syntactical expression; therefore,
you will always have to resort to a procedural component in your type
system *if* you want full coverage.

>...
> > If type assertions are bound to expressions, rather than names, a data
> > flow analysis will show the types at any point. This could (theoretically)
> > avoid many "declarations".
> 
> Names get their values from expressions so the data flow analysis is the
> same.

Partially true, but as I mentioned above: names are just points in your
data flow. They are a side-effect. A name "recieves" a type from the data
-- it does not "drive" the data flow. I think it is clearer to just avoid
the attaching of a type to a name and to just look at the data.

Note that one benefit of associating types with names, is that you can
shortcut the data flow analysis (so the analysis is not necessarily the
same). But: you cannot have a name refer to different types of objects
(which I don't like; it reduces some of Python's polymorphic and dynamic
behavior (interfaces solve the polymorphism stuff in a typed world)).

> If you have to type-check the statement "return a" then you need to be
> able to know the type of both the variable and the expression.

[ by expression, I presume you mean the object referenced by "a". if you
  mean the expression "a", well yah... but that's a degenerate case which
  doesn't server as a good example of what you're trying to say. ]

In the above case, you need to know the type of the variable *OR* the
expression (the referenced object). If you have the type of the variable,
then you simply assert that type rather than using data flow to know what 
is referenced.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/