[Types-sig] bunch o' stuff (was: minimal or major change?)

Greg Stein gstein@lyra.org
Thu, 16 Dec 1999 15:05:58 -0800 (PST)


On Thu, 16 Dec 1999, Martijn Faassen wrote:
>...
> [I'm disagreeing with the 'isn't that big of a change' thesis. Greg
> defends fairly well that it is, but I still disagree with him. I don't
> think our disagreeing will matter much in the future, though, so let's
> forget about it..]

Not a problem. "Agree to disagree" is quite civilized and proper :-)

>...
> > > * A whole new operator (which you can't overload..or can you?), which
> > > does something quite unusual (most programmers associate types with
> > > names, not with expressions). The operation also doesn't actually return
> > > much that's useful to the program, so the semantics are weird too.
> > 
> > No, you cannot overload the operator. That would be a Bad Thing, I think.
> > That would throw the whole type system into the garbage :-).
> 
> Okay, in that sense the operator would be special, as generally
> operators
> in Python can be overloaded (directly or indirectly). I'd agree you
> shouldn't be able to overload this one, though.

Well, I hope to not consider it "special". In my mind, it is just another
operator. It has some semantics the compiler can take advantage of, sure,
but it isn't like a pragma or some other meta-level thing.

However, its semantics don't lend well to overloading. *shrug* Assuming we
end up proposing this operator to Guido for inclusion as part of the new
type system, then he can certainly make a call on whether it should be
possible to overload it.

> > The operator is not unusual: it is an inline type assertion. It is not a
> > "new-fangled way to declare the type of something."
> 
> But it's quite unusual to the programmer coming from most other
> languages, still. That doesn't mean it's bad, but Python isn't an
> experimental language, so this could be an objection to the operator
> approach.

Perl has the =~ operator, which has unusual semantics for a programmer
coming from Python. Python's slice operator is not available to a C or C++
programmer, but people don't complain about that.

Point is: each language has its own set of operators to solve problems
within that language's domain. I see this operator as a pretty neat and
clean way to resolve Python's (current) lack of type declarations.

And I disagree with the notion "Python isn't an experimental language." It
is one of the few to natively support complex numbers, sophisticated
slicing, builtin dictionary types, and keyword arguments.

>...
> > > * Interfaces with a new 'decl' statement. [If you punt on this you'll
> > > have to tell the innocent Python programmer he can't use the static
> > > type system with instances? Or will this be inferenced?]
> > 
> > Yes, I'd prefer to punt this for a while, as it is a much larger can of
> > worms. It is another huge discussion piece. In the current discussion, I
> > believe that we can factor out the interface issue quite easily -- we
> > can do a lot of work now, and when interfaces arrive, they will slide
> > right in without interfering with the V1 work. In other words, I believe
> > there is very little coupling between the proposal as I've outlined it and
> > the next set of type system extensions (via interfaces).
> 
> Hm, I'm still having some difficulty with this; as I understand it your
> proposal would initially only work with functions (not methods) which
> only use built-in types (not class instances). Am I right, or perhaps
> I'm missing something..

Methods are actually function objects. When I've referred to functions,
I'm talking about functions and methods. In other words:

class Foo:
  def bar(self, x: String, y: String) -> String:
    pass

In the above code, the bar() method has a type signature, which can be
type-checked.

Since writing the quoted text, I've read the interface proposal and
thought more on the "decl" statement. I am now in favor of including
"decl" in V1, thus providing types for all portions of an interface
(attributes and methods).

>...
> > > Adding anything like static type checking to Python entails fairly major
> > > changes to the language, I'd think. Not that we shouldn't aim at keeping
> > > those transparent and mostly compatible with Python as it is now, but
> > > what we'll add will still be major.
> > 
> > Sure.
> 
> You say 'sure' to me saying it'll still be major? :) Oh, wait, I wasn't
> arguing about that anymore!

I'm not sure what I was referring to. Sorry about that. I think I meant,
"yes, we should aim at keeping things transparent and compatible." At
least, that's what I mean now when I re-read and re-comment on your text
:-)

>...
> > > > > The 'simplicity' part comes in because you don't need *any* type
> > > > > inferencing. Conceptually it's quite simple; all names need a type.
> > > >
> > > > 1) There is *no* way that I'm going to give every name a type. I may as
> > > >    well switch to Java, C, or C++ (per Guido's advice in another email :-)
> > >
> > > Sure, but we're looking at *starting* the process. Perhaps we can do
> > > away with specifying the type of each local variable very quickly by
> > > using type inferencing, but at least we'll have a working
> > > implementation!
> > 
> > I don't want to start there. I don't believe we need to start there. And
> > my point (2) below blows away your premise of simplicity. Since you still
> > need inferencing, the requirement to declare every name is not going to
> > help, so you may as well relax that requirement.
> 
> But you'd only need expression inferencing, which I was ('intuitively'
> :) assuming is easier than the larger scale thing.

Yes, expression-level inferencing is easier, as you don't have to worry
about code like this:

a = 1                      # <a> is an Int here
while 1:
  func_which_takes_int(a)  # on the second pass, <a> may be a String
  a = "foo"                # <a> becomes a String, feeding back into the loop

The above code should raise a type-check error. Tim referred to the above
problem when he talked about "reaching a stable state," although it
probably wasn't obvious to most readers :-)

If names have types, then the a="foo" line would raise an error. In a
purely inferenced world, the inferencer (eventually) figures out that <a>
can have one of two types at the time of the function call. It then raises
an error saying "func_which_takes_int expects an Int, but <a> may be a
String."

>...
> > > I'm not saying this is a good situation, it's just a way to get off the
> > > ground without having to deal with quite a few complexities such as
> > > inferencing (outside expressions), interaction with modules that don't
> > > have type annotations, and so on. I'm *not* advocating this as the end
> > > point, but I am advocating this as an intermediate point where it's
> > > actually functional.
> > 
> > IMO, it is better to assume "PyObject" when you don't have type
> > information, rather than throw an error. Detecting the lack of type info
> > is the same in both cases, and the resolution of the lack is easy in both
> > methods: throw an error, or substitute "PyObject". I prefer the latter so
> > that I don't have to update every module I even get close to.
> 
> I still don't understand how making it a PyObject will help here. Would
> this mean a run-time check would need to be inserted whenever PyObject
> occurs in a function with type annotations? In my approach this would be
> part of the Python/Static Python interface work. How does it fit in for
> you?

The PyObject approach means that you don't throw an error. There are no
runtime checks or compile time checks. They are simply unavailable since
you have no type information.

Using PyObject will help because it means you aren't raising errors simply
because some module has not added type declarations. Instead, the compiler
just uses the "unknown" (PyObject) type and keeps going. Of course, that
may cause type errors later, but that is resolvable with the type-assert
operator (which inserts a run-time check, and tells the compiler what type
you're expecting it to be).
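
The "!" operator doesn't exist yet, of course; as a rough emulation of its
run-time half in current Python (the helper name is mine, purely for
illustration):

# Hypothetical stand-in for the proposed type-assert operator.
# "expr ! Int" would behave roughly like type_assert(expr, int):
# return the value unchanged, or raise if the type is wrong.

def type_assert(value, expected_type):
  if not isinstance(value, expected_type):
    raise TypeError("expected %s, got %s"
                    % (expected_type.__name__, type(value).__name__))
  return value

def from_untyped_module():
  return 42                # the compiler sees only PyObject here

x = type_assert(from_untyped_module(), int)
# Past this point the compiler could treat x as an Int for checking and
# optimization, because the assertion would have raised otherwise.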

>...
> > > Yes, but now you're building a static type checker *and* a Python
> > > compiler inserting run time checks into bytecodes. This is two things.
> > > This is more work, and more interacting systems, before you get *any*
> > > payoff. My sequence would be:
> > 
> > Who says *both* must be implemented in V0.1? If the compiler can't figure
> > it out, then it just issues a warning and continues. Some intrepid
> > programmer comes along and tweaks the AST to insert a runtime check. Done.
> > The project is easily phased to give you a working system very quickly.
> > 
> > Heck, it may even be easier for the compiler to insert runtime checks in
> > V0.1. Static checking might come later. Or maybe an external tool does the
> > checking at first; later to be built into the compiler.
> 
> That's true; the other approach would start with adding run-time checks
> and proceed to a static checker later.

Yes, that's what I said :-)

First, we add the new typedecl syntax. Then the compiler inserts runtime
checks for function arguments and for uses of the type-assert operator;
that gives us a good first pass. Next comes an external tool to consume
type information and perform type inferencing and checking. Finally, we
decide on integrating the external tool into the compiler proper.
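
To illustrate what "the compiler inserts runtime checks for function
arguments" might amount to, here is a sketch in current Python, using a
hand-written wrapper in place of compiler-generated code (the checked()
helper is my invention for illustration, not proposed machinery):

# Sketch: argument checks the compiler might emit, written by hand.
# The real proposal would generate equivalent checks in the bytecode.

def checked(**arg_types):
  def wrap(func):
    def checked_func(*args, **kwargs):
      names = func.__code__.co_varnames[:func.__code__.co_argcount]
      bound = dict(zip(names, args))
      bound.update(kwargs)
      for name, expected in arg_types.items():
        if name in bound and not isinstance(bound[name], expected):
          raise TypeError("%s: expected %s for %r" %
                          (func.__name__, expected.__name__, name))
      return func(*args, **kwargs)
    return checked_func
  return wrap

@checked(x=str, y=str)
def bar(x, y):
  return x + y

bar("a", "b")                # fine
# bar("a", 1)                # would raise TypeError at call time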

>...
> So that's where I'm coming from. It's important for our proposal to
> actually come up with a workable development plan, because adding type
> checking to Python is rather involved. So I've been pushing one course
> of implementation towards a testable/hackable system that seems to give
> us the minimal amount of development complexities. I haven't seen clear
> development paths from others yet; most proposals seem to involve both
> run-time and compile-time developments at the same time.

I haven't seen any, let alone clear, discussions from others about
development paths :-)

But I don't think anybody is going to advocate a system that will take a
while to bring up, so I think we're all in agreement here.

>...
> > > This'd be only implementable with run-time assertions, I think, unless
> > > you do inferencing and know what the type the object is after all. So
> > > that's why I put the limitation there. Don't allow unknown objects
> > > entering a statically typed function before you have the basic static
> > > type system going. After that you can work on type inference or cleaner
> > > interfaces with regular Python.
> > 
> > Why not allow unknown objects? Just call it a PyObject and be done with
> > it.
> 
> Hm, I suppose I'm looking at it from the OPT point of view; I'd like to
> see a compiler that exploits the type information. If you have PyObjects
> this seems to get more difficult; could be solved if you had an
> interpreter waiting in the sidelines that would handle stuff like this
> that can't be compiled.

The compiler can exploit type information, sure. But we're talking about
the case where type information is not available. Rather than failing,
the compiler simply doesn't optimize. Using the type-assert
operator, you can get the compiler cranking up again (of course, you could
also go and add type annotations to the code being called).

> > Note that the type-assert operator has several purposes:
> > 
> > * a run-time assertion (and possibly: unless -O is used)
> > * signal to the compiler that the expression value will have that type
> >   (because otherwise, an exception would have been raised)
> > * provides a mechanism to type-check: if the compiler discovers (thru
> >   inferencing) that the value has a different type than the right-hand
> >   side, then it can flag an error.
> > 
> > The limitation you propose would actually slow things down. People would
> > not be able to use the type system until a lot of modules were
> > type-annotated.
> 
> I think I'm starting to see where you're coming from now, with the !
> operator. It allows you to say 'from this point on, this value is an
> int, otherwise the operator would've raised an exception'. The
> inferencer and checker can exploit this.

Exactly! The compiler can also use it to perform various optimizations,
since it now knows the (guaranteed) type.

> The point where I am coming
> from is however that you lose compile-time checkability as soon as you
> use any function that inserts PyObjects into the mix. I'm afraid that
> even with the operator, you wouldn't be able to check most of the code,
> if PyObjects are freely allowed. Perhaps I'm wrong, but I'd like to see
> some more debate about this.

Yes, you lose it, but that doesn't mean you throw the baby out with the
bath water. The compiler just degrades gracefully in the presence of a
PyObject. With the type-assert operator, you effectively convert that
PyObject into a known type which the compiler can then use in later checks
and optimizations.

> > > But perhaps I'm mistaken and local variables don't need type
> > > descriptions, as it's easy to do type inferencing from the types of the
> > > function arguments and what the function returns,
> > 
> > That is my (alas: unproven) belief.
> 
> How do we set about to prove it?

I don't need a proof :-). I think we *can* use inferencing to avoid decls
for local variables. In fact, I am positive of it, and instead would like
to hear a counter-proof.

> Here I'll come with my approach again;
> if you have a type checker that can handle a fully annotated function
> (all names used in the function have type annotations), then you have a
> platform you can build on to develop a type checker. Then you can figure
> out what does need type annotations and what doesn't. You simply try to
> build code that adds type annotations itself, based on inferences. You
> can spew out warnings: "full type inferencing not possible, cannot
> figure out type of 'foo'". The programmer can then go add type info for
> 'foo'. If all types are known one way (specified) or the other
> (inferred), a compiler can start to do heavy duty optimization on that
> code.

I do not believe that developing a type checker for fully-annotated
functions is going to help in any way towards building an inferencer. In
other words, we just build the inferencer.

However, I do see that a compiler that knows all types is a good first
step. Using those types, it can do various things (e.g. type checks on
func args, various optimizations). Where it gets the type information is
the point of discussion here :-)

I'd rather just start with inferencing rather than modifying the syntax to
support typing of locals, only to pull that syntax change out later.

Note: your proposal of __types__ would be useful during development of the
compiler (presuming that occurs before the inferencer is available).
__types__ requires no syntax changes, so it can give the compiler the info
right away. Later, we just stop looking for __types__ and use the
inferencer.
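
__types__ is only named here, not specified; one plausible shape (purely
my guess) is a per-function mapping from names to types that a checker
could read without any syntax change:

# Hypothetical shape for __types__: a plain mapping the checker reads.
# Nothing here is fixed by any proposal; this is one possible encoding.

def distance(x, y):
  __types__ = {"x": float, "y": float, "return": float}
  return (x * x + y * y) ** 0.5

# A checker that knows this convention can verify call sites and the
# return value against the mapping. Once an inferencer exists, the same
# information is derived automatically and __types__ can be dropped.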

>...
> [snip]
> > > I'd like to see some actual
> > > examples of how this'd work first, though. For instance:
> > >
> > > def brilliant() ! IntType:
> > >     a = []
> > >     a.append(1)
> > >     a.append("foo")
> > >     return a[0]
> > >
> > > What's the inferred type of 'a' now? A list with heterogeneous contents,
> > > that's about all you can say, and how hard is it for a type inferencer
> > > to deduce even that?
> > 
> > It would be very difficult for an inferencer. It would have to understand
> > the semantics of ListType.append(). Specifically, that the type of the
> > argument is added to the set of possible types for the List elements.
> > 
> > Certainly: a good inferencer would understand all the builtin types and
> > their methods' semantics.
> > 
> > > But for optimization purposes at least, it
> > > could also help with error checking, if 'a' was a list of IntType, or
> > > StringType, or something like that?
> > 
> > It would still need to understand the semantics to do this kind of
> > checking. In my no-variable-declaration world, the type error would be
> > raised at the return statement. a[0] would have the type set: (IntType,
> > StringType). The compiler would flag an error stating "return value may be
> > a StringType or an IntType, but it must only be an IntType".
> 
> Right, I think this would be the right behavior. But it becomes a lot
> easier to get a working implementation if you get to specify the type of
> 'a'. If you say a is a list of StringType, it's then relatively easy for
> a compile time checker to notice that you can't add an integer to it.

Well, kind of. The checker would still have to understand that a.append()
is going to insert that value into the list, so that appending an Int
would generate a type conflict.
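
As a rough sketch of what "understanding append" means for a checker (the
class and names below are illustrative only, not a proposed API):

# Sketch: a checker's model of a list whose element types it tracks.
# ListType.append() adds the argument's type to the element-type set;
# a declared element type turns a mismatched append into an error.

class ListModel:
  def __init__(self, declared=None):
    self.element_types = set()
    self.declared = declared           # e.g. str for "list of StringType"

  def append_type(self, value_type):
    if self.declared is not None and value_type is not self.declared:
      raise TypeError("list of %s cannot hold %s" %
                      (self.declared.__name__, value_type.__name__))
    self.element_types.add(value_type)

a = ListModel(declared=str)
a.append_type(str)                     # fine
# a.append_type(int) would be flagged here, at the append,
# rather than later at the return statement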

Re: working implementation faster: this presumes that the compiler will
use the type declarations before the inferencer is available.

> And possibly it also becomes clearer for the programmer; I had to think
> to figure out why your compiler would complain about a[0]. I had to play
> type inferencer myself. I don't have to think as much if I get to
> specify what list 'a' may contain; obviously if something else is put
> into it, there should be an error.

The programmer never has to think about type inferencing. That only exists
to create type-check warnings/errors. The programmer believes that he has
a list of integers and codes that way. The inferencer then comes along and
tells him that he goofed up.

Declaring <a> up front simply moves the error from the return statement to
the point where the wrong type was inserted into <a>. It is arguable that
this is preferable.

> > > It seems tough for the type
> > > inferencer to be able to figure out that this is so, but perhaps I'm
> > > overestimating the difficulty.
> > 
> > Yes it would be tough -- you aren't overestimating :-)
> 
> What would your path towards successful implementation be, then?

* add the syntax changes (decl, def changes, and !)
* change the compiler to use the new syntax to insert runtime checks
* develop external tool to do type checking
* possibly integrate the tool into the compiler

Note that the external tool will start with rudimentary type inference and
analysis. It will then grow in complexity as more capability is added. For
example, initially, it might only know "a" + 1 is a type error. Later, it
would be able to do some simple inference based on data flow. Later still,
it would recognize problems like the "while" example I listed above.
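
As a sketch of that rudimentary first pass, here is roughly what catching
"a" + 1 could look like over a parse tree, using the ast module from
today's standard library (the tool's real design is wide open; this is
only an illustration):

# Sketch: the most rudimentary expression-level check, flagging a
# string added to a non-string constant, with no flow analysis at all.

import ast

def check_expr(source):
  for node in ast.walk(ast.parse(source)):
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
      left, right = node.left, node.right
      if isinstance(left, ast.Constant) and isinstance(right, ast.Constant):
        if (type(left.value) is str) != (type(right.value) is str):
          print("type error: %r + %r" % (left.value, right.value))

check_expr('x = "a" + 1')    # prints: type error: 'a' + 1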

Also note that I'm not sure we ever put type-checking into the core
interpreter. If it isn't going to alter the compilation output, then why
put it in? In other words: somebody ought to come up with a list of things
they expect the compiler to alter in the *bytecodes* based on the type
information (Python doesn't really have type-specific bytecodes (yet)).

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/