[Types-sig] minimal or major change? (was: RFC 0.1)

Martijn Faassen m.faassen@vet.uu.nl
Thu, 16 Dec 1999 14:31:46 +0100


Greg Stein wrote:
> 
> On Wed, 15 Dec 1999, Martijn Faassen wrote:
> 
> ... me: stating the "GFS proposal" isn't that major of a change ...

[I disagree with the 'isn't that big of a change' thesis. Greg defends
his position fairly well, but I still disagree with him. I don't think
our disagreement will matter much in the future, though, so let's
forget about it. I'll answer some of the points he raised below, but
not to defend my point of view :)]

[snip]
> > * A whole new operator (which you can't overload..or can you?), which
> > does something quite unusual (most programmers associate types with
> > names, not with expressions). The operation also doesn't actually return
> > much that's useful to the program, so the semantics are weird too.
> 
> No, you cannot overload the operator. That would be a Bad Thing, I think.
> That would throw the whole type system into the garbage :-).

Okay, in that sense the operator would be special, as operators in
Python can generally be overloaded (directly or indirectly). I'd agree
you shouldn't be able to overload this one, though.

> The operator is not unusual: it is an inline type assertion. It is not a
> "new-fangled way to declare the type of something."

Still, it's quite unusual to a programmer coming from most other
languages. That doesn't mean it's bad, but Python isn't an
experimental language, so this could be an objection to the operator
approach.

> It is simply a new
> operation. The compiler happens to be able to create associations from it,
> but that does *not* alter the basic semantic of the operation.
> 
> Given:
> 
>    x = y or z
> 
> In the above statement, it returns "y" if it is "true". In the statement:
> 
>    x = y ! z
> 
> It returns "y" if it has "z" type; otherwise, throws an exception. The
> semantics aren't all that difficult or unusual.

Okay, that isn't so unusual, as other operators can throw exceptions
under some circumstances as well. Well defended. :)
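
For what it's worth, the run-time half of those semantics is easy to
emulate in today's Python with a small helper; a hypothetical
type_assert() function, just to make the intended behaviour concrete
(the real proposal would of course use the operator, not a function
call):

from types import IntType

def type_assert(value, type_):
    # emulation of "value ! type_": return value if it has the given
    # type, otherwise raise an exception
    if type(value) is not type_:
        raise TypeError("expected %s, got %s" % (type_, type(value)))
    return value

x = type_assert(3, IntType)         # fine: x is 3
# type_assert("foo", IntType)       # would raise TypeError

This only shows the run-time check, of course; the compile-time uses
of the operator can't be emulated with a plain function.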

[snip]
> > * Interfaces with a new 'decl' statement. [If you punt on this, you'll
> > have to tell the innocent Python programmer he can't use the static type
> > system with instances? Or will this be inferenced?]
> 
> Yes, I'd prefer to punt this for a while, as it is a much larger can of
> worms. It is another huge discussion piece. In the current discussion, I
> believe that we can factor out the interface issue quite easily -- we
> can do a lot of work now, and when interfaces arrive, they will slide
> right in without interfering with the V1 work. In other words, I believe
> there is very little coupling between the proposal as I've outlined and
> the next set of type system extensions (via interfaces).

Hm, I'm still having some difficulty with this; as I understand it,
your proposal would initially only work with functions (not methods)
that only use built-in types (not class instances). Am I right, or am
I missing something?


[snip]

> > Adding anything like static type checking to Python entails fairly major
> > changes to the language, I'd think. Not that we shouldn't aim at keeping
> > those transparent and mostly compatible with Python as it is now, but
> > what we'll add will still be major.
> 
> Sure.

You say 'sure' to me saying it'll still be major? :) Oh, wait, I wasn't
arguing about that anymore!

> I think we're just viewing it a bit differently. To me, something
> like the metaclass stuff was a big change: it is capable of altering the
> very semantics of class construction. Adding package support was the same
> -- Python moved from a flat import space to an entirely new semantic for
> importing and application packaging.

Both happened before I was involved with Python, and I still don't know
much about metaclasses, so I can't comment on this one.

> > > > The 'simplicity' part comes in because you don't need *any* type
> > > > inferencing. Conceptually it's quite simple; all names need a type.
> > >
> > > 1) There is *no* way that I'm going to give every name a type. I may as
> > >    well switch to Java, C, or C++ (per Guido's advice in another email :-)
> >
> > Sure, but we're looking at *starting* the process. Perhaps we can do
> > away with specifying the type of each local variable very quickly by
> > using type inferencing, but at least we'll have a working
> > implementation!
> 
> I don't want to start there. I don't believe we need to start there. And
> my point (2) below blows away your premise of simplicity. Since you still
> need inferencing, the requirement to declare every name is not going to
> help, so you may as well relax that requirement.

But you'd only need expression-level inferencing, which I was
('intuitively' :) assuming is easier than whole-function or
whole-program inferencing.
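
To make the distinction concrete, here's a toy sketch of what I mean
by expression inferencing: given the types of the names involved,
propagating a type through a single expression is straightforward.
(Everything here -- the representation, the rules, the function name --
is made up purely for illustration.)

from types import IntType, StringType

def infer(expr, env):
    # expr is either a name or a nested tuple like ('+', 'a', 'b');
    # env maps names to their declared types
    if type(expr) is StringType:        # a bare name
        return env[expr]
    op, left, right = expr
    ltype = infer(left, env)
    rtype = infer(right, env)
    if op == '+' and ltype is rtype:    # int+int or string+string
        return ltype
    raise TypeError("cannot infer type of %s" % (expr,))

env = {'a': IntType, 'b': IntType, 's': StringType}
infer(('+', 'a', 'b'), env)             # -> IntType
# infer(('+', 's', 'a'), env)           # would raise TypeError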

[snip]
> > I'm not saying this is a good situation, it's just a way to get off the
> > ground without having to deal with quite a few complexities such as
> > inferencing (outside expressions), interaction with modules that don't
> > have type annotations, and so on. I'm *not* advocating this as the end
> > point, but I am advocating this as an intermediate point where it's
> > actually functional.
> 
> IMO, it is better to assume "PyObject" when you don't have type
> information, rather than throw an error. Detecting the lack of type info
> is the same in both cases, and the resolution of the lack is easy in both
> methods: throw an error, or substitute "PyObject". I prefer the latter so
> that I don't have to update every module I even get close to.

I still don't understand how making it a PyObject will help here.
Would this mean a run-time check would need to be inserted wherever a
PyObject occurs in a function with type annotations? In my approach
this would be part of the Python/Static Python interface work. How
does it fit in for you?
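
To make my question concrete, here is what I *think* you are
suggesting, sketched in today's Python (the names are made up, and the
explicit check stands in for whatever the compiler would insert):

from types import IntType

def get_value():        # stands in for a function from an unannotated module
    return 42

def annotated(n):       # imagine: declared to take and return an IntType
    return n + 1

v = get_value()         # all the checker knows about v: it's a PyObject
# The compiler would, as I understand it, have to insert something like
# this, because annotated() wants an IntType but v is only known as a
# PyObject:
if type(v) is not IntType:
    raise TypeError("expected an IntType")
result = annotated(v)   # after the check, v may be treated as an IntType

Is that roughly what you have in mind?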

[snip]
> > Yes, but now you're building a static type checker *and* a Python
> > compiler inserting run time checks into bytecodes. This is two things.
> > This is more work, and more interacting systems, before you get *any*
> > payoff. My sequence would be:
> 
> Who says *both* must be implemented in V0.1? If the compiler can't figure
> it out, then it just issues a warning and continues. Some intrepid
> programmer comes along and tweaks the AST to insert a runtime check. Done.
> The project is easily phased to give you a working system very quickly.
> 
> Heck, it may even be easier for the compiler to insert runtime checks in
> V0.1. Static checking might come later. Or maybe an external tool does the
> checking at first; later to be built into the compiler.

That's true; the other approach would start with adding run-time checks
and proceed to a static checker later.

> ... proposed implementation order ...
> > If you don't separate out your development path like this you end up
> > having to do it all at once, which is harder and less easy to test.
> 
> Of course. Nobody is suggesting a "do it all at once" course of
> implementation.

So that's where I'm coming from. It's important for our proposal to
actually come up with a workable development plan, because adding type
checking to Python is rather involved. So I've been pushing one course
of implementation towards a testable/hackable system that seems to
involve the least development complexity. I haven't seen clear
development paths from others yet; most proposals seem to involve both
run-time and compile-time developments at the same time.

So I'm interested to see other development proposals; possibly there's
a simpler approach, or an equally complex one with more payoff, that
I'm missing.


> > [Paul]
> > > > > I see no reason for that limitation. The result of a call to a
> > > > > non-static function is a Pyobject. You cast it in your client code to
> > > > > get type safety. Just like the shift from K&R C to ANSI C. Functions
> > >
> > > Bunk! It is *not* a cast. You cannot cast in Python. It is a type
> > > assertion. An object is an object -- you cannot cast it to something else.
> > > Forget function call syntax and casting syntax -- they don't work
> > > grammatically, and that is the wrong semantic (if you're using that format
> > > to create some semantic equivalent to a cast).
> >
> > This'd be only implementable with run-time assertions, I think, unless
> > you do inferencing and know what type the object is after all. So
> > that's why I put the limitation there. Don't allow unknown objects
> > entering a statically typed function before you have the basic static
> > type system going. After that you can work on type inference or cleaner
> > interfaces with regular Python.
> 
> Why not allow unknown objects? Just call it a PyObject and be done with
> it.

Hm, I suppose I'm looking at it from the OPT point of view; I'd like
to see a compiler that exploits the type information. If you have
PyObjects this seems to get more difficult; it could be solved if you
had an interpreter waiting on the sidelines to handle stuff like this
that can't be compiled.

> Note that the type-assert operator has several purposes:
> 
> * a run-time assertion (and possibly: unless -O is used)
> * signal to the compiler that the expression value will have that type
>   (because otherwise, an exception would have been raised)
> * provides a mechanism to type-check: if the compiler discovers (thru
>   inferencing) that the value has a different type than the right-hand
>   side, then it can flag an error.
> 
> The limitation you propose would actually slow things down. People would
> not be able to use the type system until a lot of modules were
> type-annotated.

I think I'm starting to see where you're coming from now with the !
operator. It allows you to say 'from this point on, this value is an
int, otherwise the operator would've raised an exception'. The
inferencer and checker can exploit this. My point, however, is that
you lose compile-time checkability as soon as you use any function
that inserts PyObjects into the mix. I'm afraid that even with the
operator, you wouldn't be able to check most of the code if PyObjects
are freely allowed. Perhaps I'm wrong, but I'd like to see some more
debate about this.
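
For example, in the proposed syntax (so not something you can run
today, and 'some_untyped_module' is made up), this is where I think
the checkability stops and where the ! assertion would restore it:

def f() ! IntType:
    v = some_untyped_module.lookup("x")  # v is just a PyObject here; nothing
                                         # about it is checkable at compile time
    w = v ! IntType                      # run-time assertion; from here on the
                                         # checker may treat w as an IntType
    return w + 1                         # checkable: IntType + IntType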

> > But perhaps I'm mistaken and local variables don't need type
> > descriptions, as it's easy to do type inferencing from the types of the
> > function arguments and what the function returns,
> 
> That is my (alas: unproven) belief.

How do we set about proving it? Here I'll come back to my approach: if
you have a type checker that can handle a fully annotated function
(all names used in the function have type annotations), then you have
a platform to build on. From there you can figure out what actually
needs type annotations and what doesn't: you simply try to build code
that adds the type annotations itself, based on inferences. It can
spew out warnings: "full type inferencing not possible, cannot figure
out type of 'foo'". The programmer can then go and add type info for
'foo'. If all types are known one way (specified) or the other
(inferred), a compiler can start to do heavy-duty optimization on that
code.
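
Roughly, I picture the interaction going like this (the annotation
syntax and the checker messages are purely illustrative; none of this
exists yet):

def scale(values, factor) ! ListType:
    result = []
    for v in values:
        result.append(v * factor)
    return result

# checker: 'values' and 'factor' carry no annotations and can't be
# inferred from this function alone, so it warns:
#   scale.py:1: full type inferencing not possible, cannot figure out
#               type of 'values' -- please add a type annotation
# Once the programmer adds the annotations, the checker (and later an
# optimizing compiler) has everything it needs for this function.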
 
[snip]
> > I'd like to see some actual
> > examples of how this'd work first, though. For instance:
> >
> > def brilliant() ! IntType:
> >     a = []
> >     a.append(1)
> >     a.append("foo")
> >     return a[0]
> >
> > What's the inferred type of 'a' now? A list with heterogeneous contents,
> > that's about all you can say, and how hard is it for a type inferencer
> > to deduce even that?
> 
> It would be very difficult for an inferencer. It would have to understand
> the semantics of ListType.append(). Specifically, that the type of the
> argument is added to the set of possible types for the List elements.
> 
> Certainly: a good inferencer would understand all the builtin types and
> their methods' semantics.
> 
> > But for optimization purposes, at least, but it
> > could also help with error checking, if 'a' was a list of IntType, or
> > StringType, or something like that?
> 
> It would still need to understand the semantics to do this kind of
> checking. In my no-variable-declaration world, the type error would be
> raised at the return statement. a[0] would have the type set: (IntType,
> StringType). The compiler would flag an error stating "return value may be
> a StringType or an IntType, but it must only be an IntType".

Right, I think this would be the right behavior. But it becomes a lot
easier to get a working implementation if you get to specify the type
of 'a'. If you say 'a' is a list of StringType, it's then relatively
easy for a compile-time checker to notice that you can't add an
integer to it.

And possibly it also becomes clearer for the programmer; I had to
think for a moment to figure out why your compiler would complain
about a[0]. I had to play type inferencer myself. I don't have to
think as much if I get to specify what list 'a' may contain;
obviously, if something else is put into it, there should be an error.
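
For instance, if 'a' were declared as, say, a list of IntType (the
'decl' line below is a made-up syntax, not part of any actual
proposal), the error in Greg's example would be caught right at the
offending append, with no inferencing needed:

def brilliant() ! IntType:
    decl a: [IntType]       # made up: 'a' is a list of IntType
    a = []
    a.append(1)             # fine: an IntType into a list of IntType
    a.append("foo")         # error reported *here*: a StringType into
                            # a list of IntType
    return a[0]             # and this is now trivially an IntType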
 
> > It seems tough for the type
> > inferencer to be able to figure out that this is so, but perhaps I'm
> > overestimating the difficulty.
> 
> Yes it would be tough -- you aren't overestimating :-)

What would your path towards successful implementation be, then?

Regards,

Martijn