[Types-sig] how to do a typecase? (was: Static typing: Towards closure?)

skaller skaller@maxtal.com.au
Mon, 24 Jan 2000 08:53:11 +1100


Tim Peters wrote:
> 
> [John Skaller]
> >       My assumption is that we want optional type checking
> > and higher order functions.
> 
> I think everyone agrees on that, yes.  Although for "optional type checking"
> I am *not* reading "comprehensive type inference in the absence of explicit
> declarations", but "optional explicit declarations and no guarantees of type
> safety in their absence".

	Agreed. The way I saw it was that type declarations could be added
as desired, to improve diagnostics and optimisation opportunities.
This would mean that by default, no assurance of type correctness
is obtained, at least until run time (and maybe not even then).

	OTOH, a separate type checking tool could report whether the
code in fact contains enough information to provide an assurance
of type correctness (and maximal optimisation opportunities),
allowing the programmer to add extra declarations if desired.

> > In general, such a system is likely to be undecidable.
> 
> I'd amend that to add "without extreme care".  Strongly typed languages like
> Haskell and Miranda support both sophisticated polymorphism and higher-order
> functions, without requiring declarations, yet have provably sound &
> effective type inference (in the technical senses of those words:  sound ==
> "correct", effective == "always terminates").

	I can't say about Haskell or Miranda, but OCaml's type checker
can go into an infinite loop, and it is NOT expressive enough to handle
some quite common constructions. Perhaps this is because it supports
imperative programming, separate compilation, higher order functors,
constraints, and object orientation, whereas Haskell at least is purely
functional.

> This wasn't easy to achieve, though, and a mountain of work went into
> refining the base Hindley-Milner type system (which was fully developed
> *before* e.g. Haskell built on top of it).  Some subtle technical
> restrictions were required.

	Yeah, I can imagine.
 
> > I also suspect that the proposed system, as it stands,
> > is undecidable.
> 
> For most plausible ways of filling in the holes, I believe that's correct,
> except that ...
> 
> Guido is (maybe too <wink>) fond of saying that Python is "a 95% language",
> and I expect that will be a Guiding Principle here.  That is, the type
> system will be fudged in ways such that it remains tractable, by some
> combination of compromise, restriction, and plain giving up. 

	I think that this is my point. That is: I think this
is reasonable (it is what OCaml does). But people seem to think
that there will be a certain guarantee that a module can be strictly
type checked.

	On the other hand, I think that it makes sense to require
diagnostics where a (minimal) reference implementation produces them,
and require acceptance where a (minimal) reference implementation
can prove correctness, and require nothing at all in the other cases.

	This means, in particular, that some checkers will accept more
programs than others -- including the reference implementation --
and some will reject more than others -- including the reference
implementation. The only time you can say a checker has a bug is
when it fails to accept a case which the reference implementation
accepts, or fails to reject a case which the reference implementation
rejects. It is never a bug to fail to accept a correct program that
the reference implementation doesn't accept, nor to fail to reject
a wrong program that the reference implementation doesn't reject.
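
	To make that compliance rule concrete, here is a minimal
sketch in plain Python ('checker_has_bug' and the verdict strings
are invented names, not part of any proposal):

	def checker_has_bug(ref, checker):
		# Verdicts are 'accept', 'reject' or 'no verdict'.
		# The only bugs: failing to accept what the reference
		# implementation accepts, and failing to reject what
		# the reference implementation rejects.
		return (ref == 'accept' and checker != 'accept') \
		    or (ref == 'reject' and checker != 'reject')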

	This leaves plenty of scope for improving checkers,
and plenty of scope for fully optimised programs to core dump
if they optimise on the _assumption_ that the program obeys
the type assertions implied by declarations.

	It also means highly optimised programs can be 
produced without risking an accusation that the optimising
compiler is non-compliant.

> For example,
> it seems already to have been agreed that the type system will not be able
> to express the type of Python's "map" function (although there's hope that
> the type checker will "know it anyway", as a special case).

	Yes, it is the same in C and C++ and most other languages:
the compiler not only 'knows' the type of built-in library functions,
it ALSO 'knows' the semantics, and can generate highly optimised
code in these cases by utilising this semantic information.
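
	Part of the difficulty with 'map', for instance, is that its
type depends on how many sequences it receives -- something a fixed
type expression cannot capture:

	map(lambda x: x * 2, [1, 2, 3])           # (X->Y, [X]) -> [Y]
	map(lambda x, y: x + y, [1, 2], [3, 4])   # (X,Y->Z, [X], [Y]) -> [Z]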
 
> [some great examples of difficulties, which I didn't understand
>  at first because people have been writing examples such that
>  "|" binds tighter than "->"; so e.g. I first read John's
> 
>     Tuple<X> -> Tuple<Y> | X -> Y
> 
> as
> 
>     Tuple<X> -> (Tuple<Y> | X) -> Y
> 
> instead of the intended (from context)
> 
>     (Tuple<X> -> Tuple<Y>) | (X -> Y)
> ]

	Ugh, sorry, I was sloppy here!

> >       The program is CORRECT. I doubt that the checker
> > can be required to compute that: this requires calculating
> > the most general unifier, and for the rich type schema
> > being described, it isn't decidable how to do that AFAIK.
> 
> Even Hindley-Milner gives up at times (although they don't phrase it in such
> raw terms), in the sense of forbidding programs that may nevertheless be
> provably correct by going outside the H-M approach (the Haskell "pitfall"
> 12.1 (restriction to "let-bound polymorphism") is a prime example of this).

	Ah, but the point is that it at least terminates, either
giving up noisily or assuring one of correctness.
This is not easy to achieve.

> >       Just to be clear what I believe: if the type system
> > supports higher order functions, it will be undecidable.
> > I am NOT sure of this (but it sounds right).
> 
> As above, HOFs alone aren't enough to break the camel's back, but they sure
> do put a load on it.  Alas, I don't know of a simple characterization of what
> is (in aggregate) enough to break it, and the H-M restrictions are so
> technical in nature as to suggest a simple characterization doesn't exist.
> But I'm no expert in this stuff either.

	For Python, I think we can guarantee that _straight line_ code
is always checked, provided all the symbols are typed. In other words,
if we know the types of identifiers and functions, then the correctness
of all expressions and assignments can be checked.

	This assurance can be extended to loops, but we have to give
up when we reach a try/except block.

	Certain builtin functions cannot be typed properly, though,
such as things that call map, exec, eval, etc.
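
	For example, 'exec' can rebind any name to a value of any
type, so a checker can assume nothing about the names it touches:

	x = 1                    # a checker could infer x is an int
	exec("x = 'a string'")   # now nothing can be assumed about x
	print(x)                 # prints: a string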

	In my opinion, this will be enough to gain _massive_
optimisation opportunities: it is mainly small inner loops that need
to be optimised anyhow: many of these will be quite simple, and
simply typed.
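
	A sketch of the kind of simply typed inner loop I mean --
assuming the checker knows (say, from declarations) that both
arguments are lists of floats, every expression below has a
determinable type:

	def dot(xs, ys):
		# xs, ys: lists of floats (assumed declared)
		total = 0.0
		for i in range(len(xs)):
			total = total + xs[i] * ys[i]
		return total

	print(dot([1.0, 2.0], [3.0, 4.0]))   # prints 11.0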

	I also expect considerable benefit in terms of both
diagnostics and possibly readability for general use -- but
without a _guarantee_ of type correctness which some other people
seem to be striving for. It is a good aim, but I think it is
unrealisable.

	PS: I really don't like the 'decl' keyword, at least for
code (as opposed to separate interface files). Nor do I like
the 'type' mini-language idea much. The way I see it, this is a flaw
in conventional type systems. A more 'Pythonic' technique would
KEEP the idea that types are run time objects (as classes are),
but still allow significant optimisation and early diagnosis IF
enough information were provided and the usage were simple enough.

	I STILL think that this can best be achieved by conservative
syntax extensions like Greg Stein's type assertion operator, plus
some restrictions on required behaviour. This isn't enough for 
best quality checking -- but otherwise the static typing language
is likely to be MUCH more complex than Python itself.

	OTOH, I agree with Guido that parameterised types are useful --
but I suggest that these can be provided by changing the run-time
type system (Vyper provides parameterised types now, although the
mechanism is a bit crude as yet).
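
	A minimal sketch of the idea in present-day Python -- the
name 'ListOf' and its 'check' method are hypothetical, and Vyper's
actual mechanism differs:

	class ListOf:
		# ListOf(float) is an ordinary run-time object,
		# itself usable as a (parameterised) type
		def __init__(self, item_type):
			self.item_type = item_type
		def check(self, value):
			if not isinstance(value, list):
				return False
			for v in value:
				if not isinstance(v, self.item_type):
					return False
			return True

	floats = ListOf(float)
	print(floats.check([1.0, 2.0]))    # True
	print(floats.check([1.0, "x"]))    # False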

	The requisite type assertion operator can be implemented
in a few minutes. Changing the CPython type system to allow
any object as a type should also be relatively easy [objects of
type TypeType are still allowed, and are treated specially].
This will bring CPython up to the current status of Vyper.
It will allow Stein to get on with writing the type checker,
and others to get on with optimising C code generators,
and it will allow programmers to begin annotating their code
right away.
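
	For concreteness, the run time behaviour of the assertion
operator need be no more than this (a sketch: 'type_assert' is a
stand-in for the proposed 'expr ! type' syntax):

	def type_assert(value, typ):
		# evaluate to value if it has the asserted type,
		# otherwise raise TypeError
		if not isinstance(value, typ):
			raise TypeError("expected %s, got %s"
					% (typ, type(value)))
		return value

	y = type_assert(3, int)       # ok: y is 3
	# type_assert("3", int)       # would raise TypeError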

	What Python needs -- desperately -- is a way of building
categorical sums (variants) like ML's

	type t = X | Y of int

Switching on 'type' as in a typecase is a BAD idea.
This is done in C++ using

	if (X *x = dynamic_cast<X*>(y)) { ... }

that is, using a small modification to the C 'if' construction;
Stroustrup refused to bless this hackery with a special new
construction.
Because it is the WRONG idea, switching on type. This can be seen by
considering the construction:

	type measurement = Metres of float
	                 | Feet of float

the values have the SAME type. The switching must occur on an explicit
tag.
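
	In plain Python, switching on an explicit tag might look
like this (a sketch using ordinary (tag, value) pairs; the names
are invented):

	METRES, FEET = 'Metres', 'Feet'

	def to_metres(m):
		# both variants have the SAME type: a (tag, value)
		# pair; we switch on the tag alone
		tag, value = m
		if tag == METRES:
			return value
		elif tag == FEET:
			return value * 0.3048
		raise ValueError("not a measurement: " + repr(m))

	print(to_metres((FEET, 10.0)))   # prints 3.048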

Interestingly, there are TWO constructions already in the Grammar that
allow this:

	(tag, value)  # a tuple
	tag: value    # dictionary entry

.. apart from using classes (ugly***):

	class Tag:
		def __init__(self, value): self.value = value

I've actually implemented:

	{ 1 : 10, 2 : 20 } {
		1: lambda x: x,
		2: lambda x: x * x,
		3: lambda x: x * x * x
	}

which is a kind of switch allowing multiple selections.
The result is { 1: 10, 2: 400 }. Unfortunately, the parse is
'atom with trailers' and the trailer must use explicit {}.
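
	The same effect is easy to emulate in standard Python with
a helper function ('dict_switch' is an invented name, not the
Vyper construct itself):

	def dict_switch(data, handlers):
		# apply handlers[key] to each value whose key has
		# a handler; keys without a handler are dropped
		result = {}
		for k, v in data.items():
			if k in handlers:
				result[k] = handlers[k](v)
		return result

	print(dict_switch({1: 10, 2: 20},
			  {1: lambda x: x,
			   2: lambda x: x * x,
			   3: lambda x: x * x * x}))
	# prints {1: 10, 2: 400}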

*** this is wrong, because the type of 'Metres 20.0'
is the same as the type of 'Feet 20.0', namely 'measurement'.
Using classes confuses tagging (unification) with subtyping,
which are diametrically opposed in nature. [Unification is
a disjoint union; subtyping is a subset relation.]

This is one of the most serious faults in most OO systems,
and the most serious abuse of inheritance. Luckily,
advanced systems like OCaml get all this right (variants
are independent of classes which allow inheritance, which
is independent of signature based subtyping) -- even if the
result is somewhat clumsily written, and has annoying restrictions.

------------------

In any case, I think the best thing for Python is the minimal
syntactic change and architectural change -- because anything
else will ALSO be incomplete/inadequately expressive/non-deterministic,
and in ADDITION it will likely be very complex.

Here's a possible scenario:

	1) Allow any object to be a type object
	2) Provide operator ! for expressions and function 
		parameters (and returns?)
	3) Disallow certain operations like writing to module attributes
		after the module is imported (see the sketch below)
	4) Unify classes/types better
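
	As an aside on point 3, a one-line illustration of why
writes to module attributes defeat checking:

	import math
	math.pi = "not a number"   # legal; after this, nothing static
				   # can be assumed about math.pi anywhere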

I believe these features will significantly improve opportunities
for early error diagnosis and optimisation -- WITHOUT adding any
complex new sublanguages or constructions.

The result will be a 95% improvement in static assurance :-)

Perhaps, with this done, we can learn enough to make the next
improvement. In some sense, I like Greg's approach: you implement
SOMETHING minimal, then try to improve it, until you find
the difference between 'tricky to get right' and 'there is not
enough information to go further -- we need to know THIS'.

When we know what THIS really is, we can rethink how to provide
that information in the language, so as to improve the performance
of working checkers. The result will never be as good as a
'designed from scratch using mathematics' language, but if you
want that, you'd be using Charity.

-- 
John (Max) Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
homepage: http://www.maxtal.com.au/~skaller
download: ftp://ftp.cs.usyd.edu/au/jskaller