[Types-sig] Type Inference I

Greg Stein gstein@lyra.org
Sat, 18 Dec 1999 12:38:19 -0800 (PST)


On Sun, 19 Dec 1999, skaller wrote:
>...
> OK Greg, let's see where we agree and what we understand.
> 
> First, interpreted Python is too damn slow for some applications.
> Also, errors sometimes get reported later than we'd like.
> 
> We'd both like to:
> 
> 	(OPT)	be able to translate Python sources into C
> 		which runs faster than interpreted Python
> 
> 	(ERR)	find errors in a program, before running it
> 
> I hope we agree on these points so far.

Sure.

> Now, here is something I believe, mainly from comments
> made at various times by Guido, Tim, and others:
> people have tried compiling python before, and found that
> the resulting C code didn't run much faster than the
> interpreter. That's mainly because these compilers didn't
> know anything about the types; they just generated API
> calls corresponding to what the byte code interpreter would
> execute -- and the interpreter is pretty fast already.

Bill Tutt and I have done it and measured about 30% speed improvement in
most cases. Not as much as most people would hope for, but definitely
there. Bill is continuing to improve the code.

> So the question arises: how well can we do if we know
> what the types of things are, or at least some of them?
> I am going to assume, without evidence, that we can do better,
> and I'm going to assume you agree.

Agreed.

> So now the question arises: how can we find out the types?
> Now, I am going to TELL you that there is some evidence
> that we can in fact do surprisingly well without changing
> Python one little bit. We can do better, if we analyse a whole
> program. We can do better, if we make some assumptions.
> I'd like you to accept this, without argument, because I cannot
> prove it: I've only done a Mickey Mouse experiment, but JimH
> has done a less Mickey Mouse one, I'm told.

Sure, I accept that we can.

But to state up front: I don't think we want to rely on whole-program
analysis. At the moment, I am assuming that a type-checking tool will not
be part of the [byte-code] compiler -- that it is just too much and too
slow to directly include. However, that obviously negates a number of
things that the compiler could do if it knew the types. For example, maybe
we introduce some integer-manipulation opcodes because we find they would
be beneficial to 90% of Python programs. An external tool doesn't let
Python take advantage of them.

To that end, I think we might eventually want to integrate something. And
to do that, we definitely cannot rely on whole-program analysis. In other
words, if we depend on whole-program analysis, then I don't think the
builtin, byte-code compiler will ever be able to take advantage of type
information.

>...
> So, when you say 'we' are not going to do this or that
> kind of inference, you are missing the point.
> I surely AM going to. Others certainly WILL.
> This is how it is done. When you are writing a compiler,
> you use every bit of information you can to make
> it go faster.

I agree that you want to use every bit of information possible. I disagree
that I'm missing the point: I think we are discussing what will happen to
the native compiler. To that end, I *am* positing that 'we' will do <this>
or <that>.

If "third parties" (if you will) want to create an even better compiler,
than I'm all for it. However, we still want to improve the native system,
and I believe that is through a different path than you are suggesting.

> So the point is how to give the compiler more information,
> while minimising the impact on 'python' (you can read that
> as 'keeping it pythonic' if you like).

Yes.

> Now, my point, in Type Inference I and II, is that static
> type declarations are only ONE way of providing 
> more information, and they are not even the most important
> one. In fact, type inference is hampered by quite a lot 
> of other factors. 

This was entirely unclear. I saw it as some kind of weird ramble about
changing Python's exception behavior in some unclear way, and for some
unknown purpose.

Given the above paragraph, now I understand what you are trying to get
at.

> I'm sure you will agree on some. For example, I'm sure
> you understand how 'freezing' a module, or banning
> rebinding of variables after importing, or disallowing
> 
> 	module.attribute = xxx
> 
> will help type inference: you clearly understand that
> python is very dynamic, which makes static analysis
> difficult. Right?

Yup.
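
To make that concrete, here is a toy illustration of my own, using the
standard math module:

	import math

	def circumference(r):
	    return 2 * math.pi * r

	# Perfectly legal Python today. Once this is allowed anywhere in
	# the program, an inferencer can no longer assume math.pi is a
	# float inside circumference():
	math.pi = None

	circumference(2)   # now raises TypeError at run time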

> So what I am trying to do is list all the things
> OTHER than adding optional type declarations,
> which might contribute to our agreed upon aim,
> namely, to provide a compiler with enough information
> to generate fast code.

All right.

>... module attribute assignments ...
>... add() example, explaining exceptions mess up compiler ...
> 
> 	1) The spec says: 
> 	IF the arguments are both ints ..
> 	OR IF the arguments are both strings ..
> 	OTHERWISE an exception is thrown
> 
> 	2) The spec says:
> 	IF the arguments are both ints ..
> 	OR IF the arguments are both strings ...
> 	OTHERWISE THE BEHAVIOUR IS UNDEFINED
>...
> Therefore, there is a performance advantage in adopting
> spec (2) as a language specification, instead of (1).
> Note this does not mean the generated code will crash
> if x is not an integer. What it means is that if the
> compiler detects that x is not an integer, it can
> generate a compile time error. It is NOT allowed to
> do that with specification (1).

Interesting point. As a user of Python, I like (1) and do not want to see
Python change to use (2). Sure, it hurts the compiler, but it ensures
that I always know what will happen.
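
For reference, a minimal version of the add() example under today's
behaviour, i.e. spec (1):

	def add(x, y):
	    return x + y

	add(1, 2)       # 3
	add("a", "b")   # "ab"
	# Spec (1) *guarantees* that the next call raises TypeError at
	# run time. Under spec (2) it would be undefined, and a compiler
	# would be allowed to reject the program outright.
	add(1, "b")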

> So my point is: the Python documentation contains
> many examples where it says 'such and such an exception
> is thrown', and this prevents generating fast code,
> and it prevents early error detection. The point
> is that throwing an exception is _well defined_ behaviour,
> and it would be better if the specification said
> the program was in error. That way, a compiler
> can report the error at compile time.

While true, I think the compiler will still know enough about the type
information to generate very good code. If somebody needs to squeeze even
more performance, then I'd say Python is the wrong language for the job. I
do understand that you believe Python can be the right language, IFF it
relaxes its specification. I hope it doesn't, though, as I like the
rigorous definition.

>... catching exceptions ...
> is a valid python code fragment, and it relies on
> x + 1 throwing an exception if x is not an integer.
> So IF we continue to allow this, we cannot deduce
> that x must be an integer, and this prevents optimising
> generated code, both here, and in other places in the
> program.

It only prevents it in the presence of exception handlers. You can still
do a lot of optimization outside of them. And a PyInt_Check() test here
and there to validate your assumptions is technically more expensive than
not having it, but (IMO) it is not expensive in absolute terms.
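
For concreteness, the pattern under discussion is something like this
(a minimal sketch):

	def bump(x):
	    try:
	        return x + 1
	    except TypeError:
	        # Well-defined under spec (1): this code deliberately
	        # relies on x + 1 raising TypeError when x is not a
	        # number, so here the compiler cannot conclude that x
	        # is an int.
	        return 0

	bump(3)        # 4
	bump("three")  # 0, via the handler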

>...
> For this reason, I think it is important that the Types SIG
> also examine when it is legitimate to catch an exception
> at run time, and do something, and when the code is just 
> plain wrong, and a compiler can reject it.

Sure.

> My proposal, and I thought it was fairly 'concrete' as
> you required, was that apart from EnvironmentErrors,
> all _standard_ exceptions not trapped within a function body
> (I said 'lexically local' before, but now I'll be more
> specific), are in fact programming errors: the programmer
> may NOT rely on catching any standard exceptions
> other than environment errors, generated by code inside
> a client-written function.

All right. This is clear now. And it is clearly something that I would not
want to see :-)
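
If I am reading the proposal right, the dividing line would look
something like this (my own illustration, not your wording):

	def read_config(path):
	    try:
	        return open(path).read()
	    except IOError:
	        # IOError is an EnvironmentError, so catching it would
	        # still be permitted under the proposal.
	        return ""

	def bump(x):
	    try:
	        return x + 1
	    except TypeError:
	        # TypeError is a standard, non-environment exception
	        # raised from client-written code; under the proposal a
	        # compiler could treat this reliance as a programming
	        # error and reject the program.
	        return 0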

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/