[Types-sig] recursive types, type safety, and flow analysis (was: Recursive types)

Guido van Rossum guido@CNRI.Reston.VA.US
Tue, 21 Dec 1999 18:38:58 -0500


[Greg Stein]

> I should have clarified that I don't think his particular example would
> work because of the compile- / definition- time recursion of the names.
> Runtime? Sure. It's fine.

Hm...  Since type checking is essentially a compile time activity, I
think it would be better if the run time order of events didn't
matter.  Yes, in Pascal or C you need to declare everything before
it's used.  This is a compromise because of old-fashioned one pass
compiler technology.  I don't see a reason why we should adopt this
rule in Python.  Note that Java doesn't have this either -- you can
declare your methods in any order you like and the compiler will
figure it out.

But, you may say, in Python we have a certain run time order of events
that defines the validity of names.  Names must be defined before they
are used, and a later redefinition overrides an earlier one.

Of course.  (Although I wouldn't mind getting at least a compile time
warning when I define two classes or methods with the same name; it
can be frustrating when you're editing the first definition but your
program keeps using the second one! :-)

But checking that names are defined by the time they are used at run
time is a different kind of check.  (Java does this too, to a decent
extent.)  I personally find this a very useful check.  But it doesn't
necessarily affect the compile time static typechecking.

> Because Y is not defined. This is analogous to the following code:
> 
> class Foo:
>   def build_it(self, x, y, cls=Bar):
>     return cls(x, y)
> 
> class Bar:
>   ...
> 
> The above code breaks. I am positing that if you put "Y" into a
> declarator, then it should exist at that point in time. Where "time" is
> specified as following the flow of execution as the functions/classes are
> defined.

I disagree.  See above -- there's no reason to burden the compile time
type checker with run time ordering.

> I don't think we can detect that call-before-definition, though.

But I think you can, in by far the most cases.  There may be a few
borderline cases where it's impossible to tell, and I don't mind
requiring a little help to make the type checker happy.  For example,
in C code I frequently add an initialization of a local variable to 0
which isn't really necessary because it is initialized in a for loop,
but the compiler isn't smart to figure out that the for loop will
execute at least once.  Gcc -Wall complains about such cases, and
shutting it up completely every once in a while is sufficiently
satisfying that I'll add redundant code.  Of course, I'd be happier if
gcc was smarter, and I hope that Python's type checker will usually be
smarter -- and then in the remaining cases I think it's okay to help
it.

> I think your point can be restated as: Can we type-check the following
> code?
> 
> def f() -> String:
>   return g()
> 
> def g() -> String:
>   ...
> 
> I haven't thought about this particular scenario or the resulting impact
> on the inferencer. We probably require some kind of a two-pass analysis as
> John points out. Maybe it is as simple as deferring analysis of function
> bodies until the global "body" is analyzed. Actually, I think the deferral
> mechanism is sufficient, as that mirrors the execution environment: at the
> time the function body is executed, the globals are defined.
> [ with the caveat of call-before-define, but we can't determine that ]

I don't see a big problem here for the type checker.  Assuming that
there's only one definition of g, and that we disallow changes to g
from outside the module (and from exec statements), the type checker
will have no trouble discovering that g is a global function
definition, and it can collect its type info to help checking f.
There may even be arbitrary cross references; the solution (from the
type checking point of view) is to iterate until all definitions are
found.

Again, checking that g is actually defined by the time f is called is
a separate thing; but again in most cases this will be easy, since
there is usually no executable code between the definitions of f and g
(except perhaps other function definitions).  It's a simple flow check.

> Hrm. The whole call-before-define problem might actually axe Paul's desire
> for type safety. I don't think we can *ever* guarantee that a name will
> exist at the time it is used. For example:
> 
> value = 1
> def typesafe f():
>   somefunc(value)
> 
> del value
> f()
> 
> If you start doing *control* flow analysis, then you might be able to
> definitely state the above code is in error. But then, I'll just throw
> this wrench at it:
> 
> if sometest():
>   del value
> f()
> 
> Now what?

Simple.  After the if statement has executed, value is "possibly
undefined".  This warrants a warning.

> The type inferencing that I believe we can/should be using is based on
> some basic data flow, without regard to definitively determining whether a
> particular branch is reached or not. If a possibility exists, then the
> possible types are unioned in. If type-safety is defined as "no
> NameErrors", then we have a problem, as control flow is required.

I don't see the problem.  I claim that the examples you give are
accidents waiting to happen, so it's only helpful if the type checker
complains about them!

> Note that I believe the following can be handled:
> 
> value = 1
> def typesafe f()
>   func_taking_int(value)
> 
> value = "foo"
> f()
> 
> In this case, the global variable "value" has a typedecl of: Int or
> String. This would fail the func_taking_int() function call.

Yes.  In my view, "possibly undefined" is no different than "int or
string".

> Back to the other point: I do believe that you should not use a name in a
> type-declarator if it isn't defined.

I like the idea better (I think proposed by Tim Peters) that the names
used for type declarations live in a separate compile time namespace
where different rules apply.  (Even though there are obvious
correspondences, e.g. the names of defined or imported classes should
probably be available both at compile time and at run time.)

--Guido van Rossum (home page: http://www.python.org/~guido/)