Python from Wise Guy's Viewpoint

Tue Nov 11 03:04:20 EST 2003

Pascal Costanza <costanza at web.de> writes:

>OK, let's try to distill this to some simple questions.
>
>Assume you have a compiler ML->CL that translates an arbitrary ML 
>program with a main function into Common Lisp. The main function is a 
>distinguished function that starts the program (similar to main in C). 
>The result is a Common Lisp program that behaves exactly like its ML 
>counterpart, including the fact that it doesn't throw any type errors at 
>runtime.
>
>Assume furthermore that ML->CL retains the explicit type annotations in 
>the result of the translation in the form of comments, so that another 
>compiler CL->ML can fully reconstruct the original ML program without 
>manual help.
>
>Now we can modify the result of ML->CL for any ML program as follows. We 
>add a new function that is defined as follows:
>
>(defun new-main ()
>   (loop (print (eval (read)))))
>
>(We assume that NEW-MAIN is a name that isn't defined in the rest of the 
>original program. Otherwise, it's easy to automatically generate a 
>different unique name.)
>
>Note that we haven't written an interpreter/compiler by ourselves here, 
>we just use what the language offers by default.
>
>Furthermore, we add the following to the program: We write a function 
>RUN (again a unique name) that spawns two threads. The first thread 
>starts the original main function, the second thread opens a console 
>window and starts NEW-MAIN.
>
>Now, RUN is a function that executes the original ML program (as 
>translated by ML->CL, with the same semantics, including the fact that 
>it doesn't throw any runtime type errors in its form as generated by 
>ML->CL), but furthermore executes a read-eval-print-loop that allows 
>modification of the internals of that original program in arbitrary 
>ways. For example, the console allows you to use DEFUN to redefine an 
>arbitrary function of the original program that runs in the first 
>thread, so that the original definition is not visible anymore and all 
>calls to the original definiton within the first thread use the new 
>definition after the redefinition is completed. [1]
>
>Now here come the questions.
>
>Is it possible to modify CL->ML in a way that any program originally 
>written in ML, translated with ML->CL, and then modified as sketched 
>above (including NEW-MAIN and RUN) can be translated back to ML?

Yes, it is possible.  For example, have CL->ML ignore the type annotation
comments, and translate CL values into a single ML discriminated union type
that can represent all CL values.  Or (better, but not quite the same semantics)
translate those CL values with ML type annotations back to corresponding ML
types, and insert conversions to convert between the generic CL type and
other specific types where appropriate.

Suppose the original ML program defines the following functions

	foo : int -> int
	bar : string -> string
	...

We can add dynamic typing like this:

	datatype Generic = Int of int
			 | String of string
			 | Atom of string
			 | Cons Generic Generic
			 | Apply Generic Generic
			 | ...

	fun foo_wrapper (Int x) = (Int (foo x))

	fun bar_wrapper (String x)  (String (foo x))

and then dynamic binding like this

	val foo_binding = ref foo_wrapper

	val bar_binding = ref bar_wrapper

For dynamic binding, function calls will have to be translated to does
an indirection.  So a call

	let y = foo x

will become

	let y = !foo_binding x

or, if x and y are used in a way that requires that they have type int,
then

	let (Int y) = !foo_binding (Int x)

We can then simulate eval using an explicit symbol table.

	fun lookup "foo" = !foo_binding
	  | lookup "bar" = !bar_binding
	  | lookup "define" = !define_binding
	  ...

	fun define "foo" f = foo_binding := f
	  | define "bar" f = bar_binding := f
	  | define "define" f = define_binding := f
	  ...

	fun define_wrapper (Cons (Atom name) body) =
		let () = define name body in Atom "()"
	val define_binding = ref define_wrapper

	fun eval (Apply func arg) =
		case (eval func) of
		Atom funcname => (lookup funcname) (eval arg)
	  | eval x = x

Note that our symbol table includes an entry for the function "define",
so that eval can be used to modify the dynamic bindings.

The rest (e.g. read) is straight-forward.

>To ask the question in more detail:
>
>a) Is it possible to write CL->ML in a way that the result is both still 
>statically type checkable

Yes, but only in a weak sense.  Since we are allowing dynamic binding,
and we want to be able to dynamically bind symbols to a different type,
we're definitely going to have the possibility of dynamic type errors.
The way this is resolved is that things which would have been type errors
in the original ML program may become data errors (invalid constructor
in an algebraic type Generic) in the final ML program.

>and not considerably larger than the original program that was given to
>ML->CL.

There's a little bit of overhead for the wrapper functions of type
"Generic -> Generic", and for the variables of type "ref (Generic -> Generic)"
which store the dynamic bindings.  It's about two lines of code per
function in the original program.  I don't think this is excessive.

>Especially, is it possible to do this 
>without implementing a new interpreter/compiler on top of ML and then 
>letting the program run in that language, but keep the program 
>executable in ML itself?

The program includes a definition for "eval", and "eval" is an
interpreter, So in that sense, we have added a new interpreter.
But the bulk of the program is written in ML, without making use of eval.

>c) If you respond with yes to either a or b, what does your sketch of an 
>informal proof in your head look like that convincingly shows that this 
>can actually work?

See above.

-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.