[Python-Dev] Progress on Psyco

Thu, 20 Sep 2001 18:31:54 +0200 (CEST)

Hello everybody,

A short note about my progress on Psyco, the kind of compiler for Python.

In its latest form it is a kind of just-in-time compiler, working at
run-time. It consists of two parts: a language-independent "glue" part
(which I more or less finished), and the inner loop of the Python
interpreter rewritten in a special, resticted language. The "glue" part
executes Python programs by running the interpreter, but not in the common
way. The run-time data are split into what must be known to produce good
processor code, and what need not be. Typically, we need to know at least
which Python code to interpret before we can make good processor code that
performs similarily; in the case of Python we will probably also need to
know the type fields of the actual Python objects fed to the Python code;
the processor code can be very efficiently optimized depending on the
type. On the other hand, if the Python objects are, say, integers, then
the same processor code can easily manipulate all possible integer values.
The "glue" code is responsible for choosing which data goes into which
category, and managing caches of the produced code. The Python-dependent
part of the code either makes computations at compile-time (for
compile-time-known values) or emits code to do it at run-time (for the
rest).

The choice between the two categories of data is not done statically; I
mean, I didn't say "Python code and type fields are in category 1, the
rest in category 2", which would amount to writing a JIT compiler with
type-analysis. No; I let this separation be done based on the uses of the
data. In two words, if I have a "switch" on data, or a call to function
pointer, then knowing the data at compile time enables significant
improvements, so I mark that data as "category 1"; as long as this does
not occur data is in "category 2".

This makes the approach quite general, I think. Experiment will be needed
to know whether it performs reasonably well. Another interesting point to
note is that this idea applies to any other interpreted language; all you
need to do is rewrite the inner loop of the interpreter for the other
language in a special way. Orthogonally, you can change the target
processor by adapting parts of the "glue" code.

More about it later,

Armin.