Please enlighten me about PyPy

Thu Dec 22 09:07:04 EST 2005

Hi!

Luis M. González wrote:
> Well, first and foremost, when I said that I leave the door open for
> further explanations, I meant explanations by other people more
> knowledgeable than me :-)

You did a very good job of describing what PyPy is in this and the 
previous mail! I will try to explain why PyPy is built the way it 
is.

> 
>>Now I'm confused again--psyco translates Python into machine code--so
>>how does this tie in with the fact that the interpreter written in
>>Python is translated into another language (in this case C?)
> 
> 
> No, the psyco-like techniques come later, after the rpython interpreter
> is auto-translated to c. They are not used to translate the interpreter
> to c (this is done through a tool that uses type inference, flow-graph
> analysis, etc, etc).
> Getting the rpython auto-translated to C is the first goal of the
> project (already achieved).
> That means having a minimal core, written in a low level language (c for
> speed) that hasn't been written by hand, but auto-translated to c from
> the python source -> much easier to improve and maintain from now on.

Indeed. The fact that the core is written in RPython has a number of 
advantages:

The first point is indeed maintainability: Python is a lot more flexible 
and more concise than C, so changes and enhancements become much easier. 
Another point is that our interpreter can not only be translated, but 
also run on top of CPython! This makes testing very fast, because you 
don't need to translate the interpreter before testing it -- just 
run it on CPython.
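
To illustrate the testing point: since interpreter code is ordinary 
Python, a piece of it can be exercised directly on CPython with no 
translation step. The following is only a minimal sketch; `Frame`, 
`binary_add` and `run_demo` are hypothetical names made up for 
illustration and are not taken from the PyPy sources.

```python
# Hypothetical interpreter fragment; all names are illustrative, not PyPy's.

class Frame:
    """A toy evaluation frame holding an operand stack."""

    def __init__(self):
        self.stack = []

    def push(self, value):
        self.stack.append(value)

    def pop(self):
        return self.stack.pop()


def binary_add(frame):
    """Toy handler for an 'add' bytecode: pop two operands, push their sum."""
    right = frame.pop()
    left = frame.pop()
    frame.push(left + right)


def run_demo():
    """Exercise the handler directly on the host Python, no translation."""
    frame = Frame()
    frame.push(2)
    frame.push(3)
    binary_add(frame)
    return frame.pop()
```

Calling `run_demo()` on plain CPython runs the handler immediately, 
which is the kind of quick feedback loop described above.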

The most important advantage of writing the interpreter in Python is 
flexibility. In CPython a lot of implementation choices are made 
rather early: The choice to use C as the platform the interpreter works 
on, the choice to use reference counting (which is reflected 
everywhere), the choice to have a GIL, the choice to not be stackless. 
All these choices are deeply embedded into the implementation and are 
rather hard to change. Not so in PyPy. Since the interpreter is written 
in Python and then translated, the translation process can change 
different aspects of the interpreter while translating it. The 
interpreter implementation does not need to concern itself with all 
these aspects.

One example of this is that we are not restricted to translating our 
interpreter to C. There are currently backends to translate RPython to 
C, LLVM (llvm.org) and JavaScript (incomplete), and there are plans to 
write a Smalltalk and a Java backend. That means that we could 
potentially generate something similar to Jython. The comparison is not 
entirely accurate, because interfacing with Java libraries would not 
work, but pypy-java would run on the JVM.

Another example is that we can choose at translation time which garbage 
collection strategy to use. At the moment we even have two different 
garbage collectors implemented: a simple reference-counting one and 
one that uses the Boehm garbage collector. We have also started (as part 
of my Summer of Code project) an experimental garbage collection 
framework which allows us to implement garbage collectors in Python. This 
framework is not finished yet and needs to be integrated with the rest 
of PyPy.
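
To give a feeling for what "a garbage collector in Python" can look 
like, here is a toy mark-and-sweep collector over a simulated heap. It 
is only a sketch of the general technique and not the experimental 
framework mentioned above; `Obj`, `Heap` and `collect` are hypothetical 
names invented for this example.

```python
# Toy mark-and-sweep over a simulated heap; illustrative only, not PyPy code.

class Obj:
    """A simulated heap object that may reference other objects."""

    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


class Heap:
    """A simulated heap that tracks every allocated object."""

    def __init__(self):
        self.objects = []

    def allocate(self, name):
        obj = Obj(name)
        self.objects.append(obj)
        return obj

    def collect(self, roots):
        """Free everything unreachable from roots; return the number freed."""
        # Mark phase: clear old marks, then walk the graph from the roots.
        for obj in self.objects:
            obj.marked = False
        pending = list(roots)
        while pending:
            obj = pending.pop()
            if obj.marked:
                continue
            obj.marked = True
            pending.extend(obj.refs)
        # Sweep phase: keep only the objects that were marked.
        survivors = [obj for obj in self.objects if obj.marked]
        freed = len(self.objects) - len(survivors)
        self.objects = survivors
        return freed


def demo():
    heap = Heap()
    a = heap.allocate("a")
    b = heap.allocate("b")
    c = heap.allocate("c")
    d = heap.allocate("d")
    a.refs.append(b)   # b is reachable through a
    c.refs.append(d)   # c and d form an unreachable cycle
    d.refs.append(c)
    freed = heap.collect(roots=[a])
    return freed, sorted(obj.name for obj in heap.objects)
```

Note that the cycle between `c` and `d` is reclaimed here, which a pure 
reference-counting scheme would not do on its own.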

In a similar manner we hope to make different threading models 
selectable at translation time.

[snip]
> 
> Now this is both, a conclusion and a question (because I also have many
> doubts about it :-):
> At this moment, the translated python-in-python version is, or intends
> to be, something more or less equivalent to CPython in terms of
> performance. Because it is in essence almost the same thing: another C
> python implementation. The only difference is that while CPython was
> written by hand, pypy was written in python and auto-translated to C.

Yes, at the moment pypy-c is rather similar to CPython, although slower 
(currently a bit less than ten times slower than CPython), except 
that we can already choose between different aspects (see above).

> What remains to be done now is implementing the psyco-like techniques
> for improving speed (amongst many other things, like stackless, etc).

Stackless is already implemented. In fact, it took around three days to 
do this at the Paris sprint :-). It is another aspect that we can choose 
at translation time (that means you can also choose to not be stackless 
if you want to). With stackless we can support arbitrarily deep 
recursion (until the heap is full, that is). We don't export any 
task-switching capabilities to the user, yet.
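
The idea behind "recursion until the heap is full" can be sketched in 
plain Python by keeping call frames in a heap-allocated list instead of 
on the C stack. This is only an illustration of the general idea using 
generators on a modern Python 3; it is not how the stackless 
transformation in PyPy is implemented, and `depth` and `run` are 
hypothetical names.

```python
# Illustrative trampoline: frames live in a Python list (on the heap).
# Not PyPy's stackless transformation, just a sketch of the underlying idea.

def depth(n):
    """Recursive-style function: yields a sub-call and receives its result."""
    if n == 0:
        return 0
    result = yield depth(n - 1)
    return result + 1


def run(gen):
    """Drive nested generator 'calls' using an explicit stack of frames."""
    stack = [gen]
    value = None
    while stack:
        try:
            # Resume the topmost frame, passing in the last returned value.
            child = stack[-1].send(value)
        except StopIteration as stop:
            # The frame finished: pop it and hand its result to the caller.
            stack.pop()
            value = stop.value
        else:
            # The frame made a "call": push the callee and start it fresh.
            stack.append(child)
            value = None
    return value
```

With this scheme, `run(depth(50000))` completes even though a directly 
recursive version would exceed CPython's default recursion limit.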

About the psyco-like JIT techniques: we hope not to write the JIT by 
hand but to generate it as part of the translation process. At the 
moment, though, this is still quite unclear, in heavy flux and nowhere 
near finished.

Cheers,

Carl Friedrich Bolz