[pypy-dev] Questions from a lurker

Sat Jul 16 12:55:01 CEST 2005

Hi Arthur!

Arthur Peters wrote:
> Hello, I am Arthur. I've been lurking on the list for a month or so and
> I've had a good bit of fun playing with the code.

Welcome to PyPy! I'll try to take a stab at some of the questions. 
Corrections are welcome.

> My questions are:
> 
> - Why did you start a second LLVM backend? Was the first one badly
> misdesigned?

When I wrote the first LLVM backend the RTyper did not exist yet. This 
resulted in a big duplication of work: I basically implemented a lot of 
things on LLVM level (like lists, tuples, classes...) which would have 
been useful for other C-ish backends as well. Later the RTyper was 
written which made all that work superfluous because the RTyper did 
exactly that: implement all those things on a level where every other 
C-ish backend could use it. Eric and I tried to adapt the old LLVM
backend to the new model but it didn't work very well (because the
assumptions were totally different). Thus Holger and I started the new
LLVM backend on a weekend.

> - So I take it the initial goal will be to translate pypy into C (or
> LLVM) and create a binary executable. At that point will it be possible
> to translate any given program or will the translator still only work
> with a subset of the python language? Will the final goal be a JIT VM
> for python or will I at some point be able to statically compile python
> to machine code?

The translator will always only work on a subset of Python. It's
probably not really possible to statically compile arbitrary Python
code to machine code (at least not without cheating and using the whole
interpreter in the machine code again :-) ).
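
To give a rough idea of what "a subset" means here, a small sketch. It
assumes one typical restriction, namely that every variable keeps a
single consistent type so that types can be inferred statically; the
function names are made up for illustration.

```python
def sum_squares(n):
    # Every variable holds an int for the whole function, so a static
    # translator could infer the types and emit plain machine integers.
    total = 0
    for i in range(n):
        total += i * i
    return total


def dynamic_result(flag):
    # Perfectly legal Python, but 'result' is an int on one path and a
    # str on the other, so no single static type can be assigned to it.
    if flag:
        result = 42
    else:
        result = "forty-two"
    return result
```

The first function is the kind of code a static translator can handle;
the second is fine for an interpreter but falls outside such a subset.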

> 
> - Will there be support for multiple specialized versions of functions
> like in psyco? I know this is a long way down the road but I'm curious
> what people think.

The idea is to integrate Psyco's ideas, yes. Although there are not 
really multiple specialized versions of a function in Psyco, it's rather 
one function that does different things depending on the type of the 
arguments.
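
As a very rough picture of "one function that does different things
depending on the type of the arguments", here is a hand-written toy. It
is not Psyco's actual mechanism (Psyco generates machine code at run
time); the function name and the path labels are purely illustrative.

```python
def specialized_add(a, b):
    # One entry point; which path runs depends on the runtime types.
    if type(a) is int and type(b) is int:
        return ("int path", a + b)
    if type(a) is str and type(b) is str:
        return ("str path", a + b)
    # Fallback for every other combination of types.
    return ("generic path", a + b)
```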

> - I read some of the discussion of the GIL as it relates to pypy and I agree
> that the GIL needs to be implemented and that a different thread model
> might be a good way to go. However I thought of the following: Would it
> be possible to detect when a section of code only uses local variables
> and unlock the GIL? This seems possible because local variables
> cannot be shared between threads anyway. In addition, local variables
> will likely be translated into more basic types than a python object (native
> ints and floats for instance), that would not require any interaction
> with the object-space to manipulate (not sure about that usage of the
> term "object-space"). Thoughts?

First a comment about the usage of the term "object space": I think you 
are mixing levels here (and I might misunderstand you). The PyPy 
interpreter (meaning the bytecode interpreter plus the standard
object space) gets translated to low level code. This "interpreter level
code" deals with the standard object space as a regular class instance
that is in principle in no way different from any other class. The
classes that appear on interpreter level are all translated to more
basic types (what would be the alternative? There is no layer below that
could deal with anything else), probably to something like a struct.
Thus the operations of objects at this level never need to be done via 
the object space -- the object space is rather a regular object in itself.

The object space /is/ used, if you interpret a Python program with 
PyPy's interpreter. The bytecode interpreter does not know how to deal
with /any/ object (except for True and False), it has to delegate to the 
object space for every single operation. Even for basic types like ints 
and such -- at this level ("application level") there isn't any type 
inference or anything like that!
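
A toy sketch of this delegation, in case it helps. The class, the opcode
names and the tuple representation of wrapped objects are all made up
for illustration and are not PyPy's actual code; the only point is that
the interpreter loop never touches a value itself.

```python
class ToyObjectSpace:
    """Hypothetical object space: it alone knows what objects are."""

    def wrap(self, value):
        # Turn an interpreter-level value into an opaque wrapped object.
        return ("W", value)

    def unwrap(self, w_obj):
        return w_obj[1]

    def add(self, w_a, w_b):
        return self.wrap(self.unwrap(w_a) + self.unwrap(w_b))

    def mul(self, w_a, w_b):
        return self.wrap(self.unwrap(w_a) * self.unwrap(w_b))


def interpret(code, space):
    """Run a tiny stack 'bytecode'; every operation goes through the space."""
    stack = []
    for opname, arg in code:
        if opname == "LOAD_CONST":
            stack.append(space.wrap(arg))
        elif opname == "BINARY_ADD":
            w_b = stack.pop()
            w_a = stack.pop()
            stack.append(space.add(w_a, w_b))
        elif opname == "BINARY_MULTIPLY":
            w_b = stack.pop()
            w_a = stack.pop()
            stack.append(space.mul(w_a, w_b))
        else:
            raise ValueError("unknown opcode: %r" % (opname,))
    return stack.pop()


# (2 + 3) * 4, expressed as stack operations.
program = [
    ("LOAD_CONST", 2),
    ("LOAD_CONST", 3),
    ("BINARY_ADD", None),
    ("LOAD_CONST", 4),
    ("BINARY_MULTIPLY", None),
]
space = ToyObjectSpace()
w_result = interpret(program, space)
```

Swapping in a different space object would change what "add" means
without touching the interpreter loop at all.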

Now on threading: I'm not really the right person to say much about it. 
As far as I understand it, the general idea is that we don't want to 
clutter our whole interpreter implementation with threading details. 
Instead threading is supposed to become a translation aspect: The 
translation process is meant to "weave" the threading model into the 
translated interpreter. This would have a lot of advantages: Instead of 
being stuck with a single threading model which is deeply integrated 
into all the parts of our interpreter and hard to get rid of again, we 
can change it by changing a small localized part of whatever (probably 
the translator). Thus it would be possible to translate an interpreter 
which uses a GIL -- appropriate for an environment where threads are 
rarely used. Or we could translate an interpreter with, say, more finely 
grained locking which would be slower for a single threaded program but 
could speed up applications with multiple threads.
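
To make the "weaving" idea a bit more concrete, here is a minimal
sketch in plain Python. It only assumes the general idea described
above; ToyInterpreter, weave_global_lock and weave_nothing are
hypothetical names, and the real thing would happen inside the
translator rather than through wrapper functions at run time.

```python
import threading


class ToyInterpreter:
    """Hypothetical interpreter core, written with no locking at all."""

    def __init__(self):
        self.counter = 0

    def step(self):
        self.counter += 1
        return self.counter


def weave_global_lock(cls):
    """Return a subclass of cls whose public methods all hold one shared lock."""
    lock = threading.RLock()

    def guard(method):
        def guarded(self, *args, **kwargs):
            with lock:
                return method(self, *args, **kwargs)
        return guarded

    namespace = {
        name: guard(attr)
        for name, attr in vars(cls).items()
        if callable(attr) and not name.startswith("_")
    }
    return type("GIL" + cls.__name__, (cls,), namespace)


def weave_nothing(cls):
    """Alternative 'threading model': no locking, for single-threaded use."""
    return cls


# Pick the threading model in one place; the interpreter class is untouched.
GILInterpreter = weave_global_lock(ToyInterpreter)
interp = GILInterpreter()


def worker():
    for _ in range(1000):
        interp.step()


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The interpreter class itself contains no threading code; changing the
model means swapping weave_global_lock for another weaving function.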

[snip]

Hope that helped a bit and wasn't too confused :-).

Regards,

Carl Friedrich