[pypy-dev] Questions

Richard Emslie rxe at ukshells.co.uk
Fri Dec 12 00:20:22 CET 2003


[Sorry, same email - formatting fixed]

Hi Holger,

On Tue, 9 Dec 2003, holger krekel wrote:

> Hi Richard,
>
> [Richard Emslie Tue, Dec 09, 2003 at 04:04:35PM +0000]
> > I've been reading through the source code and the docs, and getting the
> > gist of what is going on.  I guess I was expecting to see something
> > more like the CPython code but in Python (e.g. why do we have different
> > object spaces?  Although I see the error of my ways now :-) ) and was
> > failing to understand the big picture.
>
> understandable.  Reverse engineering documentation from plain code
> is not always easy :-)

:-)

Thanks, Holger, for the great responses... they have certainly cleared up a
few things. One really interesting thing about understanding PyPy so far
is that the puzzle has two sides: how does it work, and why is it done
this way.  For instance, we can count 10 different types of frame
object in the interpreter and stdobjspace.

What would make a really nice part of the architecture introduction
(although I imagine there are other, better ideas) is to step through a
few simple code examples running in an initialised stdobjspace
"interactive.py" session, describing the various object
creations/interactions on the way (ExecutionContext, Code, Frame, objects)
and how method dispatch to the object spaces flows from Code/Frames.
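
To make the dispatch idea concrete for myself, here's a toy sketch (all
names invented; far simpler and cruder than the real interpreter/StdObjSpace
code): an opcode implementation never computes anything itself, it just asks
the object space.

```python
# Hypothetical sketch of how a frame dispatches operations to an
# object space; names are invented and much simpler than PyPy's real ones.

class ToyObjSpace:
    """A stand-in for StdObjSpace: it decides what 'add' means."""
    def wrap(self, value):
        return ("W", value)           # a "wrapped" application-level object
    def unwrap(self, w_obj):
        return w_obj[1]
    def add(self, w_a, w_b):
        return self.wrap(self.unwrap(w_a) + self.unwrap(w_b))

class ToyFrame:
    """A stand-in for an interpreter frame: opcodes call space methods."""
    def __init__(self, space):
        self.space = space
        self.stack = []
    def push(self, w_obj):
        self.stack.append(w_obj)
    def BINARY_ADD(self):
        w_b = self.stack.pop()
        w_a = self.stack.pop()
        # the opcode never computes anything itself; it asks the space
        self.push(self.space.add(w_a, w_b))

space = ToyFrame.__init__ and ToyObjSpace()   # build a space...
frame = ToyFrame(space)                       # ...and a frame running on it
frame.push(space.wrap(2))
frame.push(space.wrap(3))
frame.BINARY_ADD()
print(space.unwrap(frame.stack[-1]))          # 5
```

Swapping in a different space object (a flow-tracing one, say) would change
the meaning of BINARY_ADD without touching the frame at all.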

And then some idea of how the current bootstrapping is working for
stdobjspace (see * below).

It might serve as a nice basis for documenting too (yup I'm volunteering
:-))


>
> > So reading between the lines, does this sound anything like what we
> > are trying to achieve...
> >
> > The object spaces are abstracted so we can perform abstract
> > interpretation with one set, build a working interpreter with another, a
> > minimal interpreter with another, and goodness knows what else ;-)
>
> right.
>
> > So to create our initial interpreter, we take the interpreter code, multi-method
> > dispatcher and the standard object space and we can abstractly interpret
> > with the interpreter/flow object space/annotation.
>
> yes, more precisely the interpreter/flowobjspace combination should be
> able to perform abstract interpretation on any RPython program. RPython
> is our acronym for "not quite as dynamic as Python". But note that
> we basically allow *full dynamism* including metaclasses and all the
> fancy stuff during *initialization* of the interpreter and its object
> spaces. Only when we actually interpret code objects from an
> app-level program do we restrict the involved code to be RPythonic.
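
A toy way I picture this split (invented names, only for illustration): at
initialization time we may use full Python dynamism, e.g. generating a class
on the fly, while the code that then runs per-operation stays plain and
static -- the kind of code an "RPython" restriction could plausibly allow.

```python
import operator

# Illustrative only: dynamic class construction is fine *during
# initialization*; the generated methods themselves are simple and static.

def make_space_class(operations):
    def make_op(op):
        def method(self, a, b):
            return op(a, b)
        return method
    # full dynamism: building a namespace and a type at runtime
    ns = {name: make_op(fn) for name, fn in operations.items()}
    return type("GeneratedSpace", (), ns)

Space = make_space_class({"add": operator.add, "mul": operator.mul})

# After initialization, the calls are boring and monomorphic --
# no metaclasses, no mutation, just direct method calls.
space = Space()
print(space.add(2, 3))  # 5
print(space.mul(2, 3))  # 6
```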

That explains a lot; ironically, I was starting to think RPython is really
very dynamic, but after the dust settles I guess that's it.  I am assuming,
therefore, that on the call to initialize() [do Europeans generally follow
American spelling? ;-)] we are free to do all sorts of dynamic
manipulation of our classes and objects.  However, during the course of
building the sys module & builtins (*) we seem to start interpreting some
bytecodes!!  How is that possible if we don't have any object spaces ready
to act on?

>
> The interpreter/flowobjspace combination will start abstract
> interpretation on some initial function object, say e.g.  frame.run().
> The frame and the bytecode/opcode implementations it invokes will work
> with e.g. the StdObjSpace. The flowobjspace doesn't care on which
> objspace the frame/opcodes execute. The flowobjspace and its interpreter
> instance don't care if they run on something else than pypy :-)
>
> Actually, thinking in more detail about this will probably lead us into
> the still muddy waters of the whole bootstrapping process, but let's not
> get distracted here :-)

Do you mean what was described above, with bytecode being interpreted
before initialisation is complete - or are we talking about memory
management, internal representation of basic object types in the object
space (lists, ints, floats), system calls (blocking/nonblocking), system
resources (file descriptors), garbage collection and whatnot?  OK, let's
not get distracted... :-)

>
> > That stage involves
> > building up a set of basic blocks, building a flow graph, type inference
> > and then translating (sorry I get a bit lost here with what happens where,
> > ie when does the flow object space stop and annotation start, but the
> > answer to that one is to read more code ;-) ) to pyrex/CL/other low level
> > code.
>
> exactly.
>
> > Does that sound about right so far?   Then do either of these make sense
> > (purely speculation... and most likely nonsense)
> >
> > Also, if we write the flow object space and annotation in RPython, we can
> > pipe that through itself to generate low level code too.  Now my main
> > question is - how do we combine the two object spaces such that we do
> > abstract interpretation and annotation in a running interpreter (also I
> > guess we would either need some very low level translation, i.e. machine
> > code, or some LLVM-like architecture to do this?)
>
> (first: see my above reference of muddy waters :-)
>
> In theory, we can annotate/translate flowobjspace itself, thus producing
> a low-level (pyrex/lisp/c/llvm) representation of our abstract
> interpretation code. When executing this lower-level representation
> on itself again, we should produce the same representation we are
> currently running.

Yes, I see now.  For some reason I thought they would be different.

> I think this is similar to the 3-stage gcc build
> process: First it uses some external compiler to build itself
> (stage1). It uses stage1 to compile itself again to stage2. It then uses
> stage2 to recompile itself again to stage3 and sees if it still works.
> Thus the whole program serves as a good testbed for whether everything
> works right.
>

Funny, I used to compile twice, doing stages 1 and 2 manually, back when
Red Hat was producing buggy versions; if only I had known! ;-)


> > Once we have broken the interpreter/standard object space down into a
> > set of basic blocks and a graph, and translated those blocks into low
> > level code - we could view any python bytecode operating on this as a
> > traversal over the blocks.
>
> Hmm, yes I think that's right, although I would rephrase a bit: the flowgraph
> obtained from abstract interpretation is just another representation of a/our
> python program.  Code objects (which contain the bytecodes) are
> themselves a representation of python source text.

It does have other cool implications: with a low enough translation
language, we could do away with stacks and frames for execution... :-)
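
The "just another representation" point clicked for me with a toy model
(hypothetical names, nothing like the real data structures): a flowgraph is
blocks of explicit operations, and interpreting it gives the same answers
as running the original source.

```python
# Toy flow-graph model: a block is a list of named operations.
# Op/Block are invented names, only for illustration.

class Op:
    def __init__(self, opname, args, result):
        self.opname, self.args, self.result = opname, args, result

class Block:
    def __init__(self, ops, exits=()):
        self.ops = list(ops)
        self.exits = list(exits)   # successor blocks (unused in this toy)

# Roughly represents:  def f(x): return x * x + 1
entry = Block([
    Op("mul", ["x", "x"], "v0"),
    Op("add", ["v0", 1], "v1"),
])

def interpret(block, env):
    """Walk one block, executing each op over a plain name->value env."""
    for op in block.ops:
        vals = [env.get(a, a) for a in op.args]   # names or literals
        if op.opname == "mul":
            env[op.result] = vals[0] * vals[1]
        elif op.opname == "add":
            env[op.result] = vals[0] + vals[1]
    return env

print(interpret(entry, {"x": 4})["v1"])  # 17, same as f(4)
```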


>
> The flowgraph of course provides a lot of interesting information (like
> all possible code paths and low-level identification of variable state)
> and makes it explicitly available for annotation and translation.
> Btw, at the moment annotation just *uses* the flowgraph, not the
> other way round.  (In the future we might want to drive them more in
> parallel in order to allow the flowobjspace code to consult the
> annotation module. Then the flowgraph code could possibly avoid
> producing representations where annotation/type inference is no longer
> able to produce exact types.)


Can I ask the silly question of what "annotation" actually means?  Is it
separate from type inference?  I don't really follow the parallel part.

With RPython are we assuming that we can always produce exact types?

Is the idea for non-deterministic points (i.e. nodes where we cannot infer
the types) to be revealed and then propagated up the graph to the highest
node where they can first be determined, creating a new snapshot of nodes
whenever a new type enters that point and translating it, with caching added
so we don't have to recreate the snapshot/translation each time (there is a
high chance it is going to be the same type)?
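
My current mental picture of "annotation" (a hedged guess, not how the real
annotator works): walk the flowgraph operations forward and attach a type
to every variable, giving up to a catch-all type where inference fails.

```python
# Minimal flavour of annotation as forward type propagation over
# straight-line ops; the real annotator would handle joins, loops, etc.

def annotate(ops, input_types):
    """ops: list of (opname, args, result) tuples; returns {var: type}."""
    types = dict(input_types)
    for opname, args, result in ops:
        # an arg is either a variable name (look it up) or a literal
        argtypes = [types.get(a, type(a)) for a in args]
        if opname in ("add", "mul"):
            if all(t is int for t in argtypes):
                types[result] = int       # exact type inferred
            else:
                types[result] = object    # inference gives up
    return types

ops = [("mul", ["x", "x"], "v0"),
       ("add", ["v0", 1], "v1")]
print(annotate(ops, {"x": int})["v1"])    # <class 'int'>
print(annotate(ops, {"x": float})["v1"])  # <class 'object'>
```

The float case shows a "non-deterministic point" in the above sense: once
one operand's type is inexact, the imprecision propagates down the graph.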


>
> > Therefore we could create a new flow graph from this
> > traversal, and feed it into some LLVM like architecture which does the low
> > level translation and optimisation phase for us??
>
> There is no need for this double indirection. We can produce LLVM
> bytecode directly from python code with a specific translator (similar to
> genpyrex/genclisp). We could translate ourselves to make this faster, of
> course.  For merging Psyco techniques we will probably want to rely on
> something like LLVM to do this dynamically. Generating C code is usually
> a pretty static thing and cannot easily be done at runtime.

:-) Yes, the double indirection is not the best way.
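
Just to convince myself what "a specific translator" could look like, here
is a tiny sketch in the spirit of (but far simpler than) a genpyrex-style
backend: it walks flowgraph-style ops and emits C-like text directly.

```python
# Hedged sketch of direct code generation from flow-graph-style ops.
# Everything here is invented for illustration; real backends are richer.

def gen_c(funcname, argnames, ops, resultvar):
    """Emit a C-like function from (opname, (arg, arg), result) tuples."""
    binops = {"add": "+", "mul": "*"}
    lines = ["long %s(%s) {" % (funcname,
             ", ".join("long %s" % a for a in argnames))]
    for opname, args, result in ops:
        lines.append("    long %s = %s %s %s;" %
                     (result, args[0], binops[opname], args[1]))
    lines.append("    return %s;" % resultvar)
    lines.append("}")
    return "\n".join(lines)

# Same toy function as before:  def f(x): return x * x + 1
ops = [("mul", ("x", "x"), "v0"),
       ("add", ("v0", "1"), "v1")]
print(gen_c("square_plus_one", ["x"], ops, "v1"))
```

Because each flowgraph variable gets exactly one assignment, the emitted
code is purely static, which is presumably why doing the same thing at
runtime (Psyco-style) wants something more dynamic like LLVM.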

>
> > Thanks for any feedback... :-)
>
> you are welcome. Feel free to followup ...


Yes thanks again! Looking forward to next week... :-)

Cheers,
Richard


