[pypy-dev] Questions

Richard Emslie rxe at ukshells.co.uk
Fri Dec 12 00:11:30 CET 2003


Hi Holger,

On Tue, 9 Dec 2003, holger krekel wrote:
> Hi Richard,
>
> [Richard Emslie Tue, Dec 09, 2003 at 04:04:35PM +0000]
> > I've been reading through the source code and the docs, and getting
> > some gist of what is going on.  I guess I was expecting to see
> > something more like the CPython code but in Python (like, why do we
> > have different object spaces? - although I see the error of my ways
> > now :-) ) and was failing to understand the big picture.
>
> understandable.  Reverse engineering documentation from plain code
> is not always easy :-)

:-)

Thanks, Holger, for the great responses... they have certainly cleared
up a few things.  One thing that is really interesting in understanding
PyPy thus far is that the puzzle has two sides: how does it work, and
why is it done this way.  For instance, we can count 10 different types
of frame object in the interpreter and stdobjspace.

What would be a really nice part of the architecture introduction
(although I imagine there are other, better ideas) is to step through a
few simple code examples running in an initialised stdobjspace
"interactive.py" session, describing the various object
creations/interactions on the way (ExecutionContext, Code, Frame,
objects) and how method dispatching to the object spaces flows from
Code/Frames.
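To make the idea concrete, here is a toy sketch of how opcode
implementations might delegate every operation to an object space.
The class and method names are modelled on PyPy's, but the bodies are
purely illustrative, not the real implementation:

```python
# Illustrative sketch only: frames delegate all semantics to a space.

class ObjSpace:
    """An object space supplies the semantics of every operation."""
    def add(self, w_a, w_b):
        raise NotImplementedError

class StdObjSpace(ObjSpace):
    def wrap(self, value):
        return ('wrapped', value)   # stand-in for real wrapped objects
    def unwrap(self, w_obj):
        return w_obj[1]
    def add(self, w_a, w_b):
        # the real StdObjSpace goes through multimethod dispatch here
        return self.wrap(self.unwrap(w_a) + self.unwrap(w_b))

class Frame:
    """Interprets bytecode, delegating each operation to its space."""
    def __init__(self, space):
        self.space = space
        self.stack = []
    def BINARY_ADD(self):
        w_b = self.stack.pop()
        w_a = self.stack.pop()
        self.stack.append(self.space.add(w_a, w_b))

space = StdObjSpace()
frame = Frame(space)
frame.stack.append(space.wrap(2))
frame.stack.append(space.wrap(3))
frame.BINARY_ADD()
print(space.unwrap(frame.stack[-1]))   # -> 5
```

The point being that swapping in a different ObjSpace subclass changes
the meaning of BINARY_ADD without touching the frame code at all.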

And then some idea of how the current bootstrapping is working for
stdobjspace (see * below).

It might serve as a nice basis for documentation too (yup, I'm
volunteering :-))


>
> > So reading between the lines, does this sound anything like what
> > we are trying to achieve...
> >
> > The abstraction of the object spaces is so we can perform abstract
> > interpretation with one set, a working interpreter with another, a
> > minimal interpreter with another, and goodness knows what else ;-)
>
> right.
>
> > So to create our initial interpreter, we take the interpreter code,
> > multi-method dispatcher and the standard object space, and we can
> > abstractly interpret with the interpreter/flow object
> > space/annotation.
>
> yes, more precisely the interpreter/flowobjspace combination should be
> able to perform abstract interpretation on any RPython program. RPython
> is our acronym for "not quite as dynamic as python". But note that
> we basically allow *full dynamism* including metaclasses and all the
> fancy stuff during *initialization* of the interpreter and its object
> spaces. Only when we actually interpret code objects from an
> app-level program do we restrict the involved code to be RPythonic.
>
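The split holger describes, full dynamism during initialization versus
RPython restrictions afterwards, could be sketched like this (purely
invented illustration, not actual PyPy code):

```python
# Illustrative only: dynamic *initialization* vs RPython-style runtime.

def initialize():
    # During initialization, full dynamism is allowed: generating
    # functions from strings, metaclasses, whatever. This runs once.
    symbols = {'add': '+', 'sub': '-', 'mul': '*'}
    ops = {}
    for name, sym in symbols.items():
        ops[name] = eval('lambda a, b: a %s b' % sym)
    return ops

OPS = initialize()

def interpret_step(opname, a, b):
    # This is the kind of code that must stay "RPythonic": static
    # control flow and consistent types, no runtime class hackery.
    return OPS[opname](a, b)

print(interpret_step('add', 2, 3))   # -> 5
```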

That explains a lot.  Ironically, I was starting to think RPython is
really very dynamic, but after the dust settles I guess that's it.  I
am assuming, therefore, that on the call to initialize() [do Europeans
generally follow American spelling? ;-)] we are free to do all sorts of
dynamic manipulation of our classes and objects.  However, during the
course of building the sys module & builtins (*) we seem to start
interpreting some bytecodes!!  How is that possible if we don't have
any object spaces ready to act on?



> The interpreter/flowobjspace combination will start abstract
> interpretation on some initial function object, say e.g. frame.run().
> The frame and the bytecode/opcode implementations it invokes will work
> with e.g. the StdObjSpace. The flowobjspace doesn't care which
> objspace the frame/opcodes execute on. The flowobjspace and its
> interpreter instance don't care if they run on something other than
> pypy :-)
>
> Actually, thinking in more detail about this will probably lead us
> into the still muddy waters of the whole bootstrapping process, but
> let's not get distracted here :-)
>
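The flowobjspace idea above might be pictured roughly like this: an
object space that, instead of computing results, records every
operation it is asked to perform.  This is my own toy illustration;
the names are hypothetical and the real flowobjspace is far more
involved:

```python
# Toy sketch of a "flow" object space: operations are recorded into a
# (linearised) graph rather than executed.

class Variable:
    counter = 0
    def __init__(self):
        Variable.counter += 1
        self.name = 'v%d' % Variable.counter
    def __repr__(self):
        return self.name

class FlowObjSpace:
    def __init__(self):
        self.operations = []   # stand-in for the real flowgraph
    def do_operation(self, opname, *args):
        result = Variable()
        self.operations.append((opname, args, result))
        return result
    def add(self, w_a, w_b):
        return self.do_operation('add', w_a, w_b)
    def mul(self, w_a, w_b):
        return self.do_operation('mul', w_a, w_b)

# "Abstractly interpreting" a tiny function: run it on Variables.
def f(space, w_x, w_y):
    return space.mul(space.add(w_x, w_y), w_y)

space = FlowObjSpace()
w_res = f(space, Variable(), Variable())
for op in space.operations:
    print(op)
```

The recorded operation list is what ordinary execution would have done,
reified as data that annotation/translation can then inspect.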

Do you mean what was described above, with the bytecode being
interpreted before initialisation is complete - or are we talking about
memory management, internal representation of basic object types in the
object space (lists, ints, floats), system calls
(blocking/nonblocking), system resources (file descriptors), garbage
collection and whatnot?  Ok, let's not get distracted... :-)


> > That stage involves building up a set of basic blocks, building a
> > flow graph, type inference and then translating (sorry, I get a bit
> > lost here with what happens where, i.e. when does the flow object
> > space stop and annotation start, but the answer to that one is to
> > read more code ;-) ) to pyrex/CL/other low level code.
>
> exactly.
>
> > Does that sound about right so far?   Then do either of these make
> > sense (purely speculation... and most likely nonsense)?
> >
> > Also if we write the flow object space and annotation in RPython we
> > can pipe that through itself, to generate low level code too.  Now
> > my main question is - how do we combine the two object spaces such
> > that we do abstract interpretation and annotation in a running
> > interpreter (also I guess we would either need some very low level
> > translation, i.e. machine code, or some LLVM-like architecture to
> > do this?)
>
> (first: see my above reference of muddy waters :-)
>
> In theory, we can annotate/translate flowobjspace itself, thus
> producing a low-level (pyrex/lisp/c/llvm) representation of our
> abstract interpretation code. When executing this lower-level
> representation on ourself again we should produce the same
> representation we are currently running.

Yes, I see now.  For some reason I thought they would be different.

> I think this is similar to the 3-stage gcc building process: first it
> uses some external compiler to build itself (stage1). It uses stage1
> to compile itself again to stage2. It then uses stage2 to recompile
> itself again to stage3 and sees if it still works.  Thus the whole
> program serves as a good testbed for checking that everything works
> right.

Funny, I used to compile twice, doing stages 1 and 2 manually, back
when Red Hat were producing buggy versions - if only I had known! ;-)

>
> > Once we have broken the interpreter - standard object space down
> > into a finite set of blocks and a graph, and translated those
> > blocks into low level code - we could view any python bytecode
> > operating on this as a traversal over the blocks.
>
> Hmm, yes, i think that's right, although i would rephrase a bit: the
> flowgraph obtained from abstract interpretation is just another
> representation of a/our python program.  Code objects (which contain
> the bytecodes) are themselves a representation of python source text.

It does have other cool implications: if we have a low enough
translation language we could do away with stacks and frames for
execution... :-)

>
> The flowgraph of course provides a lot of interesting information
> (like all possible code paths and low-level identification of
> variable state) and makes it explicitly available for annotation and
> translation.  Btw, at the moment annotation just *uses* the
> flowgraph, but not the other way round.  (In the future we might want
> to drive them more in parallel in order to allow the flowobjspace
> code to consult the annotation module. Then the flowgraph code could
> possibly avoid producing representations where annotation/type
> inference is no longer able to produce exact types.)
>

Can I ask the silly question of what annotation actually means?  Is it
separate from type inference?  I don't really follow the parallel part.

With RPython are we assuming that we can always produce exact types?

Is the idea for non-deterministic points (i.e. nodes where we cannot
infer the types) to be revealed and then propagated up the graph to the
highest node where they can first be determined, creating a new
snapshot of nodes whenever a new type enters that point and translating
it, with caching added so we don't have to recreate the
snapshot/translation each time (there is a high chance it is going to
be the same type)?
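For what it is worth, here is my own toy picture of what "annotation"
could mean as type inference over a recorded flowgraph.  The function
and the data shapes are invented purely for illustration; this is not
PyPy's annotator:

```python
# Toy type inference: propagate types forward over recorded operations.

def annotate(operations, input_types):
    # operations: list of (opname, argument_names, result_name)
    types = dict(input_types)
    for opname, args, result in operations:
        argtypes = [types.get(a) for a in args]
        if opname in ('add', 'mul') and all(t == 'int' for t in argtypes):
            types[result] = 'int'
        else:
            types[result] = 'unknown'   # inference failed at this node
    return types

ops = [('add', ('x', 'y'), 't1'),
       ('mul', ('t1', 'y'), 't2')]
print(annotate(ops, {'x': 'int', 'y': 'int'}))
# -> {'x': 'int', 'y': 'int', 't1': 'int', 't2': 'int'}
```

If any input were not an int, the 'unknown' would propagate forward to
every result depending on it, which is roughly the situation the
snapshot/caching question above is about.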



> > Therefore we could create a new flow graph from this traversal,
> > and feed it into some LLVM-like architecture which does the low
> > level translation and optimisation phase for us??
>
> There is no need to take this double indirection. We can produce LLVM
> bytecode directly from python code with a specific translator
> (similar to genpyrex/genclisp). We could translate ourself to make
> this faster, of course.  For merging Psyco techniques we will
> probably want to rely on something like LLVM to do this dynamically.
> Generating C code is usually a pretty static thing and cannot easily
> be done at runtime.
>

:-) Yes, the double indirection is not the best way.


> > Thanks for any feedback... :-)
>
> you are welcome. Feel free to followup ...
>

Yes thanks again! Looking forward to next week... :-)

Cheers,
Richard



