[Tutor] OT self-implementation?

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Mon Jul 4 00:59:32 CEST 2005


> > Now's not the time in my life to start a comp. sci. degree. So, my
> > questions are:
> >
> > 1) What would be good terms to google for to find an explanation of
> > how the seeming magic doesn't violate all reason?


Hi Michael,

The idea is that we can use _ourselves_ as the initial compiler for a
simplified version of the language that we're trying to compile.  We know,
for example, that something like:

######
print "hello world"
######


should produce something machine-like, like:

    1.  Put the bytes "hello world" somewhere where the system can see
        things.

    2.  Invoke whatever the underlying system provides us to print those
        bytes.

and if we knew what the hardware looked like, we could write out the exact
low-level instructions that produces the words "hello world" on screen.


Just as a sample of what these lower-level instructions might look like,
here's a sample using Python's bytecode disassembler:

######
>>> import dis
>>> def say_hello():
...     print "hello world"
...
>>> dis.dis(say_hello)
  2           0 LOAD_CONST               1 ('hello world')
              3 PRINT_ITEM
              4 PRINT_NEWLINE
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE
######

It's a lot more lines for doing something simple, but that's exactly why
we write programs in higher-level languages now.


Anyway, if we know what the underlying low-level language looks like we
can take something written in a higher level language, and manually
transcribe those high-level constructs into the lower-level language.

So the base case is ourselves and the hardware.  *grin*

This is usually very tedious work, though, so many metacircular
implementations will use a limited subset of the language, one that is
really easy to translate down to primitive actions.


That's what the PyPy folks are doing: they're writing Python in Python,
with the caveat that the kind of Python that they're writing is a
"restricted" one that uses less features than full-on Python:

http://codespeak.net/svn/pypy/dist/pypy/documentation/coding-guide.txt

They're doing this to make the translation to the lower machine language
easier: I think their plan is to take their PyPy interpreter, and
mechanically translate it down into C.  This implies that they're planning
to use tools to do the translation, but in a pinch, they could always use
themselves and do the translation by hand.  Such a process would be very
bug prone and painful, but it's possible. *grin*



> > 2) Much harder, so please pass unless you've a link you know of
> > off-hand: any recommendations of particular things that I could read?

The classic textbook "Structure and Interpretation of Computer Programs"
might interest you:

    http://mitpress.mit.edu/sicp/

Chapter 4 goes into writing a Scheme interpreter in Scheme, and the
chapter right after that talks about how to take that interpreter and
mechanically rewrite it into low-level instructions.

Best of wishes!



More information about the Tutor mailing list