[Tutor] When you write the script how is it interpreted? [parsing/compiling]

Wed, 30 Jan 2002 17:01:01 -0800 (PST)

On Wed, 30 Jan 2002, Remco Gerlich wrote:

> On  0, "McCarney, James Alexander" <James.Alexander.McCarney@Cognicase.com> wrote:
> > This is something I have wondered. When I write a script and Py interprets
> > it, how does Py compile it?
> > Could anyone direct me to some FAQs or really high-level dirt on the matter.
> > TiA.
> 
> How do you mean?
> 
> It parses the code,

Parsing takes chunks of the program text, and figures out the
relationships between all the pieces.

As an analogy, we can form a "tree" out of a sentence, like "I am
Tarzan!":

               sentence
           +------+-------+
          /       |        \
         /        |         \
     subject    verb      object
        |         |          |
        I        am        Tarzan!

A large part of understanding a language is figuring out the structure and
relationships between the words.  Likewise, a Python parser needs to do
the same sort of tree-building with the Python grammar as one of the first
steps in generating bytecode.

If we're interested in this sort of stuff, we can take a look at:

    http://www.python.org/doc/current/ref/ref.html

where the docs explain Python grammatical structure in more detail.  
There's also a module called 'parser' that will do the tree building for
us:

    http://www.python.org/doc/lib/module-parser.html

Parsing is an involving topic, but it's been so well developed that we can
use parsing tools without really having to worry about how these tools
actually work.  *grin* It's quite cool that parsing technology is very
much related to the work of linguists like Noam Chomsky.

Anyway, once Python parses source code into a tree, it now has it in a
form that's more digestible.  Now it can reduce it into even smaller
chunks called "bytecodes".  We can see an example of what bytecode looks
like by using the 'dis' module:

###
>>> import dis
>>> def gcd(a, b):
...    if b == 0: return a
...    return gcd(b, a % b)
... 
>>> dis.dis(gcd)
          0 SET_LINENO               1

          3 SET_LINENO               2
          6 LOAD_FAST                1 (b)
          9 LOAD_CONST               1 (0)
         12 COMPARE_OP               2 (==)
         15 JUMP_IF_FALSE           11 (to 29)
         18 POP_TOP             
[... rest of the decompiled code cut for brevity.]
###

The compiler probably does a case-by-case analysis of a program when it
translates a parse tree into these elementary bytecodes --- if it's
processing an 'if', then it needs to set up a way of jumping around code
if the test returns false.  Take a look at:

    http://www.python.org/doc/lib/module-dis.html

for a list of the primitive bytecodes that Python uses.  It's a surprising
fact that the high level concepts that we work with in Python can be
reduced to such simple operations as LOADing and JUMPing.

> then compiles it to Python bytecode. The bytecodes are run by the
> Python interpreter (they're stored in the .pyc file so the same .py
> file only has to be compiled once if it doesn't change). I'm not sure
> what part of this you need high level information on. Basically this
> is the highest level view, as a next step you could look at the C
> source :)

Python 2.2 now comes with a 'compiler' module that implements the Python
bytecode compiler in Python:

    http://www.python.org/doc/lib/compiler.html

(I'm actually trying to learn how the compiler works myself; I'd love to
do a project that compiles Scheme into Python bytecodes just for the
perversity of it all.  *grin*)