[Tutor] When you write the script how is it interpreted?
[parsing/compiling]
Danny Yoo
dyoo@hkn.eecs.berkeley.edu
Wed, 30 Jan 2002 17:01:01 -0800 (PST)
On Wed, 30 Jan 2002, Remco Gerlich wrote:
> On 0, "McCarney, James Alexander" <James.Alexander.McCarney@Cognicase.com> wrote:
> > This is something I have wondered. When I write a script and Py interprets
> > it, how does Py compile it?
> > Could anyone direct me to some FAQs or really high-level dirt on the matter.
> > TiA.
>
> How do you mean?
>
> It parses the code,
Parsing takes chunks of the program text, and figures out the
relationships between all the pieces.
As an analogy, we can form a "tree" out of a sentence, like "I am
Tarzan!":
sentence
+------+-------+
/ | \
/ | \
subject verb object
| | |
I am Tarzan!
A large part of understanding a language is figuring out the structure and
relationships between the words. Likewise, a Python parser needs to do
the same sort of tree-building with the Python grammar as one of the first
steps in generating bytecode.
If we're interested in this sort of stuff, we can take a look at:
http://www.python.org/doc/current/ref/ref.html
where the docs explain Python grammatical structure in more detail.
There's also a module called 'parser' that will do the tree building for
us:
http://www.python.org/doc/lib/module-parser.html
Parsing is an involving topic, but it's been so well developed that we can
use parsing tools without really having to worry about how these tools
actually work. *grin* It's quite cool that parsing technology is very
much related to the work of linguists like Noam Chomsky.
Anyway, once Python parses source code into a tree, it now has it in a
form that's more digestible. Now it can reduce it into even smaller
chunks called "bytecodes". We can see an example of what bytecode looks
like by using the 'dis' module:
###
>>> import dis
>>> def gcd(a, b):
... if b == 0: return a
... return gcd(b, a % b)
...
>>> dis.dis(gcd)
0 SET_LINENO 1
3 SET_LINENO 2
6 LOAD_FAST 1 (b)
9 LOAD_CONST 1 (0)
12 COMPARE_OP 2 (==)
15 JUMP_IF_FALSE 11 (to 29)
18 POP_TOP
[... rest of the decompiled code cut for brevity.]
###
The compiler probably does a case-by-case analysis of a program when it
translates a parse tree into these elementary bytecodes --- if it's
processing an 'if', then it needs to set up a way of jumping around code
if the test returns false. Take a look at:
http://www.python.org/doc/lib/module-dis.html
for a list of the primitive bytecodes that Python uses. It's a surprising
fact that the high level concepts that we work with in Python can be
reduced to such simple operations as LOADing and JUMPing.
> then compiles it to Python bytecode. The bytecodes are run by the
> Python interpreter (they're stored in the .pyc file so the same .py
> file only has to be compiled once if it doesn't change). I'm not sure
> what part of this you need high level information on. Basically this
> is the highest level view, as a next step you could look at the C
> source :)
Python 2.2 now comes with a 'compiler' module that implements the Python
bytecode compiler in Python:
http://www.python.org/doc/lib/compiler.html
(I'm actually trying to learn how the compiler works myself; I'd love to
do a project that compiles Scheme into Python bytecodes just for the
perversity of it all. *grin*)