Guide to the python interp. source?

Fri Jul 26 10:36:58 EDT 2002

Tim Gahnström /Bladerman wrote:

> Hi
> <<Essence of question>>
> Is there a guide or a recommended way to "get to know" the source code to
> the python interpreter?

Assuming you're familiar with Python's C API (the online docs are good,
and you can complement them with, e.g., my tutorial on the Europython site,
my articles in the "Py" zine, examples in the Cookbook, Beazley's
appendix in his "Essential Reference" book, etc., etc.), it's not that
big a deal.  Counting blank and comment lines in each case: 90k
lines are in Modules/*.c -- 90 or so modules with a median of about
600 lines, mostly nice and easy; 39k lines in Objects/*.c -- 30 or so
objects with a median of about 740 lines, nothing terrible; 3600 in
Parser/*.c -- admittedly nastier, but the modules ARE small, with a
median of 153 lines; 26k lines in Python/*.c (net of the dynload* stuff)
-- 42 modules with a median of 150 lines.
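If you want to redo this kind of census on whatever tree you actually check out, a few lines of Python will do. This is a sketch, assuming a modern Python 3 and an unpacked source tree in a directory whose name here (`Python-2.2.1`) is purely hypothetical; `line_stats` is an illustrative helper, not part of any library, and unlike the figures above it does not subtract the dynload* files from Python/*.c.

```python
import glob
import os
import statistics


def line_stats(pattern):
    """Return (file count, total lines, median lines per file) for all files
    matching a glob pattern, counting blank and comment lines too."""
    counts = []
    for path in sorted(glob.glob(pattern)):
        # Read as bytes so odd encodings in old sources can't break the count.
        with open(path, "rb") as f:
            counts.append(sum(1 for _ in f))
    if not counts:
        return 0, 0, 0
    return len(counts), sum(counts), statistics.median(counts)


if __name__ == "__main__":
    # Hypothetical location of an unpacked source tree; adjust to taste.
    root = "Python-2.2.1"
    for sub in ("Modules", "Objects", "Parser", "Python"):
        n, total, med = line_stats(os.path.join(root, sub, "*.c"))
        print("%-8s %4d files %7d lines, median %s" % (sub, n, total, med))
```

Only top-level *.c files are counted, matching the Modules/*.c style of the figures above.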

There are a few tough places, some of which (e.g. the RE module) you
can probably skip, some (Python/compile.c, Python/ceval.c ...) not.

Once you ARE familiar with the C API, and thus with the object
implementations that after all do follow it, I suggest that for
the rest of the core you take a "flow" approach -- how is Python
source transformed into execution.  I.e., compilation into bytecode,
and interpretation of bytecode.  Those are nasty, particularly (as
of 2.2.1) the compilation.  Then you can look at other more
marginal nasties such as garbage collection &c.
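A cheap way to watch both stages of that flow from the Python side, before diving into the C, is sketched below. It assumes a modern Python 3 (`dis.get_instructions` appeared in 3.4), and the exact opcodes listed vary from version to version; `compile()`, `exec()` and the `dis` module are standard, while the sample source string is just an illustration.

```python
import dis

# Stage 1: compilation.  compile() turns source text into a code object,
# whose co_code attribute holds the raw bytecode.
source = "x = 1\ny = x + 2\n"
code = compile(source, "<example>", "exec")

# A readable listing of the bytecode the evaluation loop will run.
dis.dis(code)
opnames = [instr.opname for instr in dis.get_instructions(code)]

# Stage 2: interpretation.  exec() hands the code object to the
# interpreter's evaluation loop (Python/ceval.c in CPython).
namespace = {}
exec(code, namespace)
print(namespace["y"])  # 3
```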

> <<Some helping background>>
> I am creating a new language and an IDE intended for beginners. This is my
> CS master thesis. I plan to use Python as primary language and the Python
> interpreter as my interpreter. I will probably need to make quite a few
> changes to the interpreter to make the language behave the way I want,
> and I will need to monitor the state of the interpreter for debugging
> purposes.

Instrumenting the interpreter to understand exactly what is
going on is not hard and does not require complete understanding;
indeed, doing and examining such instrumentation will HELP you
further your understanding.
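As a first, zero-modification step, the interpreter already exposes a tracing hook, `sys.settrace` (the mechanism pdb relies on; the C-level counterpart is `PyEval_SetTrace`). Here is a sketch of using it to log what the evaluation loop is doing, assuming Python 3; `tracer` and `demo` are illustrative names only.

```python
import sys

events = []


def tracer(frame, event, arg):
    # The interpreter calls this for 'call', 'line', 'return' and
    # 'exception' events; record which function and line we're at.
    events.append((event, frame.f_code.co_name, frame.f_lineno))
    return tracer  # keep receiving events for this frame


def demo(n):
    total = 0
    for i in range(n):
        total += i
    return total


sys.settrace(tracer)
try:
    result = demo(3)
finally:
    sys.settrace(None)

for event in events:
    print(event)
```

Only frames entered after `sys.settrace` is called get traced, which is why the module-level code itself does not show up in the log.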

Modification is of course another kettle of fish:-).

> <<Some really uninteresting background>>
> Python is, I think, a very intuitive language for beginners, with some
> modifications it can be even better. Especially with a good IDE. That is

Two mostly separate issues, I think -- the really good IDE is pretty
uncontroversial (if really good:-), the mods WILL get you flames:-).

> what I have set out to create. I have designed the language I want to
> create and I have made the first draft of the IDE using Tkinter but I have
> a big problem with the python source. It is quite extensive and I am not
> one of those people that can have a look at a million lines of code and see

It's nowhere near a million lines!  264k overall, including ALL the
platform-dependent goo you don't care about AT ALL.  Given that
Modules and Objects are separable, modular and clear, the hard
core is just 30k lines or so in Python/*.c and Parser/*.c.

> Things I want to change are, for example: everything should be "call by
> reference", it shall not be case sensitive, redirect output, better
> error messages, etc, etc.

Better error messages would of course be a welcome contribution
to Python itself:-).  "Redirect output" is a mystery -- Python's
own output is highly redirectable, so what more is needed?  Case
insensitivity is what gets my personal applause: case sensitivity
is the ONE feature of the language I truly, deeply dislike (just as
I dislike it in Unix, XML, C, ...).  You're in for substantial
work with the *libraries*, I think (not the C-coded ones --
the Python-coded ones).
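To get a rough idea of how much Python-coded library work case insensitivity would mean, one could scan modules for distinct identifiers that would collapse into a single name. This is a sketch assuming Python 3's `ast` module; `case_collisions` is a hypothetical helper, and it only looks at plain names, function and class names, parameters and attribute names (import aliases, strings passed to `getattr`, and so on are missed).

```python
import ast
from collections import defaultdict


def case_collisions(source):
    """Return groups of distinct identifiers in `source` that would become
    the same name if the language ignored case."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            names.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
        elif isinstance(node, ast.Attribute):
            names.add(node.attr)
    # Group names by their lowercased form; groups of 2+ would collide.
    groups = defaultdict(set)
    for name in names:
        groups[name.lower()].add(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


sample = """
class Point:
    pass

point = Point()
POINT = 3
"""
print(case_collisions(sample))  # [['POINT', 'Point', 'point']]
```

Running something like this over the Python-coded part of the standard library would show where renaming is needed before case-insensitive lookup could work.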

"Call by reference" I gotta see.  Just for fun, do it like
good old Fortran used to:

def inc(x):
    x = x + 1

inc(2)
print(2)

should of course print '3'.
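For contrast, here is what Python actually does, as a sketch runnable on Python 3: arguments are passed as references to objects, so rebinding a parameter inside the function never reaches the caller, while mutating a shared object does. `inc` mirrors the function above (plus a `return`); `append_one` is an illustrative addition.

```python
def inc(x):
    # Rebinding the local name x to a new int object; the caller's
    # binding (and the literal 2) are untouched.
    x = x + 1
    return x


def append_one(lst):
    # Mutating the object both names refer to *is* visible to the caller.
    lst.append(1)


n = 2
inc(n)
print(n)     # 2: rebinding inside inc does not propagate

data = []
append_one(data)
print(data)  # [1]: mutation of the shared list does propagate
```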

Anyway, I think that to implement call by reference you
will have to touch just about every one of the 300 or so source files.

I suggest you consider call by value/copy-return as an alternative:
newbies can't tell the difference (except for the performance
hit, but in Python, since you'd be copying references, that
should be slight), and you could handle it with 1/10th the
hassle of pass-by-reference.  Basically, you can widen the
information a frame keeps about its arguments to record the
source, and thus the copy-back destination, of each argument
(that's the hard part, of course, since you need to make some
tough decisions, but it's still doable); when you're about to
dismantle the frame on exit, copy back the changed values
of the arguments (selectively, so you don't change the "2",
but that's easy).
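A Python-level emulation can make the mechanism concrete. This is only a sketch under loud assumptions: a dict stands in for the caller's variables, a hypothetical `Name` marker records each argument's source (and thus its destination), and the "return" event of `sys.settrace` stands in for "about to dismantle the frame". A real implementation would live in the C frame and eval-loop code; this one handles only plain variables and positional arguments, not attributes or items, and `call_value_result` is an illustrative name, not an existing API.

```python
import sys


class Name:
    """Hypothetical marker: this argument was a variable, so its final
    value should be copied back to the caller when the call returns."""

    def __init__(self, name):
        self.name = name


def call_value_result(func, namespace, *args):
    """Call func(*args) with value/copy-return semantics (a sketch).

    `namespace` is a dict standing in for the caller's variables."""
    code = func.__code__
    params = code.co_varnames[:code.co_argcount]

    # Copy in, remembering each argument's source (= copy-back destination).
    destinations = {}
    values = []
    for param, a in zip(params, args):
        if isinstance(a, Name):
            destinations[param] = a.name
            values.append(namespace[a.name])
        else:
            values.append(a)  # a literal such as 2: never copied back

    final_locals = {}

    def tracer(frame, event, trace_arg):
        # Snapshot func's locals just before its frame is dismantled.
        if event == "return" and frame.f_code is code:
            final_locals.update(frame.f_locals)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*values)
    finally:
        sys.settrace(previous)

    # Copy out: write the (possibly changed) values back to their sources.
    for param, dest in destinations.items():
        namespace[dest] = final_locals[param]
    return result


def inc(x):
    x = x + 1


ns = {"a": 2}
call_value_result(inc, ns, Name("a"))  # a variable: copied back
call_value_result(inc, ns, 2)          # a literal: left alone
print(ns)  # {'a': 3}
```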

The hard part comes in determining what you want to
happen on e.g.:

        f(x, x.y, x.y.z, x.y.z.t)

when f changes its 2nd argument, should that in turn
change the 3rd and 4th appropriately and implicitly too?
What if some of the attributes y, z and t are obtained via
__getattr__ or its equivalent?  And similar questions arise
for items vs attributes.
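A small simulation shows why these are genuine policy decisions rather than implementation details. The sketch assumes f(x, x.y, x.y.z) ran under copy-return and left its 2nd parameter rebound to a new object and its 3rd changed to 7 (hypothetical values); `copy_back` and the two policy names are illustrative only. Two reasonable-sounding rules for where "x.y.z" points at copy-back time give different results.

```python
from types import SimpleNamespace


def copy_back(policy):
    """Simulate copy-return for f(x, x.y, x.y.z) under a given policy."""
    x = SimpleNamespace(y=SimpleNamespace(z=0))
    old_y = x.y  # the object "x.y.z" referred to at call time

    # Hypothetical final values of f's parameters on exit: the 2nd was
    # rebound to a brand-new object, the 3rd was changed to 7.
    new_y = SimpleNamespace(z=100)
    new_z = 7

    x.y = new_y  # copy back the 2nd argument first
    if policy == "re-evaluate":
        x.y.z = new_z    # re-walk the expression x.y.z: lands in the new y
    elif policy == "call-time":
        old_y.z = new_z  # use the location fixed at call time: the old y
    else:
        raise ValueError(policy)
    return x.y.z


print(copy_back("re-evaluate"))  # 7
print(copy_back("call-time"))    # 100
```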

Algol 60 specified "call by name" -- easiest to explain,
hellish to implement when you consider such issues as
items, e.g.
        i = 0
        print(f("ciao"[i], i))
with
        def f(c, i):
            i = i + 1
            return c
should print "i" according to call-by-name semantics.
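The classic way to implement call by name is with "thunks": each argument becomes a little closure that re-evaluates the argument expression in the caller's scope every time it is used. Here is a Python 3 sketch of the example above, assuming a dict (`env`) stands in for the caller's variables, with all names illustrative; under plain call by value, f would return "c" instead.

```python
# Caller's environment; a dict stands in for the caller's variables.
env = {"i": 0}

# Each argument becomes a pair of thunks: a getter that re-evaluates the
# argument expression on every use, and a setter (None if not assignable).
c_thunk = (lambda: "ciao"[env["i"]], None)
i_thunk = (lambda: env["i"], lambda value: env.__setitem__("i", value))


def f(c, i):
    get_i, set_i = i
    set_i(get_i() + 1)  # i = i + 1, performed in the caller's environment
    get_c, _ = c
    return get_c()      # re-evaluates "ciao"[i] with i now equal to 1


result = f(c_thunk, i_thunk)
print(result)  # i
```

Every parameter access becomes a function call, which hints at why call by name is so much more expensive, and so much harder to retrofit, than the other schemes.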
Call by value/copy-return is much easier to implement.
Call by reference is a sort of compromise that's more
or less inevitable for performance reasons when a
language has value (rather than reference) semantics,
and indeed that's what Fortran mostly did -- but in a
language that already has reference semantics, as Python
does, trying to perch call by reference on top of that
seems really precarious.

Alex