Tips or strategies to understanding how CPython works under the hood

Chris Angelico rosuav at gmail.com
Tue Jan 9 11:18:22 EST 2018


On Wed, Jan 10, 2018 at 2:21 AM, Robert O'Shea
<robertoshea2k11 at gmail.com> wrote:
> Hey all,
>
> Been subscribed to this thread for a while but haven't contributed much.
> One of my ultimate goals this year is to get under the hood of CPython and
> get a decent understanding of the mechanics Guido and the rest of you wonderful
> people have designed and implemented.
>
> I've been programming in python for nearly 10 years now and while I can
> write a mean Python script, I've been becoming more and more interested in
> low-level operations or complex C programs so I thought I could spread my
> love of both to make a difference for me and others.
>
> So besides just grabbing a chunk of CPython source code and digesting it, I
> was wondering: for those of you who have read and understood the source
> code, do you have any tips or good starting points?

Cool! Let's see.

The first thing I'd do is to explore the CPython bytecode. Use the
'dis' module to examine the compiled version of a function, and then
look at the source code for dis.py (and the things it imports, like
opcode.py) to get a handle on what's happening in that bytecode.
CPython is a stack-based interpreter, which means it loads values onto
an (invisible) internal stack, processes values at the top of the
stack, and removes them when it's done.
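A quick way to start is to disassemble a tiny function and set it beside a toy version of the same idea. The sketch below uses the real dis.dis and dis.get_instructions functions. run_toy and its instruction names (LOAD, ADD, RETURN) are hypothetical and only mimic the push/pop pattern; they are not CPython's evaluation loop. Opcode names in the real output vary between Python versions (BINARY_ADD before 3.11, BINARY_OP from 3.11 on).

```python
import dis


def add(a, b):
    return a + b


# CPython's own disassembly of add(); exact opcodes depend on the version.
dis.dis(add)

# The same information as data, handy for poking at programmatically.
opnames = [instr.opname for instr in dis.get_instructions(add)]
print(opnames)


def run_toy(program, local_vars):
    """Hypothetical toy stack machine, NOT CPython's interpreter loop."""
    stack = []
    for op, arg in program:
        if op == "LOAD":
            stack.append(local_vars[arg])  # push a value onto the stack
        elif op == "ADD":
            right = stack.pop()  # pop the two topmost values...
            left = stack.pop()
            stack.append(left + right)  # ...and push their sum
        elif op == "RETURN":
            return stack.pop()  # the result is whatever is on top
        else:
            raise ValueError(f"unknown op {op!r}")
    raise RuntimeError("program ended without RETURN")


toy_program = [("LOAD", "a"), ("LOAD", "b"), ("ADD", None), ("RETURN", None)]
print(run_toy(toy_program, {"a": 2, "b": 3}))  # 5
```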

Once you've gotten a handle on the bytecode, I'd next dive into one
particular core data type. Pick one of dict, list, str (note that, for
hysterical raisins, it's called "unicode" in the source), int (for
similar hysterical raisins, it's called "long"), etc. In the CPython
source code, there's an Objects/ directory where these types are
implemented (listobject.c, dictobject.c, and so on). Explore the
functionality of that one object type, keeping in mind the
interpreter's stack. Get
an understanding of the different things you can do with it at the low
level; some of them will be the same as you're used to from the high
level, but some won't (for instance, Python code is never aware of a
call to list_resize). Above all, read the comments; the top few
pages of dictobject.c are pretty much entirely English, and give a lot
of insight into how Python avoids potential pitfalls in dict
behaviour.
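Even though Python code never calls list_resize directly, you can watch its side effects from Python. This is a sketch that assumes CPython, where sys.getsizeof on a list reports the memory reserved for its item pointers, including spare capacity. Because list_resize over-allocates, the reported size jumps in steps rather than growing on every append. The exact step sizes are an implementation detail and vary between versions and platforms.

```python
import sys

# Record the reported size of a list after each append. Growing a list goes
# through the C function list_resize, which reserves extra slots so that
# most appends don't need a new allocation.
items = []
sizes = [sys.getsizeof(items)]
for n in range(32):
    items.append(n)
    sizes.append(sys.getsizeof(items))

# Print only the lengths at which the reserved size actually changed.
previous = None
for length, size in enumerate(sizes):
    if size != previous:
        print(f"len={length:2d}  getsizeof={size}")
        previous = size
```

Comparing those jump points with the growth formula in Objects/listobject.c is a nice first exercise in matching C code to observable behaviour.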

From there, it's all up to you! Don't hesitate to ask questions about
stuff you see. Tinkering is strongly encouraged!

Oh, one thing to keep an eye on is error handling. The checks that
raise exceptions often point you at situations you'd never have
thought about, like raising ValueError("generator already executing"),
which I found highly amusing. (I cannot imagine ANY sane code that
would ever trigger that error!)
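For the curious, here is a deliberately insane sketch that does reach that check: a generator (the name self_resuming is made up for illustration) that tries to resume itself while it is already running. This assumes CPython; the exact message text comes from the C source and could change between versions.

```python
def self_resuming():
    # Deliberately broken, hypothetical example: while this generator is
    # running, it asks itself for its next value.
    yield next(gen)


gen = self_resuming()
message = None
try:
    next(gen)
except ValueError as exc:
    message = str(exc)
    print(message)  # generator already executing
```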

Have fun with it!

ChrisA
not a CPython source code expert by any means, but has followed the
above steps and greatly enjoyed the process



More information about the Python-list mailing list