Python in C

Mon Dec 29 21:32:45 EST 2008

    thmpsn> 1. Can anyone explain to me what kind of program structuring
    thmpsn>    technique (which paradigm, etc) CPython uses? How do modules
    thmpsn>    interact together?  What conventions does it use?

It's quite object-oriented once you understand how things are done.  Take a
look, for example, at the implementation of floating point numbers:

    .../Objects/floatobject.c
    .../Include/floatobject.h

BTW, as a person who hasn't really written a stitch of C++ in about 10 years
I personally find the CPython implementation to be one of the most
well-organized large pieces of code I have ever encountered.  It's much
easier to read (to me) than any significant piece of C++ code I have ever
tried to read.

Here are a few things which might help you understand the code structure a
bit more:

    * The Python parser is generated from a specification.  Look in
      .../Parser and .../Grammar/Grammar.

    * The code for most objects generally has a single entry point.  For
      floating point objects it's _PyFloat_Init.  The leading underscore
      tells the world "we needed to export this symbol, but keep your hands
      off it".  Objects like floats and ints tend to have several other
      exported functions (search for "Py" at the beginning of a line) which
      are used by module writers.  Objects implemented as extension modules
      (look in .../Modules/*.c) have a single exported entry point,
      init<mod> (initmath for the math module, for example).  The runtime
      dlopen's the so/dll file and calls that function.

    * Since C doesn't offer transparent method tables, they are explicit in
      an object's code.  Referring again to the float object code, look for
      the type specifier ("PyFloat_Type") and the method "dict"
      ("float_methods").  Note the comments at the end of the lines defining
      PyFloat_Type.  They describe the use of each slot.  Float objects
      aren't sequences so the tp_as_sequence slot is NULL.  Similarly,
      objects such as lists which don't implement numeric methods have a
      NULL tp_as_number slot.

    * The byte code compiler is in Python/compile.c.

    * The runtime interpreter is in Python/ceval.c.

    * Nothing requires a module to implement classes or types.  Look at
      Python/sysmodule.c and Modules/mathmodule.c for two examples.  Both
      modules export many functions but define no new types.
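
To make the "explicit method table" idea concrete, here is a minimal sketch
in plain C.  It is not CPython's code: Type, Object, NumberMethods,
SequenceMethods, float_new and number_add are hypothetical names invented
for illustration, and only the slot names (tp_as_number, tp_as_sequence,
nb_add, sq_length) echo the real ones.  It shows a type whose protocol
tables are reached through pointers, with NULL marking an unsupported
protocol.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature of the layout described above.  These are NOT
   CPython's declarations; only the slot names echo the real ones. */
typedef struct object Object;

/* Optional protocol table: numeric operations. */
typedef struct {
    Object *(*nb_add)(Object *, Object *);
} NumberMethods;

/* Optional protocol table: sequence operations. */
typedef struct {
    size_t (*sq_length)(Object *);
} SequenceMethods;

/* The "type object": a name plus pointers to the protocol tables. */
typedef struct {
    const char *tp_name;
    const NumberMethods *tp_as_number;      /* NULL if not numeric */
    const SequenceMethods *tp_as_sequence;  /* NULL if not a sequence */
} Type;

/* Every object starts with a pointer to its type. */
struct object {
    const Type *ob_type;
};

typedef struct { Object base; double value; } FloatObject;
typedef struct { Object base; size_t length; } ListObject;

static Object *float_new(double v);

static Object *float_add(Object *a, Object *b)
{
    return float_new(((FloatObject *)a)->value + ((FloatObject *)b)->value);
}

static size_t list_length(Object *o)
{
    return ((ListObject *)o)->length;
}

static const NumberMethods float_as_number = { float_add };
static const SequenceMethods list_as_sequence = { list_length };

static const Type Float_Type = {
    "float",
    &float_as_number,   /* tp_as_number */
    NULL,               /* tp_as_sequence: floats aren't sequences */
};

static const Type List_Type = {
    "list",
    NULL,               /* tp_as_number: no numeric methods */
    &list_as_sequence,  /* tp_as_sequence */
};

/* Allocate a float object; the caller releases it with free(). */
static Object *float_new(double v)
{
    FloatObject *f = malloc(sizeof *f);
    if (f == NULL)
        return NULL;
    f->base.ob_type = &Float_Type;
    f->value = v;
    return (Object *)f;
}

/* Generic "a + b": look the operation up in a's type, and return NULL
   if the type does not fill in that slot. */
static Object *number_add(Object *a, Object *b)
{
    assert(a != NULL && b != NULL);
    const NumberMethods *nm = a->ob_type->tp_as_number;
    if (nm == NULL || nm->nb_add == NULL)
        return NULL;
    return nm->nb_add(a, b);
}
```

Calling number_add on two objects made with float_new dispatches through
Float_Type to float_add, while calling it on a ListObject returns NULL
because List_Type leaves tp_as_number empty.  The real type object has many
more slots, but the lookup pattern is the same.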

Part of the complexity you might be stumbling on is due to the fact that
Python is a mature application and so many parts of the implementation have
been fine-tuned.  Python "eats its own dog food", so for example the dict
implementation you use in your scripts is also used by the runtime virtual
machine to implement namespaces of all sorts (instance, class and module
dicts, for example).  It has been heavily optimized.  Take a look at
Objects/dictnotes.txt.  Also, observations about the data use of many Python
programs (and the C runtime itself) have led to a number of optimizations
such as the int and float free lists and the custom object allocator
implemented in Objects/obmalloc.c.
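
As a rough illustration of the free list idea, here is a minimal sketch in
plain C.  It is an assumption-laden toy, not the code in floatobject.c or
obmalloc.c: Node, node_alloc, node_release, free_list_clear and the
malloc_calls counter are all hypothetical names.  Released objects are
chained together and handed back out on the next allocation, so the system
allocator is only called when the list is empty.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical fixed-size object.  While it is in use it holds a value;
   while it sits on the free list the same storage holds the link. */
typedef struct Node {
    union {
        double value;
        struct Node *next;
    } u;
} Node;

static Node *free_list = NULL;   /* head of the chain of released nodes */
static int malloc_calls = 0;     /* instrumentation for the sketch only */

/* Hand out a node, reusing a released one when possible. */
static Node *node_alloc(double v)
{
    Node *n = free_list;
    if (n != NULL) {
        free_list = n->u.next;   /* pop from the free list */
    } else {
        n = malloc(sizeof *n);   /* list empty: fall back to malloc */
        if (n == NULL)
            return NULL;
        malloc_calls++;
    }
    n->u.value = v;
    return n;
}

/* Return a node to the free list instead of calling free(). */
static void node_release(Node *n)
{
    assert(n != NULL);
    n->u.next = free_list;
    free_list = n;
}

/* Give every cached node back to the system allocator. */
static void free_list_clear(void)
{
    while (free_list != NULL) {
        Node *n = free_list;
        free_list = n->u.next;
        free(n);
    }
}
```

Allocating, releasing and allocating again returns the same block the
second time, with malloc_calls still at 1.  The trade-off is that cached
memory is not returned to the system until something like free_list_clear
runs.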

    thmpsn> 2. Have there been any suggestions in the past to rewrite
    thmpsn>    Python's mainstream implementation in C++ (or why wasn't it
    thmpsn>    done this way from the beginning)?

C++ was not nearly widely enough available when Python was first written in
the late 80s/early 90s.  Today there is no particular reason to rewrite it.
If you want to incorporate externally written C++ code into Python you can
do that either manually or using tools such as SWIG, SIP or Boost.Python.

HTH,

-- 
Skip Montanaro - skip at pobox.com - http://smontanaro.dyndns.org/