CPython loading modules into memory

Wed Feb 11 17:00:24 EST 2009

> Can someone describe the details of how Python loads modules into
> memory?  I assume once the .py file is compiled to .pyc that it is
> mmap'ed in.  But that assumption is very naive.  Maybe it uses an
> anonymous mapping?  Maybe it does other special magic?  This is all
> very alien to me, so if someone could explain it in terms that a
> person who never usually worries about memory could understand, that
> would be much appreciated.

There is no magic whatsoever, and no mmap. Python opens the .pyc file
as an ordinary sequential file and reads it in small chunks,
"unmarshalling" it as it goes (the marshal module is what restores the
Python objects).
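
As a rough illustration of that read-and-unmarshal step, here is a minimal
sketch for a current CPython 3. It assumes a 16-byte .pyc header (CPython
3.7 and later; the interpreters of 2009 used an 8-byte header of magic
number plus timestamp). The names load_code_from_pyc and demo_mod are made
up for the example, and this is not the interpreter's actual import code.

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile

HEADER_SIZE = 16  # assumption: CPython 3.7+ header layout


def load_code_from_pyc(pyc_path):
    """Skip the .pyc header and unmarshal the rest of the file."""
    with open(pyc_path, "rb") as f:
        header = f.read(HEADER_SIZE)
        # The first four bytes identify the bytecode version.
        if header[:4] != importlib.util.MAGIC_NUMBER:
            raise ValueError("not a .pyc file for this interpreter")
        # marshal.load reads sequentially from the open file object.
        return marshal.load(f)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo_mod.py")  # hypothetical module
    with open(src, "w") as f:
        f.write("x = 6 * 7\n")
    pyc = py_compile.compile(src, cfile=os.path.join(tmp, "demo_mod.pyc"))
    code = load_code_from_pyc(pyc)

print(type(code).__name__)
```

The file is read with plain read calls on an ordinary file object; nothing
is mapped into memory.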

The marshal format is an object serialization in a type-value encoding
(sometimes type-length-value), with type codes for:
- None, True, False
- 32-bit ints, 64-bit ints (unmarshalled into int/long)
- floats, complex
- arbitrary-sized longs
- strings, unicode
- tuples (length + marshal data of values)
- lists
- dicts
- code objects
- a few others
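
To see the type codes, one can marshal a few values and look at the first
byte of each result. This is a sketch for Python 3, where the str/unicode
pair above corresponds to bytes/str and int/long have merged into int. The
helper type_code is made up for the example. The exact codes and byte
counts depend on the marshal format version, so no output is shown.

```python
import marshal

samples = [None, True, 42, 2**100, 1.5, 2 + 3j,
           "text", b"bytes", (1, 2), [1, 2], {"k": "v"}]


def type_code(value):
    """Return the leading type-code character of value's marshalled form."""
    data = marshal.dumps(value)
    # Newer marshal versions may set the high bit of the type byte as a
    # back-reference flag, so mask it off before decoding the character.
    return chr(data[0] & 0x7F)


for value in samples:
    data = marshal.dumps(value)
    # Every sample survives a round trip unchanged.
    assert marshal.loads(data) == value
    print(repr(value), "->", repr(type_code(value)), len(data), "bytes")
```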

The result of unmarshalling a .pyc file is typically a code object,
which Python then executes to populate the module's namespace.
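
A small sketch of what happens to such a code object, assuming Python 3:
the source is compiled, round-tripped through marshal the way a .pyc
payload would be, and then executed in a fresh module's namespace. The
module name demo_mod and its contents are invented for the example.

```python
import marshal
import types

source = (
    "GREETING = 'hello'\n"
    "def shout():\n"
    "    return GREETING.upper()\n"
)

# Compile to a code object, then serialize and restore it with marshal.
code = compile(source, "<demo_mod>", "exec")
restored = marshal.loads(marshal.dumps(code))

# Running the code object fills in the module's __dict__.
module = types.ModuleType("demo_mod")  # hypothetical module name
exec(restored, module.__dict__)

print(module.shout())
```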

> Follow up: is this process different if the modules are loaded from a
> zipfile?

No; it uncompresses the file into memory and then unmarshals from
there (one compressed block at a time).
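
The zipfile case can be imitated by hand with the zipfile module. This is
only a sketch of the idea, not the zipimport implementation. It assumes
CPython 3.7 or later (16-byte .pyc header) and that zlib is available; the
names zipped_mod and bundle.zip are made up.

```python
import marshal
import os
import py_compile
import tempfile
import zipfile

HEADER_SIZE = 16  # assumption: CPython 3.7+ header layout

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "zipped_mod.py")  # hypothetical module
    with open(src, "w") as f:
        f.write("answer = 42\n")
    pyc = py_compile.compile(src, cfile=os.path.join(tmp, "zipped_mod.pyc"))

    # Store the compiled module in a compressed archive.
    archive = os.path.join(tmp, "bundle.zip")
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(pyc, "zipped_mod.pyc")

    # read() decompresses the member into an in-memory bytes object.
    with zipfile.ZipFile(archive) as zf:
        raw = zf.read("zipped_mod.pyc")

# Unmarshal from the in-memory bytes, skipping the header.
zipped_code = marshal.loads(raw[HEADER_SIZE:])
namespace = {}
exec(zipped_code, namespace)
print(namespace["answer"])
```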

> If there is a link that covers this info, that'd be great too.

See the description of the marshal module.

HTH,
Martin