How hard to allocate ALL Python data space (hence offset-based) in a memory mapped file image?

Bengt Richter bokr at oz.net
Thu Dec 19 01:51:08 EST 2002


Not that it isn't already in a memory mapped file, or the equivalent, as a
byproduct of OS doings, but that's for swapping etc. I mean a file under
control of the Python interpreter, to span all state-containing memory.
(I don't mean a Python mmap per se, but an internal-to-the-interpreter
use of the same mechanisms).

I just thought of this while reading a thread on CGI etc.
IOW, think of id(x) returning an offset into the mmap file image
instead of a memory location.

The idea would be to support checkpointing interpreter instances
for resumption. If this flew, it could also support creation of
an initialized instance by just copying a file image of an interpreter
instance checkpointed right after initialization is complete[1].

This might also be a way to distribute apps. Minimal interpreter plus
app state image file, with all python dependencies loaded as much as
you want up to the checkpoint call. And special C stuff and DLLs could
also be part of the image as data!!

[1] I.e., if Python initialization code (here anything before calling
sys.checkpoint) read the relevant binary files into "memory" as strings
before calling e.g., sys.checkpoint('mydist.pyk'), they would be part
of the image, and could be written back out in binary on continuation
from the checkpoint call, if the DLL files etc. were found not to exist.
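The write-back step in [1] could look roughly like the sketch below. It is
only an illustration in ordinary Python: restore_missing and the dict of
embedded binaries are hypothetical names, and nothing here depends on the
proposed sys.checkpoint actually existing.

```python
import os

def restore_missing(embedded):
    """Write back each embedded binary whose file is missing.

    `embedded` maps a file path to the bytes that were read into memory
    before the checkpoint. Returns the list of paths that were written.
    """
    written = []
    for path, data in embedded.items():
        if not os.path.exists(path):
            # The file is gone on this machine, so recreate it from the image.
            with open(path, "wb") as f:
                f.write(data)
            written.append(path)
    return written
```

Files that already exist are left alone, so a continued image would not
clobber a newer DLL that happens to be installed.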

Checkpointing using a call to sys.checkpoint('mydist.pyk') would create
the file. Without an argument, it would just flush the current image back to
its file, which must have been opened with the -k option or have been given
as the second file arg to -K; otherwise it's a programming error, or it leaves
an unrenamed temp file. The file would include whatever state is necessary to
come back to the caller through the last sys.checkpoint() call when relaunched
via e.g.,

    python -k mydist.pyk

which would magically continue after the sys.checkpoint call and mydist.pyk
would be the relevant mapped file ;-)

A normal python start without -k would presumably use a copy-on-write virtual
temp copy of a default python.pyk file if available, for a quick start.
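The copy-on-write part can at least be demonstrated at user level with the
existing mmap module, whose ACCESS_COPY mode keeps writes in memory and never
touches the underlying file. The sketch below only shows that mechanism; the
file name python.pyk and the helper names are placeholders, and a real
implementation would live inside the interpreter rather than in Python code.

```python
import mmap
import os
import tempfile

def open_private_copy(path):
    """Map an image file copy-on-write: writes stay private to this mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the mapping outlives the file object.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)

def demo():
    """Modify a private mapping and return (in-memory bytes, on-disk bytes)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "python.pyk")  # stand-in for a default image
        with open(path, "wb") as f:
            f.write(b"pristine image")
        image = open_private_copy(path)
        try:
            image[:8] = b"modified"  # changes only this process's view
            in_memory = bytes(image)
        finally:
            image.close()
        with open(path, "rb") as f:
            on_disk = f.read()
    return in_memory, on_disk
```

Calling demo() gives back a modified in-memory view next to an unchanged file,
which is the property a quick-start default image would need.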

To make use of this explicitly, you could use an upper case -K option, e.g.,
to launch a fresh nonresuming CGI from an image with python -K cgiapp.pyk,
which would similarly launch with a virtual clone file (or an actual copy,
depending on the OS).

As mentioned, if you wanted to launch several stateful cgis with the intent of
having them resumable, you would use python -K cgiapp.pyk cgiappNNN.pyk to have
numbered copies that could suspend themselves with sys.checkpoint(), and be
resumed with python -k cgiappNNN.pyk (lower case -k to resume). Of course, exiting
the interpreter, which a checkpoint call would do, implies that i/o should be closed
and reopened on resumption. stdin/out/err might be able to be handled fairly transparently,
but other resources would need specific care.
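To pin down which file ends up mapped in each of those launch modes, here is
a small sketch of the selection logic. The -k and -K options are the proposal
above, not real interpreter flags, and select_image is a made-up helper that
uses a plain file copy where a real implementation might use a virtual clone.

```python
import os
import shutil
import tempfile

def select_image(flag, image, new_image=None):
    """Return the path of the file that would be mapped for this launch."""
    if flag == "-k":
        # Resume: map the checkpointed file itself and continue in place.
        return image
    if flag == "-K":
        if new_image is None:
            # Fresh nonresuming instance: work on a throwaway clone.
            fd, new_image = tempfile.mkstemp(suffix=".pyk")
            os.close(fd)
        # Fresh resumable instance: the clone gets the caller's chosen name.
        shutil.copyfile(image, new_image)
        return new_image
    raise ValueError(f"unknown flag: {flag!r}")
```

So select_image("-K", "cgiapp.pyk", "cgiapp001.pyk") would hand back a
numbered copy to run in, and select_image("-k", "cgiapp001.pyk") would hand
back that same file for resumption.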

This whole thing also raises the interesting idea of being able to load a system
from ROM in two easily defined pieces: 1) The minimal interpreter code and 2) a
state image file. This should help fast startup for small devices etc.

"Addresses" would still be "absolute" with respect to each other in
the (single) image, so references in lists etc would work much as now.
(I.e., offsets would all be from the same zero base).
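A toy version of the offset idea, using the existing mmap and struct modules:
a linked list whose "pointers" are offsets from the start of the file, so the
links stay valid no matter where the file gets mapped next time. The node
layout, the 8-byte header and the function names are all invented for the
illustration and say nothing about how real Python objects are laid out.

```python
import mmap
import struct

NODE = struct.Struct("<qq")  # (value, offset of next node; 0 means no next)
HEADER = 8                   # reserve offset 0 so it can act as a null reference

def build_image(path, values):
    """Write a linked list into a file, linking nodes by file offset."""
    size = HEADER + NODE.size * len(values)
    with open(path, "wb") as f:
        f.truncate(size)
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), size) as image:
        for i, value in enumerate(values):
            offset = HEADER + i * NODE.size
            next_offset = offset + NODE.size if i + 1 < len(values) else 0
            NODE.pack_into(image, offset, value, next_offset)
        image.flush()

def walk_image(path):
    """Map the file again and follow the offsets from the first node."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as image:
        values = []
        offset = HEADER if len(image) > HEADER else 0
        while offset:
            value, offset = NODE.unpack_from(image, offset)
            values.append(value)
        return values
```

Building the image in one process and walking it in another gives the same
list back, since nothing in the file depends on a memory address.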

If we were lucky, most of the work could be accomplished with a few macro
changes. Dreaming on ... ;-)

I think it would be COOL to be able to do these things.
For Windows, I suspect it would not be that hard to wrap those two components in
a self-launching zip archive .exe, which might be similar to what freeze does?

But in any case, I can't think of an easier way to wrap things than just
capturing memory at the time of a checkpoint call. I.e., pretty much just store
the last state info in an object and close the file (and sometimes rename it).
COOL or what? ;-)

With lots of memory in a server, python state images would presumably tend to stay
in memory, and even initializing a fresh image should go as fast as a memory-memory
copy of the -K-specified file. And everything should be ready to go without all the
importing and fiddling that was accomplished before the checkpoint.

Regards,
Bengt Richter


