[Python-ideas] Copy-on-write when forking a python process

Wed Apr 13 15:43:16 CEST 2011

On 4/12/11, jac <john.theman.connor at gmail.com> wrote:
> ... an object's
> reference count is stored in the "ob_refcnt" field of the PyObject
> structure itself.  When a process forks, its memory is initially not
> copied. However, if any references to an object are made or destroyed
> in the child process, the page in which the object's "ob_refcnt" field
> is located will be copied.

This also causes problems for a single process attempting to run on
multiple cores, because every refcount update writes to the object's
cache line and invalidates the copies of that line held by other cores.
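
To make the refcount write concrete, here is a minimal sketch that
assumes CPython and uses only sys.getrefcount(). It shows that simply
binding a second name to an object changes the object's stored count,
which is the write that dirties the page (or cache line). The variable
names are mine, and the absolute numbers vary between versions, so only
the differences are meaningful.

```python
import sys

obj = object()                  # a fresh object, not a shared singleton

before = sys.getrefcount(obj)   # includes a temporary reference for the call
alias = obj                     # new reference: CPython increments ob_refcnt
during = sys.getrefcount(obj)
del alias                       # reference dropped: ob_refcnt is decremented
after = sys.getrefcount(obj)

# Only the deltas matter; the absolute values are implementation details.
print("delta while alias exists:", during - before)
print("delta after del alias:   ", after - before)
```

Note that nothing in the object's own data was modified; the only thing
that changed was the count stored in its header.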

> My first thought was the obvious one: make the ob_refcnt field a
> pointer into an array of all object refcounts stored elsewhere.

Good thought, and probably needed for some types of parallelism.

The problem is that it also means that actually using the object will
require loading from at least two memory areas -- one to update the
reference count, the other for the object itself, which may or may not
be changed.  For relatively small objects, you would effectively be
cutting your cache size in half, in addition to paying for the extra
indirection on every refcount update.  The benefit would have to be
large to pay that back, and it may be simpler to just go with PyPy and
an alternate memory management scheme.
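
As a rough illustration of the trade-off, here is a toy Python model of
the side-table layout. It is only a sketch: the real change would be in
C inside the interpreter, and RefcountTable, ToyObject, incref and
decref are hypothetical names I made up for this example. The point it
shows is that the object itself is never written to, but every refcount
update has to touch both the object (to find its slot) and the separate
table.

```python
from array import array


class RefcountTable:
    """Toy side table: all reference counts live in one contiguous array."""

    def __init__(self):
        self.counts = array("q")      # signed 64-bit counters

    def allocate(self):
        """Reserve a slot for a new object, starting with one reference."""
        self.counts.append(1)
        return len(self.counts) - 1


class ToyObject:
    """Stand-in for an object whose header holds a slot index, not a count."""

    __slots__ = ("slot", "payload")

    def __init__(self, table, payload):
        self.slot = table.allocate()
        self.payload = payload


def incref(table, obj):
    # Two memory areas are touched: obj (read the slot) and table (write).
    table.counts[obj.slot] += 1


def decref(table, obj):
    table.counts[obj.slot] -= 1
    return table.counts[obj.slot]


table = RefcountTable()
a = ToyObject(table, "spam")
b = ToyObject(table, "eggs")

incref(table, a)
incref(table, a)
remaining = decref(table, b)

print(table.counts.tolist(), remaining)
```

After a fork, only the pages holding the table would need to be copied,
which is the benefit; the double memory access in incref() and decref()
is the cost described above.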

-jJ