Copy-on-write when forking a Python process

jac john.theman.connor at gmail.com
Fri Apr 8 14:34:18 EDT 2011


Hi Heiko,
I just realized I should probably have given a clearer use case in my
previous message.  An example would be a parent process that creates
a large dictionary (say, several gigabytes) and then forks several
worker processes that access it.  The workers do not add objects to
or remove objects from the dictionary, nor do they alter its
individual elements; they only perform lookups and run calculations
whose results are written to files.
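A minimal Python sketch of that use case, assuming a POSIX system (it relies on `os.fork`). The names `build_table` and `run_workers` are hypothetical, and a tiny dictionary of squares stands in for the multi-gigabyte one: the parent builds the table, forks one worker per list of keys, and each worker only does lookups and writes its result to a file.

```python
import os
import tempfile


def build_table(n):
    """Hypothetical stand-in for the multi-gigabyte dictionary (kept tiny)."""
    return {i: i * i for i in range(n)}


def run_workers(table, keys_per_worker, out_dir):
    """Fork one child per key list; children only read from `table`.

    POSIX-only (os.fork). Each child writes one result file and exits.
    """
    pids = []
    for worker_id, keys in enumerate(keys_per_worker):
        pid = os.fork()
        if pid == 0:  # child process
            status = 1
            try:
                # Read-only lookups: the dict itself is never modified.
                total = sum(table[k] for k in keys)
                path = os.path.join(out_dir, f"worker{worker_id}.txt")
                with open(path, "w") as f:
                    f.write(str(total))
                status = 0
            finally:
                # Exit immediately, skipping the parent's cleanup code.
                os._exit(status)
        pids.append(pid)

    # Parent: wait for every child and report failures.
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if status != 0:
            raise RuntimeError(f"worker {pid} failed with status {status}")


if __name__ == "__main__":
    table = build_table(1000)
    with tempfile.TemporaryDirectory() as out_dir:
        run_workers(table, [[1, 2, 3], [10, 20]], out_dir)
        for name in sorted(os.listdir(out_dir)):
            with open(os.path.join(out_dir, name)) as f:
                print(name, f.read())
```

The children call `os._exit` rather than returning so they never fall back into the parent's loop or flush the parent's buffered output a second time.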
If I wrote the above program in C, neither the "dictionary" nor its
contents would be copied into the memory of the child processes.  In
Python, however, as soon as a child passes the dictionary or any of
its contents into a function as an argument (or even just binds one of
them to a name), that object's reference count changes, and the page
of memory holding the reference count is copied into the child
process's memory.  What I am proposing is to allow the parent process
to disable reference counting for this dictionary and its contents, so
that the child processes can access them in a read-only fashion without
the pages being copied.
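The reference-count write is easy to observe. A minimal CPython-specific sketch (the function name is hypothetical): `sys.getrefcount` shows that a plain lookup bound to a new name raises the value's count by one, which is a write to the object's header even though neither the dictionary nor the value is modified. Absolute counts differ between CPython versions, so only the difference is compared.

```python
import sys


def refcount_delta_after_lookup():
    """Return how much a plain dict lookup changes the value's refcount.

    CPython-specific: sys.getrefcount reports the raw count (plus any
    temporary references), so only the before/after difference matters.
    """
    # Use a list rather than a small int or interned string: those may be
    # cached or, in CPython 3.12+, immortal, so their counts never change.
    table = {"key": [1, 2, 3]}
    value = table["key"]

    before = sys.getrefcount(value)
    alias = table["key"]  # read-only lookup; the dict is not modified
    after = sys.getrefcount(value)

    del alias
    return after - before


print(refcount_delta_after_lookup())
```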

I disagree with your statement that COW is an optimization for a
complete clone.  It is an optimization that works at the memory-page
level, not at the level of the whole memory image: if I write to a
copy-on-write page, only that page is copied into my process's address
space, not the entire parent image.  To the best of my knowledge, by
preventing the child process from altering an object's reference count,
you can prevent the page holding that object from being copied
(assuming, of course, that the object is not explicitly modified).
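To make the page-level point concrete, a rough sketch that counts how many distinct memory pages hold the headers of a set of objects. `pages_touched` is a hypothetical helper, and it assumes CPython, where `id()` returns the object's address and the reference count sits in the header at that address. Touching every object in a forked child would dirty roughly that many pages, not the whole parent image.

```python
import mmap


def pages_touched(objects):
    """Count the distinct memory pages holding the given objects' headers.

    CPython-specific assumption: id(obj) is the object's address and the
    reference count lives in the header at that address, so each distinct
    page counted here is one that a refcount write in a child would dirty.
    """
    page_size = mmap.PAGESIZE
    return len({id(obj) // page_size for obj in objects})


values = [float(i) + 0.5 for i in range(10_000)]
print(f"{len(values)} objects spread over {pages_touched(values)} pages "
      f"of {mmap.PAGESIZE} bytes")
```

This ignores allocator details and the garbage collector's own per-object bookkeeping, which can also write to shared pages, so treat the number as an illustration rather than a measurement.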

Hopefully this clarifies my previous post,
--jac

On Apr 8, 12:26 pm, Heiko Wundram <modeln... at modelnine.org> wrote:
> On 08.04.2011 18:14, John Connor wrote:
>
> > Has anyone else looked into the COW problem?  Are there workarounds
> > and/or other plans to fix it?  Does the solution I am proposing sound
> > reasonable, or does it seem like overkill?  Does anyone foresee any
> > problems with it?
>
> Why'd you need a "fix" like this for something that isn't broken? COW
> doesn't just refer to the object reference-count, but to the object
> itself, too. _All_ memory of the parent (and, as such, all objects, too)
> becomes unrelated to memory in the child once the fork is complete.
>
> The initial object reference-count state of the child is guaranteed to
> be sound for all objects (because the parent's final reference-count
> state was, before the process image got cloned [remember, COW is just an
> optimization for a complete clone, and it's up to the operating system to
> make sure that you don't notice different semantics from a complete
> copy]), and what you're proposing (opting in/out of reference counting)
> breaks that.
>
> --
> --- Heiko.



