Getting references to objects without incrementing reference counters

Diez B. Roggisch deets at web.de
Sun Nov 14 19:03:02 EST 2010


Artur Siekielski <artur.siekielski at gmail.com> writes:

> Hi.
> I'm using CPython 2.7 and Linux. In order to run parallel
> computations on a large list of objects I want to use multiple
> processes (using the multiprocessing module). First I fill
> the list with objects, and then I fork() the worker processes that do
> the job.
>
> This should be optimal in terms of memory usage, because Linux
> implements copy-on-write for forked processes. So I should have only
> one physical list of objects (the worker processes don't change the
> objects in the list). The problem is that after a short time the child
> processes use more and more memory (they don't create new
> objects; they only read objects from the list and write computation
> results to the database).
>
> After investigating, I concluded that the source of this must be the
> incrementing of a reference counter when getting an object from the
> list. It changes only one int, but the OS must copy the whole memory page
> to the child process. I reimplemented the function for getting an
> element (from the file listobject.c) but omitted the Py_INCREF call,
> and it solved my problem with increasing memory.
>
> The question is: are there any better ways to have a real read-only
> list (in terms of the memory representation of objects)? My solution is of
> course not safe. I thought about weakrefs, but it seems they cannot be
> used here because getting a real reference from a weakref increments a
> reference counter. Maybe another option would be to store reference
> counters not in the objects but in a separate array, to minimize the number
> of memory pages they occupy...

You don't say what data you share, or whether all of it is needed by each
child, so it's hard to suggest optimizations. And AFAIK there is no
built-in way of doing what you want; it's complex and error-prone.

Maybe mmap plus struct (or pickle) could help, if what you need can be
formulated as traversing the whole data set piecewise, explicitly
marshaling and unmarshaling the data as you go?
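
A minimal sketch of the mmap + struct idea, written for Python 3 and assuming the data can be flattened into fixed-size records. The record layout (`RECORD`, an int64 id plus a float64 value), the helpers `write_records` and `iter_records`, and the file path are all hypothetical. The shared data lives as raw bytes in a read-only mapping, so there are no Python object headers (and no reference counts) on those pages; each worker only creates a small tuple for the record it is currently processing. Variable-size data would need pickle plus some offset index instead, which is not shown here.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout: little-endian int64 id + float64 value.
RECORD = struct.Struct("<qd")

def write_records(path, records):
    """Serialize (id, value) pairs as fixed-size binary records."""
    with open(path, "wb") as f:
        for rec_id, value in records:
            f.write(RECORD.pack(rec_id, value))

def iter_records(path):
    """Yield records one at a time from a read-only memory map.

    Only the tuple for the current record becomes a Python object; the
    bulk data stays as bytes in the mapping, so reading it never writes
    to the shared pages. (Mapping an empty file raises ValueError.)
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for offset in range(0, len(mm), RECORD.size):
                yield RECORD.unpack_from(mm, offset)

# Usage: the parent writes the file once; each worker would iterate it.
path = os.path.join(tempfile.mkdtemp(), "records.bin")
write_records(path, [(1, 0.5), (2, 1.5), (3, 2.5)])
total = sum(value for _, value in iter_records(path))
print(total)  # 4.5
```

The trade-off is that each record is decoded on every access, which costs some CPU in exchange for keeping the children's resident memory flat.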

Diez


