[Python-Dev] Yet another "A better story for multi-core Python" comment

Paul Moore p.f.moore at gmail.com
Tue Sep 8 17:44:46 CEST 2015


On 8 September 2015 at 15:12, Gary Robinson <garyrob at me.com> wrote:
> So, one thing I am hoping comes out of any effort in the “A better story” direction is a way to share large data structures between processes. Two possible solutions:
>
> 1) Move the reference counts away from data structures, so copy-on-write isn’t an issue. That sounds like a lot of work — I have no idea whether it’s practical. It has been mentioned in the “A better story” discussion, but I wanted to bring it up again in the context of my specific use-case. Also, it seems worth reiterating that even though copy-on-write forking is a Unix thing, the midipix project appears to bring it to Windows as well. (http://midipix.org)
>
> 2) Have a mode where a particular data structure is not reference counted or garbage collected. The programmer would be entirely responsible for manually calling del on the structure if he wants to free that memory. I would imagine this would be controversial because Python is currently designed in a very different way. However, I see no actual risk if one were to use an @manual_memory_management decorator or some technique like that to make it very clear that the programmer is taking responsibility. I.e., in general, information sharing between subinterpreters would occur through message passing. But there would be the option of the programmer taking responsibility for memory management of a particular structure. In my case, the amount of work required for this would have been approximately zero — once the structure was created, it was needed for the lifetime of the process.
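>
> For concreteness, usage of such a decorator (purely hypothetical; nothing like it exists today, and the ScoreTable class and load_pairs() helper are invented for illustration) might look something like:
>
>     @manual_memory_management      # hypothetical: instances opt out of
>     class ScoreTable(object):      # refcounting and GC tracking
>         def __init__(self, pairs):
>             self.pairs = pairs
>
>     table = ScoreTable(load_pairs())   # built once, read-only thereafter,
>     ...                                # visible to all subinterpreters
>     del table   # the programmer's job: the one and only explicit free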

I guess a third possible solution, although it would probably
have run into the same "programmer time is critical" issue that
you noted originally, would be to write a module that manages the
data structure in shared memory, and then use that to access the
data from the multiple processes. If your data structure is
generic enough, you could make such a module generally usable -
or there may even be something available already... I know you
said that putting the data into a database would be too slow, but
how about an in-memory SQLite database (using shared memory, so
that there is only one copy for all processes)?
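
To sketch the shared-memory idea concretely, here's a minimal
illustration assuming Unix fork semantics; the record layout, the
sizes and the four-way worker split are all made up:

    import mmap
    import struct
    import multiprocessing

    RECORD = struct.Struct("qd")   # an int64 key plus a float64 score
    N = 1000000

    # Build the structure once in the parent, packed into a single
    # anonymous mmap block - no per-item Python objects, hence no
    # per-item reference counts to dirty the pages after a fork.
    buf = mmap.mmap(-1, RECORD.size * N)
    for i in range(N):
        RECORD.pack_into(buf, i * RECORD.size, i, i * 0.5)

    def worker(start, stop):
        # Forked children inherit the mapping; reading it never
        # copies it, so there is only one copy in RAM.
        total = 0.0
        for i in range(start, stop):
            key, score = RECORD.unpack_from(buf, i * RECORD.size)
            total += score
        print(total)

    if __name__ == '__main__':
        procs = [multiprocessing.Process(target=worker,
                                         args=(i * N // 4, (i + 1) * N // 4))
                 for i in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

(One caveat on the SQLite variant: a :memory: database is private
to a single process, so to get the one-copy-for-all-processes
behaviour you'd more likely point SQLite at a file on a RAM-backed
filesystem such as /dev/shm.)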

Your suggestion (2), of having a non-refcounted data structure,
is essentially this, and it's doable today as an extension
module. The core data structures all use refcounting, and that's
unlikely to change, but there's nothing to stop an extension
module implementing fast data structures whose objects are
allocated from a pool of preallocated memory that is only freed
as a complete block.
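
Even in pure Python you can get a feel for what that buys; here's
a minimal sketch (the PairPool name and the int64-pair layout are
invented for illustration):

    import array

    class PairPool(object):
        """Pairs of int64s stored in one contiguous array: a single
        refcounted container instead of millions of small PyObjects,
        and freed as one block when the pool goes away."""

        def __init__(self):
            self._data = array.array('q')   # one growable allocation

        def append(self, a, b):
            self._data.append(a)
            self._data.append(b)

        def __getitem__(self, i):
            d = self._data
            return d[2 * i], d[2 * i + 1]

        def __len__(self):
            return len(self._data) // 2

An extension module could take the same idea further, with
arbitrary C structs, a real pool allocator, and no boxing of
values on each access.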

These suggestions are probably more suitable for python-list, though,
as (unlike your comment on non-refcounted core data structures) they
are things you can do in current versions of Python.

Paul

