2.6, 3.0, and truly independent interpreters

Tue Oct 28 05:05:59 EDT 2008

On Oct 26, 6:57 pm, "Andy O'Meara" <and... at gmail.com> wrote:
> Grrr... I posted a ton of lengthy replies to you and other recent
> posts here using Google and none of them made it, argh. Poof. There's
> nothing that fires me up more than lost work, so I'll have to
> revert to short and simple answers for the time being.  Argh, damn.
>
> On Oct 25, 1:26 am, greg <g... at cosc.canterbury.ac.nz> wrote:
>
>
>
> > Andy O'Meara wrote:
> > > I would definitely agree if there was a context (i.e. environment)
> > > object passed around then perhaps we'd have the best of all worlds.
>
> > Moreover, I think this is probably the *only* way that
> > totally independent interpreters could be realized.
>
> > Converting the whole C API to use this strategy would be
> > a very big project. Also, on the face of it, it seems like
> > it would render all existing C extension code obsolete,
> > although it might be possible to do something clever with
> > macros to create a compatibility layer.
>
> > Another thing to consider is that passing all these extra
> > pointers around everywhere is bound to have some effect
> > on performance.
>
> I'm with you on all counts, so no disagreement there.  On the "passing
> a ptr everywhere" issue, perhaps one idea is that all objects could
> have an additional field that would point back to their parent context
> (ie. their interpreter).  So the only prototypes that would have to be
> modified to contain the context ptr would be the ones that don't
> inherently operate on objects (e.g. importing a module).

Trying to directly share objects like this is going to create
contention.  The refcounting becomes the sequential portion of
Amdahl's Law.  This is why safethread doesn't scale very well: I share
a massive number of objects.
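
To put a number on the Amdahl's Law point, here is a small sketch of the
standard formula, speedup = 1 / (s + (1 - s) / n), where s is the fraction
of work that must run sequentially and n is the number of workers.  The
function name and the 10% figure are only illustrative assumptions, not
measurements from safethread.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Upper bound on speedup when `serial_fraction` of the work is sequential."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # The sequential part is paid in full; only the rest is divided among workers.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)
```

If contended refcounting made up 10% of the work, amdahl_speedup(0.1, n)
would stay below 10 no matter how many cores you add.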

An alternative, actually simpler, is to create proxies to your real
object.  The proxy object has a pointer to the real object and the
context containing it.  When you call a method it serializes the
arguments, acquires the target context's GIL (while releasing yours),
and deserializes in the target context.  Once the method returns it
reverses the process.
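
A rough sketch of that proxy flow, in pure Python and under loud
assumptions: Context and Proxy are hypothetical names, a threading.Lock
stands in for each context's GIL, and pickle stands in for whatever
serialization you would really use.  It only illustrates the order of
operations, not how CPython would do it.

```python
import pickle
import threading


class Context:
    """Hypothetical stand-in for an independent interpreter with its own GIL."""

    def __init__(self, name):
        self.name = name
        self.gil = threading.Lock()   # models this context's private GIL
        self.objects = {}             # real objects owned by this context

    def add(self, key, obj):
        """Store a real object here and hand back a proxy for other contexts."""
        self.objects[key] = obj
        return Proxy(self, key)


class Proxy:
    """Refers to a real object by its owning context and its key."""

    def __init__(self, context, key):
        self._context = context
        self._key = key

    def call(self, caller, method, *args):
        # The caller must hold caller.gil when it gets here.
        payload = pickle.dumps(args)          # serialize in the caller's context
        caller.gil.release()                  # release our GIL ...
        try:
            with self._context.gil:           # ... and acquire the target's
                target = self._context.objects[self._key]
                result = getattr(target, method)(*pickle.loads(payload))
                reply = pickle.dumps(result)  # serialize the return value
        finally:
            caller.gil.acquire()              # reverse the process
        return pickle.loads(reply)            # deserialize back in the caller
```

Usage would look like: create contexts a and b, do
numbers = b.add("numbers", [3, 1, 2]), then, while holding a.gil, call
numbers.call(a, "append", 4).  The argument crosses over as a copy, so no
object is ever refcounted from two contexts at once.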

There are two reasons why this may perform well for you: First,
operations done purely in C may cheat (if so designed).  A copy from
one memory buffer to another memory buffer may be given two proxies as
arguments, but then operate directly on the target objects (i.e. without
serialization).

Second, if a target context is idle you can enter it (acquiring its
GIL) without any context switch.

Of course that scenario is full of "maybes", which is why I have
little interest in it.

An even better scenario is if your memory buffer's methods are in pure
C and it's a simple object (no pointers).  You can stick the memory
buffer in shared memory and have multiple processes manipulate it from
C.  More "maybes".
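
A minimal sketch of the pointer-free case, assuming a Unix system with
fork().  Python's struct module stands in here for the C code that would
really touch the buffer, and fill_in_children is a made-up name.  The
buffer holds only plain 32-bit integers, so nothing in it depends on where
it is mapped.

```python
import mmap
import os
import struct


def fill_in_children(workers=4, slots_per_worker=4):
    """Fork `workers` children that each fill their own slice of a shared buffer."""
    slot = struct.Struct("<I")                   # plain 32-bit ints, no pointers
    total = workers * slots_per_worker
    buf = mmap.mmap(-1, total * slot.size)       # anonymous mapping, shared across fork on Unix
    pids = []
    for w in range(workers):
        pid = os.fork()
        if pid == 0:                             # child: write our slice, then exit
            status = 1
            try:
                start = w * slots_per_worker
                for i in range(start, start + slots_per_worker):
                    slot.pack_into(buf, i * slot.size, i * i)
                status = 0
            finally:
                os._exit(status)                 # never fall back into the parent's code
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)                       # wait until every child has finished
    values = [slot.unpack_from(buf, i * slot.size)[0] for i in range(total)]
    buf.close()
    return values
```

The parent sees every child's writes without any serialization, because
all of the processes are looking at the same pages.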

An evil trick if you need pointers, but control the allocation, is to
take advantage of the fork model.  Have a master process create a
bunch of blank files (temp files if Linux doesn't allow /dev/zero),
mmap them all using MAP_SHARED, then fork and utilize.  The addresses
will be inherited from the master process, so any pointers within them
will be usable across all processes.  If you ever want to return
memory to the system you can close that file, then have all processes
use MAP_SHARED|MAP_FIXED to overwrite it.  Evil, but should be
disturbingly effective, and still doesn't require modifying CPython.
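
A small sketch of the first half of that trick (map, then fork), assuming
a Unix system.  ctypes stands in for the C code, and Node and
build_list_in_child are hypothetical names.  The child stores raw pointers
inside the mapping and the parent follows them, which only works because
both processes inherited the mapping at the same address.  The
MAP_SHARED|MAP_FIXED remapping step is not shown, since Python's mmap
module has no way to request a fixed address.

```python
import ctypes
import mmap
import os
import tempfile


class Node(ctypes.Structure):
    pass


# Self-referential struct: an int plus a raw pointer to the next node.
Node._fields_ = [("value", ctypes.c_int), ("next", ctypes.POINTER(Node))]


def build_list_in_child(count=3, size=mmap.PAGESIZE):
    """The child links `count` nodes with raw pointers; the parent walks them."""
    with tempfile.TemporaryFile() as f:
        f.truncate(size)                         # blank (zero-filled) backing file
        buf = mmap.mmap(f.fileno(), size, flags=mmap.MAP_SHARED)
    nodes = (Node * count).from_buffer(buf)      # array view over the mapping
    pid = os.fork()                              # fork only after mapping
    if pid == 0:                                 # child: build the linked list
        status = 1
        try:
            for i in range(count):
                nodes[i].value = (i + 1) * 10
                if i + 1 < count:
                    # Stores an absolute address inside the shared pages.
                    nodes[i].next = ctypes.pointer(nodes[i + 1])
            status = 0
        finally:
            os._exit(status)
    os.waitpid(pid, 0)
    values = []
    ptr = ctypes.pointer(nodes[0])
    while ptr:                                   # a NULL pointer is falsy
        values.append(ptr.contents.value)
        ptr = ptr.contents.next                  # follow the pointer the child wrote
    return values
```

The last node's pointer is NULL only because the file starts out
zero-filled; a real allocator on top of this would need its own
bookkeeping.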