Exploiting Dual Core's with Py_NewInterpreter's separated GIL ?

Sat Nov 4 08:42:17 EST 2006

Paul Rubin wrote:
> robert <no-spam at no-spam-no-spam.invalid> writes:
>>> I don't want to discourage you but what about reference counting/memory
>>> management for shared objects? Doesn't seem fun for me.
>> in combination with some simple locking (anyway necessary) I don't
>> see a problem in ref-counting.
>> If at least any interpreter branch has a pointer to the (root)
>> object in question the ref-count is >0. ----
>> Question Besides: do concurrent INC/DEC machine OP-commands execute
>> atomically on Multi-Cores as they do in Single-Core threads?
> 
> Generally speaking, no, the inc/dec instructions are not atomic.  You
> can do an atomic increment on the x86 using LOCK XCHG (or maybe LOCK
> INC is possible).  The thing is that the locking protocol that
> guarantees atomicity is very expensive, like 100x as expensive as an
> unlocked instruction on a big multiprocessor.  So yes, of course you
> could accomplish reference counting through locks around the ref
> counts, but performance suffers terribly.  The solution is to get rid
> of the ref counts and manage the entire heap using garbage collection.
> 
> For stuff like dictionary access, there are protocols (again based on
> LOCK XCHG) that don't require locking for lookups.  Only updates
> require locking.  Simon Peyton-Jones has a good paper about how it's
> done in Concurrent Haskell:
> 
>   http://research.microsoft.com/~simonpj/papers/stm/stm.pdf
> 
> This is really cool stuff and has found its way into Perl 6.  I'd like
> to see Python get something like it.

That's really interesting. Do you expect this to eventually remove the GIL from Python? Since dict accesses (which must also be atomic here) make up a major part of Python's CPU load, the 100x-cost instructions would probably put a >30% burden on Python's overall speed.

A lock-protected way of using multiple well-separated interpreters by tunneling objects between them will probably still be the most effective solution without speed costs.
The problem of the singleton objects' refcounts (None, "", 1, 2, 3...), which MvL mentioned, is the main concern as far as I've understood.
The locking of Python core global resources (files etc.) can be done with little effort.
The globals of extension modules are not really critical: in typical applications the tunnel method is mainly used for limited number crunching, and as the programmer you are well aware of which modules are unfit for multiple interpreters (until they finally become mature). There could also be a special import method that duplicates extension data as a workaround.
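
A minimal sketch of the tunnel idea, under loud assumptions: the class name Tunnel and its methods are hypothetical, and the object is copied via pickle instead of handing over the pointer itself (which pure Python cannot express). It only shows where the single lock sits:

```python
import pickle
import threading
from collections import deque


class Tunnel:
    """Hypothetical lock-protected channel between two separated interpreters."""

    def __init__(self):
        self._lock = threading.Lock()  # the "simple locking" around the tunnel
        self._items = deque()

    def send(self, obj):
        # Serialize outside the lock so the critical section stays short.
        data = pickle.dumps(obj)
        with self._lock:
            self._items.append(data)

    def receive(self):
        # Raises IndexError when nothing is waiting.
        with self._lock:
            data = self._items.popleft()
        return pickle.loads(data)


tunnel = Tunnel()
tunnel.send({"job": "crunch", "values": [1, 2, 3]})
result = tunnel.receive()
```

Only the tunnel itself is locked; everything else inside each interpreter runs without extra synchronization, which is the point of the approach.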

The singletons' refcounts could be reset to a fixed MAXINT/2 at reference-creation time (or at GC or ..) or so, to freeze them quietly forever.
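
The freezing trick can be illustrated with a toy model. This is only a sketch: Obj, incref, decref and reset_singleton are hypothetical names, not CPython's real object header or macros, and sys.maxsize // 2 stands in for MAXINT/2.

```python
import sys

IMMORTAL = sys.maxsize // 2  # stand-in for MAXINT/2


class Obj:
    """Toy refcounted object (hypothetical, not CPython's real layout)."""

    def __init__(self, name, singleton=False):
        self.name = name
        # Singletons start so far from zero that they can never be freed.
        self.refcount = IMMORTAL if singleton else 1
        self.freed = False


def incref(o):
    o.refcount += 1


def decref(o):
    o.refcount -= 1
    if o.refcount == 0:
        o.freed = True


def reset_singleton(o):
    # Could be called at GC time to undo any drift from unlocked updates.
    o.refcount = IMMORTAL


none_like = Obj("None", singleton=True)
for _ in range(100000):      # even heavily unbalanced decrefs ...
    decref(none_like)
still_alive = not none_like.freed   # ... never reach zero
reset_singleton(none_like)

ordinary = Obj("temp")
decref(ordinary)             # an ordinary object is freed as usual
```

With the count pinned this far from zero, lost updates from unlocked INC/DEC on several cores only make the number drift; they cannot free the singleton.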

(Mutable (why?)) exception types could be duplicated in the same style as normal Python modules, or they could be rendered read-only. Thrown exceptions will not cross the border. (I think the fact that "Exception.x=5" is possible is more an artefact than intended.)
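
A small sketch of why mutable exception types matter when a type object is shared. CrunchError is a hypothetical user-defined type; note that current CPython 3 refuses the assignment on the built-in Exception itself with a TypeError, so a subclass is used to show the effect:

```python
class CrunchError(Exception):
    """Hypothetical user-defined exception type."""


CrunchError.x = 5        # mutates the type object itself
seen = CrunchError().x   # every instance, everywhere, now sees 5

try:
    Exception.x = 5      # built-in type: refused by current CPython 3
    builtin_mutable = True
except TypeError:
    builtin_mutable = False
```

If such a type object were visible from two interpreters, the assignment in one would show up in the other, which is why duplicating the types or making them read-only is suggested.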

robert