Exploiting Dual Core's with Py_NewInterpreter's separated GIL ?

Thu Nov 9 06:43:58 EST 2006

robert wrote:
> Shane Hathaway wrote:
> > of multiple cores.  I think Python only needs a nice way to share a
> > relatively small set of objects using shared memory.  POSH goes in that
> > direction, but I don't think it's simple enough yet.
> >
> > http://poshmodule.sourceforge.net/
>
> interesting, a solution possibly a little faster than pickling - but maybe only in selected
> situations. I have already made experiments with pickling through shared memory.

What did you discover in your experiments? I'd certainly suspect that
pickling is going to add an unacceptable degree of overhead in the kind
of application you're attempting to write (using a shared data
structure with random access properties), and I'll admit that I haven't
really had to deal with that kind of situation when using my own
pickle-based parallelisation solutions (which involve communicating
processes).

> With "x = posh.share(x)" an object tree will be (deep-)copied to shared mem (as far as
> objects fulfil some conditions, see http://poshmodule.sourceforge.net/posh/html/node6.html -
> is this true for numpy arrays?)

My impression is that POSH isn't maintained any more and that work was
needed to make it portable, as you have observed. Some discussions did
occur on one of the Python development mailing lists about the
possibility of using shared memory together with serialisation
representations that are faster than pickles and that don't need to be
managed as live objects by Python, and I think that this could be an
acceptable alternative.

> Every object to be inserted in the hot tunnel object tree has to be copied in that same style.
> Thus roughly like pickling, but somewhat easier to use.

If my understanding of "hot tunnel object tree" is correct, what you
really want is fast, mutually exclusive access to some shared data
structure. At this point it becomes worth considering some kind of
distributed object technology (or even a database technology) where you
have to initialise the data structure and then communicate with an
object (or database) to perform operations on it, all for reasons I'll
explain shortly.

In your ideal situation, you say that you'd have the data structure in
the same address space as a number of threads, and each thread would be
able to perform some algorithm on the data structure. But the pattern
of access to the structure is a significant concern even where you can
assume that the overheads are generally low: reading and writing things
might be fast in the same address space, but if the nature of access
involves lots of locking, you'll incur quite a penalty anyway. In other
words, if each read or write to the structure involves acquiring a lock
for that operation in isolation, this could significantly diminish
performance. If, on the other hand, locking can be more coarse-grained
than once per read or write - because there exist some high-level
operations that require consistency within the data structure, each of
which can hold the lock for its whole duration - then reasonable
performance might be maintained.
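
To make the difference in locking granularity concrete, here is a
minimal sketch. It is not taken from the application under discussion:
the SharedCounts class and the fill helper are hypothetical names made
up for illustration, and a plain dict stands in for the shared data
structure. The fine-grained path acquires the lock once per update; the
coarse-grained path acquires it once per batch, treating the batch as
one high-level operation.

```python
import threading


class SharedCounts:
    """Hypothetical shared structure guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def increment(self, key):
        # Fine-grained: one lock acquisition for every single update.
        with self._lock:
            self._data[key] = self._data.get(key, 0) + 1

    def increment_many(self, keys):
        # Coarse-grained: one lock acquisition for the whole batch, so
        # the batch is applied as a single consistent operation.
        with self._lock:
            for key in keys:
                self._data[key] = self._data.get(key, 0) + 1

    def snapshot(self):
        # Copy under the lock so callers never see a partial update.
        with self._lock:
            return dict(self._data)


def fill(counts, keys, batched, n_threads=4):
    """Apply the same updates from several threads, then return a copy."""

    def work():
        if batched:
            counts.increment_many(keys)
        else:
            for key in keys:
                counts.increment(key)

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts.snapshot()
```

Both paths end up with the same contents; what differs is how many
times the lock is taken (once per key per thread versus once per
thread). How much that matters in practice depends on the workload, so
it would need measuring rather than assuming.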

However, if the pattern of concurrent access is restricted to coarse
operations, where some entity has exclusive access to a potentially
large dataset, where the overhead of communicating inputs and outputs
to and from that entity is low in comparison to the cost of performing
such coarse operations, and where such operations are themselves
performed infrequently, then such a pattern coincides with the classic
database or distributed object scenario. In other words, you implement
the operations acting on the data structure in a distributed object (or
as a database query or operation) and then invoke such operations from
separate processes.
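
As one possible illustration of that pattern, the sketch below uses
XML-RPC from the standard library as the distributed object layer. This
is only an assumption chosen for brevity, not a recommendation of a
particular technology, and the WordIndex class, start_server and demo
are hypothetical names. One server owns the data structure and exposes
a few coarse operations; clients send small inputs and get small
results back. The server runs in a background thread here purely to
keep the example self-contained; in the scenario described above it
would be a separate process, with any number of client processes
connecting to it.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


class WordIndex:
    """Hypothetical data structure owned by exactly one server."""

    def __init__(self):
        self._counts = {}

    def add_document(self, text):
        # Coarse operation: many internal updates per remote call.
        for word in text.lower().split():
            self._counts[word] = self._counts.get(word, 0) + 1
        return len(self._counts)

    def most_common(self, n):
        # Coarse query: only the small result crosses the boundary.
        ranked = sorted(self._counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]


def start_server():
    # Port 0 asks the operating system for any free port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_instance(WordIndex())
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def demo():
    server = start_server()
    host, port = server.server_address
    index = xmlrpc.client.ServerProxy(f"http://{host}:{port}")
    try:
        index.add_document("the quick brown fox")
        index.add_document("the lazy dog and the fox")
        # XML-RPC returns the (word, count) pairs as two-element lists.
        return index.most_common(2)
    finally:
        server.shutdown()
        server.server_close()
```

SimpleXMLRPCServer handles one request at a time, so each operation has
exclusive access to the structure without any explicit locking in
WordIndex. The price is the cost of communicating each call, which is
why this only pays off when the operations are coarse.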

I hope this makes some sense. Demands for high concurrent performance
using threads often ignore other important properties such as
reliability and scalability. Certainly, threads have an important
place - classically, this involved maintaining responsiveness in
graphical user interfaces - but even then, the background threads were
often detached and not competing for the same resources as the
foreground threads servicing the event loop. The only kinds of
situation I can think of right now which might benefit from uninhibited
random access to shared data structures might be things like
simulations or large-scale multi-user games, where an appropriate data
architecture cannot be decided in advance, but even then there are
approaches which attempt to mitigate the seemingly unpredictable nature
of access to shared data.

Paul