[Python-Dev] GIL removal question

Aaron Westendorf aaron at agoragames.com
Fri Aug 12 21:17:59 CEST 2011


Even in the Erlang model, the aforementioned issues of bus contention put a
cap on the number of threads you can run in any given application, assuming
there's any amount of cross-thread synchronization. I wrote a blog post on
this subject based on my experience tuning RabbitMQ on NUMA
architectures.

http://blog.agoragames.com/blog/2011/06/24/of-penguins-rabbits-and-buses/

It should be noted that Erlang processes are not the same as OS processes.
They are more akin to green threads, scheduled onto N real OS threads,
which in turn run on C cores. The end effect is the same, though: the data
is effectively shared across NUMA nodes, which runs into basic physical
constraints.
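
As a rough analogy in Python terms (this is not how BEAM is implemented,
just a sketch of the M:N idea; the pool size and workload are made up),
picture many tiny tasks multiplexed onto a small fixed pool of OS threads.
Everything still lives in one shared address space, which is why the NUMA
point still applies:

# Sketch only: lots of cheap "tasks" scheduled onto N OS threads.
from concurrent.futures import ThreadPoolExecutor

N_OS_THREADS = 4                    # the "N real OS threads"

def tiny_task(i: int) -> int:
    return i + 1                    # stand-in for a cheap Erlang-style process

with ThreadPoolExecutor(max_workers=N_OS_THREADS) as pool:
    # ~10k lightweight tasks share the same interpreter and address space
    results = list(pool.map(tiny_task, range(10_000)))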

I used to think the GIL was a major bottleneck, and though I'm not fond of
it, my recent experience has highlighted that *any* application which uses
shared memory will see significant bus contention when scaling across all
cores. The best course of action is a shared-nothing, MPI-style design, but
in 64-bit land that can mean significant wasted address space.
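
For what it's worth, the shared-nothing style is already expressible in
stock Python with multiprocessing. A minimal sketch (the worker count,
queue names, and the squaring "work" are placeholders I made up, not part
of any real workload):

# Each worker is its own OS process with its own heap; the only coupling
# is message passing over queues, so no Python objects are shared.
from multiprocessing import Process, Queue

def worker(inbox: Queue, outbox: Queue) -> None:
    for item in iter(inbox.get, None):      # None is the shutdown sentinel
        outbox.put(item * item)             # stand-in for real work

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    procs = [Process(target=worker, args=(inbox, outbox)) for _ in range(4)]
    for p in procs:
        p.start()
    for n in range(100):
        inbox.put(n)
    results = [outbox.get() for _ in range(100)]
    for _ in procs:
        inbox.put(None)                     # one sentinel per worker
    for p in procs:
        p.join()
    print(sum(results))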

-Aaron


On Fri, Aug 12, 2011 at 2:59 PM, Sturla Molden <sturla at molden.no> wrote:

> On 12.08.2011 18:51, Xavier Morel wrote:
>
>> * Erlang uses "erlang processes", which are very cheap preempted
>> *processes* (no shared memory). There have always been tens of thousands
>> to millions of erlang processes per interpreter [...] resource contention
>> within the interpreter goes back to pre-SMP (setting the number of
>> schedulers per node to 1 can yield increased overall performance)
>>
>
> Technically, one can make threads behave like processes if they don't share
> memory pages (though they will still share address space). Erlang's use of
> 'process' instead of 'thread' does not mean an Erlang process has to be
> implemented as an OS process. With one interpreter per thread, and a malloc
> that does not let threads share memory pages (one heap per thread), Python
> could do the same.
>
> On Windows, there is an API function called HeapAlloc, which lets us
> allocate memory from a dedicated heap. The common use case is to prevent
> threads from sharing memory, thus behaving like light-weight processes
> (except address space is shared). On Unix, it is more common to use fork()
> to create new processes instead, as processes are more light-weight than on
> Windows.
>
> Sturla
>
>
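
(For the curious, here is a minimal ctypes sketch of the per-thread-heap
idea Sturla describes. It assumes Windows; the helper name
make_private_heap is mine for illustration, not part of any API:)

# One dedicated Win32 heap, created via HeapCreate and used via HeapAlloc,
# so allocations from it are kept apart from other threads' heaps.
import ctypes
import ctypes.wintypes as wt

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)

kernel32.HeapCreate.restype = wt.HANDLE
kernel32.HeapCreate.argtypes = [wt.DWORD, ctypes.c_size_t, ctypes.c_size_t]
kernel32.HeapAlloc.restype = wt.LPVOID
kernel32.HeapAlloc.argtypes = [wt.HANDLE, wt.DWORD, ctypes.c_size_t]
kernel32.HeapFree.restype = wt.BOOL
kernel32.HeapFree.argtypes = [wt.HANDLE, wt.DWORD, wt.LPVOID]
kernel32.HeapDestroy.restype = wt.BOOL
kernel32.HeapDestroy.argtypes = [wt.HANDLE]

def make_private_heap():
    """Create a growable private heap (hypothetical helper for illustration)."""
    return kernel32.HeapCreate(0, 0, 0)

heap = make_private_heap()
buf = kernel32.HeapAlloc(heap, 0, 4096)     # 4 KiB from this private heap
# ... a per-thread allocator would hand out memory from `buf` here ...
kernel32.HeapFree(heap, 0, buf)
kernel32.HeapDestroy(heap)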

