2.6, 3.0, and truly independent interpreters

Fri Oct 24 12:30:18 EDT 2008

On Fri, Oct 24, 2008 at 10:40 AM, Andy O'Meara <andy55 at gmail.com> wrote:
>> > 2) Barriers to "free threading".  As Jesse describes, this is simply
>> > just the GIL being in place, but of course it's there for a reason.
>> > It's there because (1) doesn't hold and there was never any specs/
>> > guidance put forward about what should and shouldn't be done in multi-
>> > threaded apps
>>
>> No, it's there because it's necessary for acceptable performance
>> when multiple threads are running in one interpreter. Independent
>> interpreters wouldn't mean the absence of a GIL; it would only
>> mean each interpreter having its own GIL.
>>
>
> I see what you're saying, but let's note that what you're talking
> about at this point is an interpreter containing protection from the
> client level violating (supposed) direction put forth in python
> multithreaded guidelines.  Glenn Linderman's post really gets at
> what's at hand here.  It's really important to consider that it's not
> a given that python (or any framework) has to be designed against
> hazardous use.  Again, I refer you to the diagrams and guidelines in
> the QuickTime API:
>
> http://developer.apple.com/technotes/tn/tn2125.html
>
> They tell you point-blank what you can and can't do, and it's that
> simple.  Their engineers can then simply create the implementation
> around those specs and not weigh any of the implementation down with
> sync mechanisms.  I'm in the camp that simplicity and convention wins
> the day when it comes to an API.  It's safe to say that software
> engineers expect and assume that a thread that doesn't have contact
> with other threads (except for explicit, controlled message/object
> passing) will run unhindered and safely, so I raise an eyebrow at the
> GIL (or any internal "helper" sync stuff) holding up a thread's
> performance when the app is designed to not need lower-level global
> locks.
>
> Anyway, let's talk about solutions.  My company is looking to support
> a python dev community endeavor that allows the following:
>
> - an app makes N worker threads (using the OS)
>
> - each worker thread makes its own interpreter, pops scripts off a
> work queue, and manages exporting (and then importing) result data to
> other parts of the app.  Generally, we're talking about CPU-bound work
> here.
>
> - each interpreter has the essentials (e.g. math support, string
> support, re support, and so on -- I realize this is open-ended, but
> work with me here).
>
> Let's guesstimate about what kind of work we're talking about here and
> if this is even in the realm of possibility.  If we find that it *is*
> possible, let's figure out what level of work we're talking about.
> From there, I can get serious about writing up a PEP/spec, paid
> support, and so on.

Point of order! Just for my own sanity if anything :) I think some
minor clarifications are in order.

What are "threads" within Python:

Python has built-in support for POSIX lightweight threads. This is
what most people are talking about when they see, hear and say
"threads" - they mean POSIX pthreads
(http://en.wikipedia.org/wiki/POSIX_Threads). This is not what you
(Andy) seem to be asking for. Pthreads are attractive because they
exist within a single interpreter, can share memory all "willy
nilly", etc.

Python does, in fact, use OS-level pthreads when you request multiple threads.
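
To make the "share memory" point concrete, here is a minimal sketch
using the standard threading module. The names run_threads and worker
are made up for illustration; the only point is that every thread in
the interpreter sees the same list object.

```python
import threading


def run_threads(n_threads=4):
    """Start n_threads OS-level threads that all append to one shared list."""
    shared = []  # a single object, visible to every thread in this interpreter

    def worker(tag):
        # No copying or message passing: each thread mutates the same list.
        shared.append((tag, threading.current_thread().name))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


if __name__ == "__main__":
    print(sorted(tag for tag, _ in run_threads()))
```

After the joins, the main thread can read everything the workers
wrote, with no explicit hand-off.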

The Global Interpreter Lock is fundamentally designed to make the
interpreter easier to maintain and safer: developers do not need to
worry about other code stepping on their namespace. This makes things
thread-safe, inasmuch as having multiple pthreads within the same
interpreter space modifying global state and variables at once is,
well, bad. A C-level module, on the other hand, can sidestep/release
the GIL at will, and go on its merry way and process away.
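
A rough way to see C-level code releasing the GIL is to use time.sleep
as a stand-in for a C-level module (that substitution is my
assumption; the helper name timed_sleepers is hypothetical). The sleep
releases the GIL while it blocks, so several threads can wait at the
same time instead of one after another.

```python
import threading
import time


def timed_sleepers(n_threads=4, seconds=0.3):
    """Return wall-clock time for n_threads threads that each block in C code."""

    def worker():
        # time.sleep is implemented in C and releases the GIL while blocked,
        # so the other threads are free to run (here: to sleep as well).
        time.sleep(seconds)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start


if __name__ == "__main__":
    print("elapsed: %.2fs" % timed_sleepers())
```

If the sleeps were serialized, four of them would take about
4 * seconds; here the total should stay close to a single sleep.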

POSIX threads/pthreads/threads, as we get from Java, allow unsafe
programming styles. These programming styles are of the "shared
everything deadlock lol" kind. The GIL *partially* protects against
some of the pitfalls. You do not seem to be asking for pthreads :)
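
As an illustration of "partially": the GIL keeps the interpreter's own
internals consistent, but a read-modify-write such as "+=" on shared
data is more than one step and can still interleave between threads.
A minimal sketch, with hypothetical names, of guarding it with an
explicit lock:

```python
import threading


def count_with_lock(n_threads=4, increments=10000):
    """Increment one shared counter from several threads, guarded by a Lock."""
    counter = {"value": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            # The lock makes the read-modify-write a single critical section;
            # the GIL alone does not promise that for "+=".
            with lock:
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


if __name__ == "__main__":
    print(count_with_lock())
```

With the lock in place, the final count always equals
n_threads * increments.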

http://www.python.org/doc/faq/library/#can-t-we-get-rid-of-the-global-interpreter-lock
http://en.wikipedia.org/wiki/Multi-threading

However, then there are processes.

The difference between threads and processes is that processes do
*not share memory*, but they can share state via shared
queues/pipes/message passing. What you seem to be asking for is the
ability to completely fork independent Python interpreters, each with
its own namespace, and coordinate work via a shared queue accessed
with pipes or some other communications mechanism. Correct?
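
A small sketch of that difference, using the multiprocessing module.
I am assuming a Unix system and explicitly asking for the "fork" start
method; the names state, child and demo are hypothetical. The child
changes its own copy of a module-level dict, and the parent only
learns the value because the child sends it through a queue.

```python
import multiprocessing

# Assumption: Unix only. "fork" is requested explicitly so the child starts
# as a copy of the parent process.
ctx = multiprocessing.get_context("fork")

state = {"value": 0}  # module-level state; each process has its own copy


def child(queue):
    state["value"] = 42        # changes only the child's copy
    queue.put(state["value"])  # explicit message passing back to the parent


def demo():
    """Return (parent's view of state, value received from the child)."""
    queue = ctx.Queue()
    proc = ctx.Process(target=child, args=(queue,))
    proc.start()
    from_child = queue.get(timeout=10)
    proc.join()
    return state["value"], from_child


if __name__ == "__main__":
    print(demo())
```

The parent's state is untouched by the child; the only thing that
crosses the process boundary is what was put on the queue.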

Multiprocessing, as it exists within Python 2.6 today, actually forks
(see trunk/Lib/multiprocessing/forking.py) a completely independent
interpreter per process created and then constructs pipes to
inter-communicate, and queues to do work coordination. I am not
suggesting this is good for you - I'm trying to get to exactly what
you're asking for.
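
For reference, a minimal sketch of that style of work coordination
using only the public multiprocessing API. This is not how
forking.py is implemented; it again assumes the Unix "fork" start
method, and worker, run_pool and the squaring "job" are placeholders
for real CPU-bound work.

```python
import multiprocessing

# Assumption: Unix only, explicit "fork" start method.
ctx = multiprocessing.get_context("fork")


def worker(tasks, results):
    # Pop jobs off the shared task queue until the None sentinel arrives.
    for n in iter(tasks.get, None):
        results.put((n, n * n))  # placeholder for real CPU-bound work


def run_pool(jobs, n_workers=2):
    """Square each job in separate processes; jobs must be a list."""
    tasks = ctx.Queue()
    results = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(tasks, results))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for job in jobs:
        tasks.put(job)
    for _ in procs:
        tasks.put(None)  # one sentinel per worker so each loop terminates
    # Drain the results before joining, so no worker blocks on a full pipe.
    collected = dict(results.get() for _ in jobs)
    for p in procs:
        p.join()
    return collected


if __name__ == "__main__":
    print(run_pool([1, 2, 3, 4]))
```

Each worker is a separate interpreter with its own GIL; the queues are
the only shared channel.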

Fundamentally, allowing total free-threading with POSIX threads, using
the same Java model for control, is a recipe for pain - we're just
repeating mistakes instead of solving a problem; hence Adam Olsen's
work. Monitors, actors, etc. have all been discussed, proposed and are
being worked on.

So, just to clarify - Andy, do you want one interpreter, $N threads
(e.g. PThreads) or the ability to fork multiple "heavyweight"
processes?

Other bits for reading:
http://www.boddie.org.uk/python/pprocess.html (as an alternative to
multiprocessing)
http://smparkes.net/tag/dramatis/
http://osl.cs.uiuc.edu/parley/
http://candygram.sourceforge.net/