2.6, 3.0, and truly independent interpreters

Wed Oct 29 03:27:01 EDT 2008

Wow, man. Excellent post. You want a job?

The GUI could use PyA threads for sure, and the audio thread could use
PyC threads. It would not be a problem to limit the audio thread to
only reentrant libraries.

This is the kind of compromise I had in mind, especially since PyD would
not break old code, assuming that code could eventually be ported.

On Fri, Oct 24, 2008 at 11:02 AM, Glenn Linderman <v+python at g.nevcal.com> wrote:
> On approximately 10/24/2008 8:42 AM, came the following characters from the
> keyboard of Andy O'Meara:
>>
>> Glenn, great post and points!
>>
>
> Thanks. I need to admit here that while I've got a fair bit of professional
> programming experience, I'm quite new to Python -- I've not learned its
> internals, nor even the full extent of its rich library. So I have some
> questions that are partly about the goals of the applications being
> discussed, partly about how Python is constructed, and partly about how the
> library is constructed. I'm hoping to get a better understanding of all of
> these; perhaps once a better understanding is achieved, limitations will be
> understood, and maybe solutions will be achievable.
>
> Let me define some speculative Python interpreters; I think the first is
> today's Python:
>
> PyA: Has a GIL. PyA threads can run within a process; but are effectively
> serialized to the places where the GIL is obtained/released. Needs the GIL
> because that solves lots of problems with non-reentrant code (an example of
> non-reentrant code is code that uses global (C global or C static)
> variables; note that I'm not talking about Python variables declared
> global, which are only module-global). In this model, non-reentrant code could
> include pieces of the interpreter, and/or extension modules.
>
> PyB: No GIL. PyB threads acquire/release a lock around each reference to a
> global variable (like the "with" statement). Requires massive recoding of all code
> that contains global variables. Reduces performance significantly by the
> increased cost of obtaining and releasing locks.
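The PyB pattern can be illustrated at the Python level, as a rough analogy only: the proposal is about C globals inside the interpreter, while this sketch guards a module-level variable. Every read and write takes a lock in a `with` block, and that per-reference acquire/release is where the extra cost comes from. The names `_hits`, `record_hit`, `read_hits`, and `hammer` are hypothetical.

```python
import threading

# Hypothetical module-level ("global") state shared by all threads.
_hits = 0
_hits_lock = threading.Lock()


def record_hit() -> None:
    """PyB style: acquire the lock around every write to the global."""
    global _hits
    with _hits_lock:  # acquired on entry, released on exit
        _hits += 1


def read_hits() -> int:
    """Reads are locked too, so no thread sees a half-updated value."""
    with _hits_lock:
        return _hits


def _worker(per_thread: int) -> None:
    for _ in range(per_thread):
        record_hit()  # one acquire/release pair per reference


def hammer(n_threads: int = 8, per_thread: int = 10_000) -> int:
    """Reset the counter, run n_threads workers, return the final count."""
    global _hits
    with _hits_lock:
        _hits = 0
    threads = [threading.Thread(target=_worker, args=(per_thread,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return read_hits()
```

`hammer(4, 1000)` returns 4000; the correctness comes at the price of a lock round trip on every increment.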
>
> PyC: No locks. Instead, recoding is done to eliminate global variables
> (interpreter requires a state structure to be passed in). Extension modules
> that use globals are prohibited... this eliminates large portions of the
> library, or requires massive recoding. PyC threads do not share data between
> threads except by explicit interfaces.
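A Python-level analogy of the PyC refactoring, assuming "state structure" means an object that every call receives explicitly: the first function relies on a module global and is therefore not reentrant, while the second keeps all mutable state in a caller-owned `InterpState` (a hypothetical name). Threads that each own a separate state share nothing and need no locks.

```python
import threading
from dataclasses import dataclass, field
from typing import List, Optional

# Before: not reentrant, since every caller shares this global.
_last_result = None


def square_global(x: int) -> int:
    global _last_result
    _last_result = x * x
    return _last_result


# After (PyC style): mutable state lives in a structure passed in by the caller.
@dataclass
class InterpState:
    last_result: Optional[int] = None
    history: List[int] = field(default_factory=list)


def square(state: InterpState, x: int) -> int:
    """Touches only the state it is given, never a global."""
    state.last_result = x * x
    state.history.append(state.last_result)
    return state.last_result


def _work(state: InterpState, base: int) -> None:
    for i in range(3):
        square(state, base + i)


def run_independent(n_threads: int = 4) -> List[List[int]]:
    """Give each thread its own state; nothing is shared, so nothing is locked."""
    states = [InterpState() for _ in range(n_threads)]
    threads = [threading.Thread(target=_work, args=(s, 10 * k))
               for k, s in enumerate(states)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [s.history for s in states]
```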
>
> PyD: (A hybrid of PyA & PyC). The interpreter is recoded to eliminate global
> variables, and each interpreter instance is provided a state structure.
> There is still a GIL, however, because globals are potentially still used by
> some modules. Code is added to detect use of global variables by a module,
> or some contract is written whereby a module can be declared to be reentrant
> and global-free. PyA threads will obtain the GIL as they would today. PyC
> threads would be available to be created. PyC instances refuse to call
> non-reentrant modules, but also need not obtain the GIL... PyC threads would
> have limited module support initially, but over time, most modules can be
> migrated to be reentrant and global-free, so they can be used by PyC
> instances. Most 3rd-party libraries today are starting to care about
> reentrancy anyway, because of the popularity of threads.
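The "contract" option for PyD might look something like the following sketch. The `__reentrant__` module attribute, `NotReentrantError`, and `call_in_pyc_context` are hypothetical and invented for illustration; no such flag exists in CPython. The sketch only models the check a PyC instance would apply before calling into a module; it does not remove or bypass the GIL.

```python
import types


class NotReentrantError(RuntimeError):
    """Raised when a PyC-style caller reaches for an undeclared module."""


def require_reentrant(module: types.ModuleType) -> types.ModuleType:
    """Return the module only if it opts in via the hypothetical __reentrant__ flag."""
    if getattr(module, "__reentrant__", False) is not True:
        raise NotReentrantError(
            f"{module.__name__} is not declared reentrant and global-free")
    return module


def call_in_pyc_context(module: types.ModuleType, func_name: str, *args):
    """Model of a PyC instance: it refuses modules that lack the contract."""
    return getattr(require_reentrant(module), func_name)(*args)


# Two stand-in modules: one declares the contract, one does not.
safe_math = types.ModuleType("safe_math")
safe_math.__reentrant__ = True
safe_math.double = lambda x: 2 * x

legacy_codec = types.ModuleType("legacy_codec")
legacy_codec.double = lambda x: 2 * x
```

In this model a PyA thread would still use `legacy_codec` as usual under the GIL; only PyC threads would be turned away.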
>
> The assumptions here are that:
>
> Data-1) A Python interpreter doesn't provide any mechanism to share normal
> data among threads; they are independent... but message passing works.
> Data-2) A Python interpreter could be extended to provide mechanisms to
> share special data, and the data would come with an implicit lock.
> Data-3) A Python interpreter could be extended to provide unlocked access to
> special data, requiring the application to handle the synchronization
> between threads. Data of type 2 could be used to control access to data of
> type 3. This type of data could be large, or frequently referenced data, but
> only by a single thread at a time, with major handoffs to a different thread
> synchronized by the application in whatever way it chooses.
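The three data categories can be approximated with today's `threading` and `queue` modules. This is an illustration under the assumption that plain threads stand in for independent interpreters; today's threads can share everything, so the separation here is by convention only. Data-1 is a `queue.Queue` message exchange, Data-2 is a hypothetical `LockedCounter` whose lock is hidden inside its methods, and Data-3 is a plain `bytearray` mutated without locks after a `threading.Event` (playing the controlling Data-2 role) hands it over.

```python
import queue
import threading


class LockedCounter:
    """Data-2 style: shared data that carries its own implicit lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value = 0

    def add(self, n: int) -> None:
        with self._lock:  # callers never touch the lock directly
            self._value += n

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def worker(inbox: queue.Queue, outbox: queue.Queue, counter: LockedCounter,
           buffer: bytearray, buffer_ready: threading.Event) -> None:
    n = inbox.get()      # Data-1: work arrives only as a message
    outbox.put(n * n)    # ...and the reply goes back as a message
    counter.add(1)       # Data-2: implicitly locked update
    buffer_ready.wait()  # Data-3: wait for the owner to hand the buffer over
    for i in range(len(buffer)):
        buffer[i] = n    # unlocked writes; this thread owns the buffer now


def run(n: int = 7) -> tuple[int, int, bytes]:
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()
    counter = LockedCounter()
    buffer = bytearray(4)
    buffer_ready = threading.Event()
    t = threading.Thread(target=worker,
                         args=(inbox, outbox, counter, buffer, buffer_ready))
    t.start()
    inbox.put(n)
    reply = outbox.get()
    buffer_ready.set()  # hand ownership of the buffer to the worker
    t.join()            # ownership returns once the worker finishes
    return reply, counter.value, bytes(buffer)
```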
>
> Context-1) A Python interpreter would know about threads it spawns, and
> could pass in a block of context (in addition to the state structure) as a
> parameter to a new thread. That block of context would belong to the thread
> as long as it exists, and return to the spawner when the thread completes.
> An embedded interpreter would also be given a block of context (in addition
> to the state structure). This would allow application context to be created
> and passed around. Pointers to shared memory structures might be typical
> context in the embedded case.
>
> Context-2) Embedded Python interpreters could be spawned either as PyA
> threads or PyC threads. PyC threads would be limited to modules that are
> reentrant.
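The Context-1 hand-off (context goes to the new thread and comes back when it completes) can be modeled with `concurrent.futures`. This sketch assumes a plain `dict` stands in for the "block of context"; `spawn_with_context` and `fill_frames` are hypothetical names, and the separate per-interpreter state structure is not modeled.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


def spawn_with_context(pool: ThreadPoolExecutor,
                       target: Callable[[dict], Any],
                       context: dict) -> Future:
    """Give `context` to a new task; the Future yields it back on completion."""
    def runner() -> dict:
        target(context)  # the task owns the context while it runs
        return context   # ownership returns to the spawner
    return pool.submit(runner)


def fill_frames(ctx: dict) -> None:
    # Hypothetical work: write results into the context block.
    ctx["frames"] = [i * 2 for i in range(ctx["count"])]


def demo() -> dict:
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut = spawn_with_context(pool, fill_frames, {"count": 3})
        return fut.result()  # blocks until the context is handed back
```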
>
>
> I think that PyB and PyC are the visions that people see, which argue
> against implementing independent interpreters. PyB isn't truly independent,
> because data are shared, recoding is required, and performance suffers. Ick.
> PyC requires "recoding the whole library" potentially, if it is the only
> solution. PyD allows access to the whole standard library of modules,
> exactly like today, but the existing limitations still apply to PyA
> threads using that model: very limited concurrency. But PyC threads would
> execute in their own little environments, and not need locking. Pure Python
> code would be immediately happy there. Properly coded (reentrant,
> global-free) extensions would be happy there. Lots of work could be done
> there, to use up multi-core/multi-CPU horsepower (shared-memory
> architecture).
>
> Questions for people that know the Python internals: Is PyD possible? How
> hard? Is a PyC thread an effective way of implementing a Python sandbox? If
> it is, and if it would attract the attention of Brett Cannon, who at least
> once wanted to do a thesis on Python sandboxes, he could be a helpful
> supporter.
>
> Questions for Andy: is the type of work you want to do in independent
> threads mostly pure Python? Or with libraries that you can control to some
> extent? Are those libraries reentrant? Could they be made reentrant? How
> much of the Python standard library would need to be available in reentrant
> mode to provide useful functionality for those threads? I think you want PyC.
>
> Questions for Patrick: So if you had a Python GUI using the whole standard
> library -- would it likely run fine in PyA threads, and still be able to
> use PyC threads for the audio scripting language? Would it be a problem for
> those threads to have limited library support (only reentrant modules)?
>
>> That's the rub...  In our case, we're doing image and video
>> manipulation--stuff not good to be messaging from address space to
>> address space.  The same argument holds for numerical processing with
>> large data sets.  The workers handing back huge data sets via
>> messaging isn't very attractive.
>>
>
> In the multiprocessing module's environment, could you not use shared
> memory, then, for the large shared data items?
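A sketch of that suggestion, written for modern Python 3 and assuming a POSIX system where the "fork" start method is available: `multiprocessing.Array` allocates a buffer in shared memory, and worker processes modify disjoint slices of it in place, so no pixel data travels back as messages. `brighten` and `process_frame` are hypothetical names, and `lock=False` is only safe because the slices never overlap.

```python
import multiprocessing as mp


def brighten(shared, start: int, stop: int, amount: int) -> None:
    """Worker: modify its own slice of the shared buffer in place."""
    for i in range(start, stop):
        shared[i] = min(255, shared[i] + amount)


def process_frame(pixels: bytes, n_workers: int = 2, amount: int = 50) -> bytes:
    ctx = mp.get_context("fork")  # assumption: POSIX "fork" start method
    # lock=False returns a raw shared ctypes array; the slices are disjoint,
    # so no per-element locking is needed.
    shared = ctx.Array("B", pixels, lock=False)
    chunk = (len(pixels) + n_workers - 1) // n_workers
    procs = []
    for w in range(n_workers):
        start, stop = w * chunk, min(len(pixels), (w + 1) * chunk)
        p = ctx.Process(target=brighten, args=(shared, start, stop, amount))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()
    return bytes(shared)  # the parent sees the workers' writes directly
```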
>
>> Our software runs in real time (so performance is paramount),
>> interacts with other static libraries, depends on worker threads to
>> perform real-time image manipulation, and leverages Windows and Mac OS
>> API concepts and features.  Python's performance hits have generally
>> been a huge challenge with our animators because they often have to go
>> back and massage their python code to improve execution performance.
>> So, in short, there are many reasons why we use python as a part
>> rather than a whole.
>>
>> The other area of pain that I mentioned in one of my other posts is
>> that what we ship, above all, can't be flaky.  The lack of module
>> cleanup (intended to be addressed by PEP 3121), using a duplicate copy
>> of the python dynamic lib, and namespace black magic to achieve
>> independent interpreters are all examples that have made using python
>> for us much more challenging and time-consuming than we ever
>> anticipated.
>>
>> Again, if it turns out nothing can be done about our needs (which
>> appears more and more to be the case), I think it's important for
>> everyone here to consider the points raised here in the last week.
>> Moreover, realize that the python dev community really stands to gain
>> from making python usable as a tool (rather than a monolith).  This
>> fact alone has caused Lua to *rapidly* rise in popularity with
>> software companies looking to embed a powerful, lightweight
>> interpreter in their software.
>>
>> As a Python language fan and enthusiast, don't let Lua win!  (I say
>> this endearingly of course--I have the utmost respect for both
>> communities and I only want to see CPython be an attractive pick when
>> a company is looking to embed a language that won't intrude upon their
>> app's design).
>>
>
> Thanks for the further explanations.
>
> --
> Glenn -- http://nevcal.com/
> ===========================
> A protocol is complete when there is nothing left to remove.
> -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking