[Python-Dev] GIL, Python 3, and MP vs. UP

Wed Sep 21 22:12:19 CEST 2005

At 12:04 PM 9/21/2005 -0700, Guido van Rossum wrote:
>Actually Python itself has a hard time keeping multiple interpreters
>truly separate. Also of course there are some shared resources
>maintained by the operating system: current directory, open file
>descriptors, signal settings, child processes, that sort of thing.
>
>If we were to completely drop this feature, we could make built-in
>classes modifyable.

I'd personally much rather we got back the ability to change the type of an 
instance of a builtin to that of a Python subclass of that builtin type, or 
to change it back.  I have more use cases for that than for actually 
modifying builtins.  (E.g. "observable" lists/dicts, hooking module 
__getattr__, etc.)

> > A system like Java's classloader would be helpfull, where the
> > classloader of a class is used to load the classes used by that
> > class. I have no idea if this can be adapted to python at all. A
> > strict coding style seems to work for now.
>
>You can do something like this using the restricted execution support,
>which works by setting the __builtins__ name in a dict where you exec
>code, and overriding __import__ in that __builtins__ dict. (I can't
>explain it too well in one paragraph, just go look up the rexec.py
>source code.)
>
>It's not great for guaranteeing there's absolutely no escape possible
>from the sandbox, but it works well enough to make accidental resource
>sharing a non-issue (apart from the OS shared resources and the
>built-in types). A misfeature (for this purpose) is that certain kinds
>of introspection are disabled (this was of course to enable restricted
>execution).

Another misfeature is that some C-level Python code expects to obtain 
sys.modules, builtins, etc. via the interpreter struct.  Thus, you tend to 
have to reimplement those things in Python to get them to respect a 
virtualization of sys.modules.  I have to admit I've only dabbled in 
attempting this, just long enough to hit a stumbling block or two and then 
discover that they were because sys.modules is in the interpreter 
struct.  Of course, my next thought then was to just expose the 
multi-interpreter API as an extension module, so that you could create 
interpreters from Python code.  The project I'd originally planned to do 
this for never materialized though, so I never actually attempted it.

My thought, though, was that by swapping the current interpreter of the 
thread state when crossing code boundaries, you could keep both the 
Python-level and C-level code happy.  However, it might also suffice to 
have a way to switch in and out the interpreter configuration (sys.modules, 
sys.__dict__, and __builtins__ at minimum; I don't have any clear use case 
for changing the three codec_* vars at the moment).

>I'd be willing to entertain improvements that improve the insulation
>this provides.

Since there's already a way to change __builtins__ in the threadstate, 
maybe the C API could be changed to obtain the six interpreter variables 
via builtins rather than the other way around.  This would allow us to drop 
the multi-interpreter API from C (along with support for restricted mode) 
but still allow complete virtualization from inside Python code.

The steps would be:

1. Remove restricted mode support
2. Change the tstate structure to have a 'builtins'
3. Change code that does tstate->interp lookups to instead lookup special 
names in the tstate's builtins

At that point, you can exec code with new builtins to bootstrap a virtual 
Python, subject to limitations like being able to load a given extension 
module only once.

Systems like mod_python that use the multi-interpreter API now would just 
need to bootstrap a new __builtins__.

Sadly, this doesn't *really* cure the GIL-ensure problem, in that you still 
don't have a specially-distinguished __builtins__ to use when you call into 
Python from a C-started thread.  On the other hand, I suspect that the use 
cases for that, and the use cases for virtualization don't overlap much, so 
having a distinguished place to hold the "default" (i.e. initial) builtins 
probably doesn't hurt virtualization much, since you can always *modify* 
that set of builtins if you need to.