[Python-Dev] Sub-interpreters: importing numpy causes hang

Eric Snow ericsnowcurrently at gmail.com
Wed Jan 23 12:11:45 EST 2019


Hi Stephan,

On Tue, Jan 22, 2019 at 9:25 AM Stephan Reiter <stephan.reiter at gmail.com> wrote:
> I am new to the list and arriving with a concrete problem that I'd
> like to fix myself.

That is great!  Statements like that are a good way to get folks
interested in your success. :)

> I am embedding Python (3.6) into my C++ application and I would like
> to run Python scripts isolated from each other using sub-interpreters.
> I am not using threads; everything is supposed to run in the
> application's main thread.

FYI, running multiple interpreters in the same (e.g. main) thread
isn't as well thought out as running them in separate threads.  There
may be assumptions in the runtime that would cause crashes or
inconsistent state, so be vigilant.  Is there a reason not
to run the subinterpreters in separate threads?
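
If you do stay single-threaded, here's a minimal sketch of what
same-thread interpreter switching looks like with the C API (error
handling omitted; the calls shown are the standard CPython embedding
functions):

    #include <Python.h>

    int main(void)
    {
        Py_Initialize();
        PyThreadState *main_ts = PyThreadState_Get();

        /* Creates a sub-interpreter and makes it current in this same
           OS thread (the GIL stays held). */
        PyThreadState *sub = Py_NewInterpreter();
        PyRun_SimpleString("import sys; print(len(sys.modules))");

        /* Switch back to the main interpreter... */
        PyThreadState_Swap(main_ts);

        /* ...and later make the sub-interpreter current again so it
           can be torn down; Py_EndInterpreter() requires that. */
        PyThreadState_Swap(sub);
        Py_EndInterpreter(sub);

        PyThreadState_Swap(main_ts);
        Py_Finalize();
        return 0;
    }

The important part is that only one interpreter's thread state is
current at any moment, and you have to swap explicitly before touching
anything that belongs to the other interpreter.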

Regarding isolation, keep in mind that there are some limitations.  At
an intrinsic level subinterpreters are never truly isolated since they
run in the same process.  This matters if you have concerns about
security (which you should always consider) and stability (if a
subinterpreter crashes then your whole process crashes).  If you need
that complete isolation, you'll have to use subprocess or
multiprocessing.

On top of intrinsic isolation, currently subinterpreters have gaps in
isolation that need fixing.  For instance, they share a lot of
module-global state, as well as builtin types and singletons.  So data
can leak between subinterpreters unexpectedly.

Finally, at the Python level subinterpreters don't have a good way to
pass data around.  (I'm working on that. [1])  Naturally at the C
level you can keep pointers to objects and share data that way.  Just
keep in mind that doing so relies on the GIL (in an
interpreter-per-thread scenario, which you're avoiding).  In a world
where subinterpreters don't share the GIL [2] (and you're running one
interpreter per thread) you'll end up with refcounting races, leading
to crashes.  Keep that in mind if you decide to switch to
one-subinterpreter-per-thread.
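
To make that caveat concrete, here's a hypothetical snippet that passes
an object created in the main interpreter into a sub-interpreter
(reusing 'main_ts' and 'sub' from the earlier sketch).  It's safe today
only because both interpreters share one GIL and, in your setup, one
thread:

    PyObject *shared = PyLong_FromLong(42);  /* made in main interpreter */

    PyThreadState_Swap(sub);                 /* sub-interpreter current */
    /* Using 'shared' here touches its refcount from the sub-interpreter;
       with a per-interpreter GIL those increments/decrements could race. */
    PyObject *repr = PyObject_Repr(shared);
    Py_XDECREF(repr);
    PyThreadState_Swap(main_ts);

    Py_DECREF(shared);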

On Tue, Jan 22, 2019 at 8:09 PM Stephan Reiter <stephan.reiter at gmail.com> wrote:
> Nathaniel, I'd like to allow Python plugins in my application. A
> plugin should be allowed to bring its own modules along (i.e.
> plugin-specific subdir is in sys.path when the plugin is active) and
> hence some isolation of them will be needed, so that they can use
> different versions of a given module. That's my main motivation for
> using subinterpreters.

That's an interesting approach.  Using subinterpreters would indeed
give you isolation between the sets of imported modules.

As you noticed, you'll run into some problems when extension modules
are involved.  There aren't any great workarounds yet.
Subinterpreters are tied pretty tightly to the core runtime so it's
hard to attack the problem from the outside.  Furthermore,
subinterpreters aren't widely used yet so folks haven't been very
motivated to fix the runtime.  (FWIW, that is changing.)

> I thought about running plugins out-of-process - a separate process
> for every plugin - and allow them to communicate with my application
> via RPC. But that makes it more complex to implement the API my
> application will offer and will slow down things due to the need to
> copy data.

Yep.  It might be worth it though.  Note that running
plugins/extensions in separate processes is a fairly common approach
for a variety of solid technical reasons (e.g. security, stability).
FWIW, there are some tools available (or soon to be) for sharing data
more efficiently (e.g. shared memory in multiprocessing, PEP 574).

> Maybe you have another idea for me? :)

* single proc -- keep using subinterpreters
  + dlmopen or the Windows equivalent (I hesitate to suggest this
    hack, but it might help somewhat with extension modules)
  + help fix the problems with subinterpreters :)
* single proc -- no subinterpreters
  + import hook to put plugins in their own namespace (tricky with
    extension modules)
  + extend importlib to do the same
  + swap sys.modules in and out around plugin use (see the sketch below)
* multi-proc -- one process per plugin
  + subprocess
  + multiprocessing
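
For the sys.modules-swapping option, a minimal sketch at the C level
might look like this (assuming one shared interpreter; it only isolates
the Python-level module cache, since extension modules keep their
C-level state regardless):

    /* Snapshot sys.modules before running a plugin and restore it
       afterwards, so the plugin's imports don't leak to other plugins. */
    PyObject *modules = PyImport_GetModuleDict();   /* borrowed reference */
    PyObject *snapshot = PyDict_Copy(modules);      /* new reference */

    /* ... import the plugin's modules and call into the plugin ... */

    PyDict_Clear(modules);                          /* drop plugin imports */
    PyDict_Update(modules, snapshot);               /* restore the snapshot */
    Py_DECREF(snapshot);

Dropping entries from sys.modules can still have side effects (modules
with no other references get deallocated), which is part of why the
import-hook approaches tend to be cleaner.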

On Wed, Jan 23, 2019 at 8:48 AM Stephan Reiter <stephan.reiter at gmail.com> wrote:
> Well, the plugins would be created by third-parties and I'd like them
> to enable bundling of modules with their plugins.
> I am afraid of modules with the same name, but being different, or
> different versions of modules being used by different plugins. If
> plugins share an interpreter, the module with a given name that is
> imported first sticks around forever and for all plugins.
>
> I am thinking about this design:
> - Plugins don't maintain state in their Python world. They expose
> functions, my application calls them.
> - Every time I call into them, they are presented with a clean global
> namespace. After the call, the namespace (dict) is thrown away. That
> releases any objects the plugin code has created.
> - So, then I could also actively unload modules they loaded. But I do
> know that this is problematic in particular for modules that use
> native code.
>
> I am interested in both a short-term and a long-term solution.
> Actually, making subinterpreters work better is pretty sexy ...
> because it's hard. :-)
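
For the throwaway-namespace part of that design, here's a rough sketch
of what it could look like at the C level (hypothetical helper; error
handling mostly omitted):

    /* Run a plugin's code in a fresh namespace that is discarded
       afterwards. */
    static PyObject *
    run_plugin_source(const char *source)
    {
        PyObject *globals = PyDict_New();       /* fresh namespace per call */
        if (globals == NULL)
            return NULL;
        PyDict_SetItemString(globals, "__builtins__", PyEval_GetBuiltins());

        PyObject *result = PyRun_String(source, Py_file_input,
                                        globals, globals);

        Py_DECREF(globals);                     /* throw the namespace away */
        return result;                          /* NULL on error */
    }

As you noted, that releases the objects the plugin created in its
namespace, but it doesn't undo imports or any C-level state held by
extension modules.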

Petr noted that a number of people are working on getting
subinterpreters to a good place.  That includes me. [1][2] :)  We'd
welcome any help!

-eric


[1] https://www.python.org/dev/peps/pep-0554/
[2] https://github.com/ericsnowcurrently/multi-core-python

