Embedding multiple interpreters

Fri Dec 6 06:57:12 EST 2013

On Fri, Dec 6, 2013 at 8:35 PM, Garthy
<garthy_nhtyp at entropicsoftware.com> wrote:
> I think the ideal is completely sandboxed, but it's something that I
> understand I may need to make compromises on. The bare minimum would be
> protection against inadvertent interaction. Better yet would be a setup that
> made such interaction annoyingly difficult, and the ideal would be where it
> was impossible to interfere.

In Python, "impossible to interfere" is a pipe dream. There's no way
to stop Python from fiddling around with the file system, and if
ctypes is available, with memory in the running program. The only way
to engineer that kind of protection is to prevent _the whole process_
from doing those things (using OS features, not Python features),
hence the need to split the code out into another process (which might
be chrooted, might be running as a user with no privileges, etc).
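
As a rough sketch of the "another process" idea, each script can be
launched in its own interpreter with the standard subprocess module.
The names run_isolated and demo, the file name user_script.py, and the
timeout value are made up for illustration. This only separates
interpreter state; it is not a sandbox on its own, and any real
confinement (chroot, unprivileged user) still has to come from the OS.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path


def run_isolated(script_path, timeout=5.0):
    """Run a script in a separate Python process.

    Returns (returncode, stdout, stderr). Hypothetical helper: it keeps
    interpreter state apart but is NOT a security boundary by itself.
    """
    result = subprocess.run(
        # -I is Python's isolated mode: PYTHON* environment variables
        # and the user site directory are ignored.
        [sys.executable, "-I", str(script_path)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr


def demo():
    # Hypothetical user script that rebinds sys.stdout. The rebinding
    # happens in the child process only; the parent never sees it.
    source = textwrap.dedent(
        """
        import io
        import sys
        print("hello from child", flush=True)
        sys.stdout = io.StringIO()
        print("this goes nowhere")
        """
    )
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "user_script.py"
        script.write_text(source)
        return run_isolated(script)
```

Whatever the child does to its own sys module stays in the child; the
embedding application only ever sees the captured output and the exit
code.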

A setup that makes such interaction "annoyingly difficult" is possible
as long as your users don't think in Ruby terms and start patching
shared modules. For instance:

# script1.py
import sys
sys.stdout = open("logfile", "w")  # rebinds stdout interpreter-wide
while True:
    print("Blah blah")

# script2.py
import sys
sys.stdout = open("otherlogfile", "w")  # rebinds the same sys.stdout
while True:
    print("Bleh bleh")

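These two scripts won't play nicely together in one interpreter,
because each has rebound sys.stdout, which is global state in a module
(sys) that both of them share. So you'd have to set "don't modify
other modules' globals" as a rule for script authors.
(For this specific example, you probably want to capture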
stdout/stderr to some sort of global log file anyway, and/or use the
logging module, but it makes a simple example.) Most Python scripts
aren't going to do this sort of thing, or if they do, will do very
little of it. Monkey-patching other people's code is a VERY rare thing
in Python.
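
To make the clash concrete, here is a small sketch that stands in for
the two scripts above, using in-memory buffers instead of log files and
single prints instead of infinite loops. The function names and the
logger names "script1" and "script2" are invented for the example. The
second half shows one way the logging module could give each script its
own output without touching sys.stdout; treat it as one possible
arrangement, not the only one.

```python
import io
import logging
import sys


def script1_style(log):
    # Mimics script1.py: rebinds the interpreter-wide sys.stdout.
    sys.stdout = log
    print("Blah blah")


def script2_style(log):
    # Mimics script2.py: rebinds it again, taking over script1's output.
    sys.stdout = log
    print("Bleh bleh")


def demo_clash():
    saved = sys.stdout
    log1, log2 = io.StringIO(), io.StringIO()
    try:
        script1_style(log1)
        script2_style(log2)
        # script1 printing again later: this now lands in log2.
        print("Blah blah again")
    finally:
        sys.stdout = saved  # restore so the demo has no side effects
    return log1.getvalue(), log2.getvalue()


def make_logger(name, stream):
    # Hypothetical helper: one logger per script, each with its own
    # handler, so no shared global is rebound.
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger
    logger.addHandler(logging.StreamHandler(stream))
    return logger


def demo_loggers():
    log1, log2 = io.StringIO(), io.StringIO()
    first = make_logger("script1", log1)
    second = make_logger("script2", log2)
    first.info("Blah blah")
    second.info("Bleh bleh")
    first.info("Blah blah again")
    return log1.getvalue(), log2.getvalue()
```

In demo_clash the later output from the first script ends up in the
second script's buffer; in demo_loggers each buffer only receives its
own script's lines.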

> The closest analogy for understanding would be browser plugins: Scripts from
> multiple authors who for the most part aren't looking to create deliberate
> incompatibilities or interference between plugins. The isolation is basic,
> and some effort is made to make sure that one plugin can't cripple another
> trivially, but the protection is not exhaustive.

Browser plugins probably need a lot more protection - maybe it's not
exhaustive, but any time someone finds a way for one plugin to affect
another, the plugin / browser authors are going to treat it as a bug.
If I understand you, though, this is more akin to having two forms on
one page and having JS validation code for each. It's trivially easy
for one to check the other's form objects, but quite simple to avoid
too, so for the sake of encapsulation you simply stay safe.

> With the single interpreter and multiple thread approach suggested, do you
> know if this will work with threads created externally to Python, ie. if I
> can create a thread in my application as normal, and then call something
> like PyGILState_Ensure() to make sure that Python has the internals it needs
> to work with it, and then use the GIL (or similar) to ensure that accesses
> to it remain thread-safe?

Now that's something I can't help with. The only time I embedded
Python seriously was a one-Python-per-process system (arbitrary number
of processes fork()ed from one master, but each process had exactly
one Python environment and exactly one database connection, etc), and
I ended up being unable to make it secure, so I had to switch to
embedding ECMAScript (V8, specifically, as it happens). I'm morbidly
curious what my boss plans to do, now that he's fired me; he hinted at
rewriting the C++ engine in PHP, and I'd love to be a fly on the wall
as he tries to test a PHP extension for V8 and figure out whether or
not he can trust arbitrary third-party compiled code. But there'll be
someone on this list who's done threads and embedded Python.

ChrisA