Embedding multiple interpreters

Fri Dec 6 08:56:01 EST 2013

Hi Chris,

On 06/12/13 22:27, Chris Angelico wrote:
 > On Fri, Dec 6, 2013 at 8:35 PM, Garthy
 > <garthy_nhtyp at entropicsoftware.com>  wrote:
 >> I think the ideal is completely sandboxed, but it's something that I
 >> understand I may need to make compromises on. The bare minimum would be
 >> protection against inadvertent interaction. Better yet would be a setup that
 >> made such interaction annoyingly difficult, and the ideal would be where it
 >> was impossible to interfere.
 >
 > In Python, "impossible to interfere" is a pipe dream. There's no way
 > to stop Python from fiddling around with the file system, and if
 > ctypes is available, with memory in the running program. The only way
 > to engineer that kind of protection is to prevent _the whole process_
 > from doing those things (using OS features, not Python features),
 > hence the need to split the code out into another process (which might
 > be chrooted, might be running as a user with no privileges, etc).

Absolutely, it would be an impractical ideal. If it were my highest and 
only priority, CPython might not be the best place to start. But there 
are plenty of other factors that make Python very desirable to use 
regardless. :) Re file and ctypes-style functionality, that is something 
I'm going to have to find a way to limit somewhat. But first things 
first: I need to see what I can accomplish re initial embedding with a 
reasonable amount of work.
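
For reference, the separate-process approach described in the quoted text could look roughly like the sketch below. The helper name run_isolated and the timeout value are assumptions made for illustration. Note that a child process on its own restricts nothing: the real protection would have to come from OS features (chroot, an unprivileged user, etc.) applied to that process.

```python
import subprocess
import sys


def run_isolated(source, timeout=5.0):
    """Run Python source in a separate interpreter process.

    Returns (exit_code, stdout_text, stderr_text).
    """
    # -I (isolated mode) ignores PYTHON* environment variables and the
    # user site directory. It does NOT restrict file or memory access.
    result = subprocess.run(
        [sys.executable, "-I", "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # The child's reassignment of sys.stdout affects only the child.
    child = "import sys; sys.stdout = sys.stderr; print('from child')"
    print(run_isolated(child))
```

Whatever the child does to its own sys module stays in the child; the cost is that all communication with the host application has to go through pipes or some other IPC.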

 > A setup that makes such interaction "annoyingly difficult" is possible
 > as long as your users don't think Ruby. For instance:
 >
 > # script1.py
 > import sys
 > sys.stdout = open("logfile", "w")
 > while True: print("Blah blah")
 >
 > # script2.py
 > import sys
 > sys.stdout = open("otherlogfile", "w")
 > while True: print("Bleh bleh")
 >
 >
 > These two scripts won't play nicely together, because each has
 > modified global state in a different module. So you'd have to set that
 > as a rule. (For this specific example, you probably want to capture
 > stdout/stderr to some sort of global log file anyway, and/or use the
 > logging module, but it makes a simple example.)

Thanks for the example. Hopefully I can minimise the cases where this 
would potentially be a problem. Modifying the basic environment and the 
source is something I can do readily if needed.
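
To make the shared-state problem concrete, here is a small sketch that runs two cut-down versions of the quoted scripts inside one interpreter. It is an illustration only: the StringIO objects stand in for the log files, the infinite loops are dropped, and the name run_both is made up. Each script gets its own globals, but both see the same sys module.

```python
import io
import sys

# Each script redirects sys.stdout to "its" log, as in the quoted example.
SCRIPT1 = "import sys\nsys.stdout = log\nprint('Blah blah')\n"
SCRIPT2 = "import sys\nsys.stdout = log\nprint('Bleh bleh')\n"


def run_both():
    """Exec two scripts with separate globals but one shared sys module."""
    log1, log2 = io.StringIO(), io.StringIO()
    ns1, ns2 = {"log": log1}, {"log": log2}
    saved = sys.stdout
    try:
        exec(SCRIPT1, ns1)
        exec(SCRIPT2, ns2)
        # Back in script1's namespace, print() now goes wherever
        # script2 last pointed sys.stdout.
        exec("print('Blah again')", ns1)
    finally:
        sys.stdout = saved
    return log1.getvalue(), log2.getvalue()


if __name__ == "__main__":
    first, second = run_both()
    print(repr(first))
    print(repr(second))
```

The last line written by script1 ends up in script2's log, which is exactly the kind of inadvertent interaction a "don't modify global state in other modules" rule is meant to prevent.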

Re stdout/stderr, I actually wrote a replacement log catcher for 
embedded Python a few years back. I can't remember how on earth I did 
it now, but I've still got the code somewhere.
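
One possible shape for such a log catcher, done at the Python level, is sketched below. This is not the code mentioned above, just one way it might be done; the names LineCatcher and run_captured are hypothetical. It assumes the host is happy to collect complete lines per script and tag them.

```python
import contextlib
import io


class LineCatcher(io.TextIOBase):
    """File-like object that collects complete lines written to it."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.lines = []
        self._partial = ""

    def writable(self):
        return True

    def write(self, text):
        self._partial += text
        # Keep any trailing fragment until its newline arrives.
        *complete, self._partial = self._partial.split("\n")
        self.lines.extend(f"[{self.tag}] {line}" for line in complete)
        return len(text)


def run_captured(source, tag):
    """Exec source with stdout and stderr routed into one LineCatcher."""
    catcher = LineCatcher(tag)
    with contextlib.redirect_stdout(catcher), contextlib.redirect_stderr(catcher):
        exec(source, {})
    return catcher.lines


if __name__ == "__main__":
    for line in run_captured("print('Blah blah')", "script1"):
        print(line)
```

The redirect is restored when the with block exits, so a script that reassigns sys.stdout itself only escapes the catcher for the remainder of its own run.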

 > Most Python scripts
 > aren't going to do this sort of thing, or if they do, will do very
 > little of it. Monkey-patching other people's code is a VERY rare thing
 > in Python.

That's good to hear. :)

 >> The closest analogy for understanding would be browser plugins: Scripts from
 >> multiple authors who for the most part aren't looking to create deliberate
 >> incompatibilities or interference between plugins. The isolation is basic,
 >> and some effort is made to make sure that one plugin can't cripple another
 >> trivially, but the protection is not exhaustive.
 >
 > Browser plugins probably need a lot more protection - maybe it's not
 > exhaustive, but any time someone finds a way for one plugin to affect
 > another, the plugin / browser authors are going to treat it as a bug.
 > If I understand you, though, this is more akin to having two forms on
 > one page and having JS validation code for each. It's trivially easy
 > for one to check the other's form objects, but quite simple to avoid
 > too, so for the sake of encapsulation you simply stay safe.

There have been cases where browser plugins have played funny games to 
mess with the behaviour of other plugins (e.g. one plugin removing 
entries from the configuration of another). It's certainly not ideal, 
but it comes from the environment not being entirely locked down, and 
one plugin author being inclined enough to make destructive changes that 
impact another. I think the right effort/reward ratio will mean I end up 
in a similar place.

I know it's not the best analogy, but it was one that readily came to 
mind. :)

 >> With the single interpreter and multiple thread approach suggested, do you
 >> know if this will work with threads created externally to Python, ie. if I
 >> can create a thread in my application as normal, and then call something
 >> like PyGILState_Ensure() to make sure that Python has the internals it needs
 >> to work with it, and then use the GIL (or similar) to ensure that accesses
 >> to it remain thread-safe?
 >
 > Now that's something I can't help with. The only time I embedded
 > Python seriously was a one-Python-per-process system (arbitrary number
 > of processes fork()ed from one master, but each process had exactly
 > one Python environment and exactly one database connection, etc), and
 > I ended up being unable to make it secure, so I had to switch to
 > embedding ECMAScript (V8, specifically, as it happens... I'm morbidly
 > curious what my boss plans to do, now that he's fired me; he hinted at
 > rewriting the C++ engine in PHP, and I'd love to be a fly on the wall
 > as he tries to test a PHP extension for V8 and figure out whether or
 > not he can trust arbitrary third-party compiled code). But there'll be
 > someone on this list who's done threads and embedded Python.

Thanks in any case. I'm guessing someone with the right inclination and 
experience might see the question and jump in with their thoughts.

Many thanks for your continued thoughts by the way. :)

Cheers,
Garth

PS. As a dev with a heavy C++ background, I also wonder at the type of 
C++ engine that could be improved with a PHP rewrite. ;)