Correct way to handle independent interpreters when embedding in a single-threaded C++ app

Mustafa Demirhan mustafademirhan at gmail.com
Thu Nov 18 18:35:18 EST 2004


If you are always running the Python scripts within the main thread of
the application, then why are you creating a new thread state and running
the script in that state? Why not just do this:

  Py_Initialize();
  PyRun_SimpleString(...);
  Py_Finalize();

(Instead of PyRun_SimpleString, do whatever you want to do there)

Since you are not running any Python scripts or calling any
Python-related code from other threads, this is the best approach in my
opinion. It also ensures that the execution of one script won't
affect the execution of another, because you call Py_Finalize after
each script and thus shut down the interpreter.

Mustafa Demirhan

Craig Ringer <craig at postnewspapers.com.au> wrote in message news:<mailman.6537.1100774276.5135.python-list at python.org>...
> Hi folks
> 
> I'm a bit of a newbie here, though I've tried to appropriately research
> this issue before posting. I've found a lot of questions, a few answers
> that don't really answer quite what I'm looking for, but nothing that
> really solves or explains all this. I'll admit to being stumped, hence
> my question here.
> 
> I'm also trying to make this post as clear and detailed as possible.
> Unfortunately, that means it's come out like a book. I hope a few kind
> souls will be game to read it, on the theory that I'm a user who's
> putting in the time to actually provide enough information for once.
> 
> I have a Python interpreter embedded in a C++/Qt application (Scribus -
> http://www.scribus.net). Scribus, while using multi-threading enabled
> libraries, runs in a single 'main' thread. The Python interpreter is
> implemented as a plug-in that's used to run user scripts. Overall it's
> working very well.
> 
> I've run into two problems that are proving very difficult to solve,
> however, and I thought I'd ask here for some words of wisdom. I'm only
> tackling the first one right now. First I'll provide some background on
> how I'm doing things, and what I'm trying to achieve. If anything below
> comes out as a request for Python functionality it's not intended to be
> - it's just a description of what /I'm/ trying to do.
> 
> The Scribus Python plugin is pretty standard - it both embeds the Python
> interpreter and provides an extension module to expose
> application-specific functionality. It is used to permit users to
> execute Python scripts to automate tasks within the application. I also
> hope to make it possible to extend the application using Python, but
> that's not the issue right now. I need to isolate individual script
> executions as much as possible, so that, to the greatest extent we can
> manage, each script runs in a new interpreter. In other words, I need to
> minimise the chances of scripts treading on each other's toes or leaking
> too much with each execution.
> 
> Specifically, as much as possible I need to:
> 
>   - Ensure that memory allocated by Python during a script run is all
>     freed, including any objects created and modules loaded. An
>     exception can be made for C extension modules, so long as they
>     don't leak every time a script is run.
>   - Ensure that no global state (eg loaded modules, globals namespace,
>     etc) persists across script executions.
> 
> I have no need to be able to run Python scripts in parallel with the
> application, nor with each other. If a script goes into an endless loop,
> that's a bug with the script, and not the application's problem. I'd
> like to reduce the chances of scripts conflicting or messing up the app
> state, but don't intend to even try to make it possible to safely run
> untrusted scripts or to completely isolate scripts. If the odd C
> extension module doesn't like it, I can deal with that too.
> 
> Also, some of the extension module functions make Qt gui calls (for
> example, create and display a file chooser dialog) or access internal
> application state in QObject derived classes. According to the Qt
> documentation, this should only be done from the main thread. This is
> another reason why I'm making no attempt to make it possible to run
> normal Python scripts without blocking the application, or run scripts
> in parallel. It also means that all my Python sub-interpreters need to
> share the main (and in fact only) application thread.
> 
> 
> I've hit two issues with this. The first is that executing a script
> crashes the application with SIGABRT if a Python debug build is being
> used. Python crashes the app with the error "Invalid thread state for
> this thread". I'm working with Python 2.3.4. The crash is triggered by
> a check in pystate.c, line 276, in the PyThreadState_Swap() function:
> 
> The code in question is:
> 
>     /* It should not be possible for more than one thread state
>        to be used for a thread.  Check this the best we can in debug
>        builds.
>     */
> #if defined(Py_DEBUG) && defined(WITH_THREAD)
>     if (new) {
>         PyThreadState *check = PyGILState_GetThisThreadState();
>         if (check && check != new)
>             /* Py_FatalError("Invalid thread state for this thread"); */
>             printf("We would've died here\n");
>     }
> #endif
> 
> A trimmed down and simplified version (eg no error checking, etc) of the
> code I'm using in the plugin that hits this check is:
> 
>     PyThreadState *stateo = PyEval_SaveThread();
>     PyThreadState *state = Py_NewInterpreter();
>     initscribus(Carrier); // init the extension module
>     PySys_SetArgv(1, scriptfilename);
>     PyObject* m = PyImport_AddModule("__main__");
>     PyObject* globals = PyModule_GetDict(m);
>     char* script_string = ... // build script that calls execfile()
>     PyObject* result = PyRun_String(script_string, Py_file_input,\
>         globals, globals);
>     ... // handle possible failure and capture exception
>     Py_EndInterpreter(state);
>     PyEval_RestoreThread(stateo);
> 
> (The full version can be found in
> scribus/plugins/scriptplugin/scriptplugin.cpp line 225-279 of Scribus
> CVS, http://www.scribus.net/)
> 
> The script text isn't really important. It just execfile()s the user's
> script within a try/except block to ignore SystemExit and to catch and
> capture any other fatal exceptions.
> 
> The crash occurs at Py_NewInterpreter, when it calls PyThreadState_Swap.
> It's pretty clear _what_ is happening - Python is aborting on a sanity
> check because I'm trying to use multiple thread states in one thread -
> what I'm looking for help with is _why_. When run with a non-debug
> build, scripts run just fine. It also runs fine when I use a debug build
> of Python without thread support (as is obvious from the code snippet
> above). I'm sure there are cases where things can / do go wrong, but for
> general use it appears to be just peachy.
> 
> So ... my question is, what are the issues behind this check? Does it
> indicate that there will be a problem with this condition in all cases?
> My understanding is that it's to do with the way Python doesn't use the
> full capabilities of platform threading libraries, and has some shared
> globals that could cause issues. Correct? If so, is there a way around
> this?
> 
> All I'm looking to do is to create a clean sub-interpreter state, run a
> script in it (in the main thread, with nothing else running) then
> dispose of the interpreter at script exit. It's desirable to keep the
> main interpreter usable as well, but there will never be more than one
> sub-interpreter, and there will never be Python code running in the main
> and sub interpreters at the same time. Does the existence of this check
> mean that what I'm trying to do is incorrect or unsafe? If not, might it
> be possible to provide apps with a way to disable this check (think an
> "I know what I'm doing" flag)? Is there another, saner way to do what I
> want?
> 
> This post describes a similar issue to mine, though their goals are
> different, and I don't think the solution will work for me:
> http://groups.google.com.au/groups?hl=en&lr=&client=firefox-a&selm=brlicq%243dh%241%40arachne.labyrinth.net.au&rnum=7
> 
> This message describes the issue I'm seeing:
> http://groups.google.com.au/groups?q=Py_NewInterpreter&hl=en&lr=&group=comp.lang.python.*&client=firefox-a&selm=knbstvgsn3o3qmmtu975g8eb94rhpmae2o%404ax.com&rnum=5
> 
> Another related message:
> http://groups.google.com.au/groups?q=Py_NewInterpreter&hl=en&lr=&group=comp.lang.python.*&client=firefox-a&selm=mailman.3563.1095687831.5135.python-list%40python.org&rnum=1
> 
> Someone says it's just broken:
> http://groups.google.com.au/groups?q=Py_NewInterpreter&start=10&hl=en&lr=&group=comp.lang.python.*&client=firefox-a&selm=m34r8r5jwr.fsf%40mira.informatik.hu-berlin.de&rnum=17
> 
> 
> I've tried one other approach that doesn't involve
> Py_NewInterpreter/Py_EndInterpreter, but didn't have much success. What
> I tried to do was run each script with a new global dict, so that they
> at least had separate global namespaces (though they'd still be able to
> influence the next script's interpreter state / module state). If I
> recall correctly I ended up with code like this:
> 
>     execfile(filename, {'__builtins__': __builtins__,
>                         '__name__':'__main__',
>                         '__file__':filename})
> 
> being called from PyRun_String.
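A minimal, self-contained sketch of that fresh-globals idea, written for modern Python 3 where exec() with an explicit dict stands in for Python 2's execfile(); the script strings and the run_script helper are invented here purely for illustration:

```python
# Sketch of the 'fresh globals dict per run' approach at the Python level.
# execfile() is Python 2 only; exec() with an explicit dict is the
# Python 3 analogue, and the isolation property is the same.
import builtins

SCRIPT_ONE = "x = 'hello'"
SCRIPT_TWO = "leaked = 'x' in globals()"

def run_script(source):
    # Each run gets a brand-new globals dict, mirroring the dict the
    # post builds with '__builtins__', '__name__' and '__file__' keys.
    env = {'__builtins__': builtins,
           '__name__': '__main__',
           '__file__': '<script>'}
    exec(source, env)
    return env

first = run_script(SCRIPT_ONE)
second = run_script(SCRIPT_TWO)

print(first['x'])        # the first script's name lives only in its own dict
print(second['leaked'])  # the second run never saw it: False
```

Each script's top-level names end up in its own dict, so nothing carries over between runs; the remaining problem, as the post notes, is what keeps those dicts alive afterwards.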
> 
> This appeared to work fine, but turned out to leak memory like a sieve.
> Objects in the script's global namespace weren't being disposed of when
> the script terminated. Consequently, if I had a script with one line:
> 
>     x = ' '*200000000
> 
> then each time I ran the script the app would gobble a large chunk more
> memory and not release it. If I wrote a script that very carefully
> deleted everything it put in the top-level namespace before it exited,
> such as all variables, imports, classes, and functions, I still leaked a
> little memory and a few references, but nothing much. Unfortunately,
> doing that is also rather painful at best and seems _really_ clumsy.
> 
> It looked to me after some testing with a debug build like the global
> dictionaries that were being created for each execfile() call were not
> being disposed of after the call terminated, even though no code I was
> aware of continued to hold references to them. Circular references? Do I
> have to manually invoke the cyclic reference cleanup code in Python when
> embedding?
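The leak pattern described above is consistent with reference cycles: a globals dict that ends up in a cycle is never freed by reference counting alone, only by the cyclic collector. A small self-contained sketch of that behaviour (the Probe class and make_cycle helper are invented for illustration; gc.disable() is only there to make the demonstration deterministic):

```python
# Why a globals dict can outlive its script: if anything in the dict is
# part of a reference cycle, plain refcounting never frees it; only the
# cyclic garbage collector does. An embedding app can run the collector
# explicitly after each script (gc.collect() here, PyGC_Collect() in C).
import gc
import weakref

gc.disable()  # stop automatic collection so the demo is deterministic

class Probe:
    pass

def make_cycle():
    env = {}
    p = Probe()
    env['self_ref'] = env   # dict refers to itself: a reference cycle
    env['probe'] = p
    return weakref.ref(p)   # lets us observe when the Probe dies

ref = make_cycle()
# The dict and Probe are unreachable now, but the cycle keeps them alive.
alive_before = ref() is not None
gc.collect()                # the cycle collector breaks the loop
alive_after = ref() is not None
print(alive_before, alive_after)
```

So yes, invoking the cycle collector explicitly after each script run is a reasonable thing for an embedding application to do; it reclaims exactly this kind of garbage promptly rather than at some unpredictable later allocation.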
> 
> I'm sorry for the lack of detail provided in the discussion of this
> approach. It was a while ago. If folks here think it's viable I can go
> back and get some more hard data.
> 
> With the 'new globals dict' approach, it was also possible for people to
> mangle modules and for the next script to see the changes. If there's a
> way to re-init modules between runs (at least the built-in ones like
> sys, __builtins__, etc, plus the app's extension module and any modules
> written in Python), that'd be fantastic.
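The module-mangling problem comes from sys.modules acting as a process-wide cache. A hedged sketch of both the problem and one possible reset mechanism, using a throwaway module written to a temp directory (the demo_mod name and its contents are made up for illustration; importlib.reload() is the Python 3 spelling of what Python 2.3 did with the reload() builtin):

```python
# Modules are cached in sys.modules, so one script's changes to a module
# are visible to the next script. Reloading (or deleting the sys.modules
# entry) gives the next run a fresh copy re-executed from source.
import importlib
import os
import sys
import tempfile

# Create a throwaway module on disk so we can import it by name.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, 'demo_mod.py'), 'w') as f:
    f.write('value = 1\n')
sys.path.insert(0, tmpdir)

import demo_mod
demo_mod.value = 99                 # 'script one' mangles the module
seen_by_next_run = demo_mod.value   # 'script two' sees 99, not 1

importlib.reload(demo_mod)          # re-executes the module's source
after_reload = demo_mod.value       # back to the pristine value
print(seen_by_next_run, after_reload)
```

This only helps for pure-Python modules, though; C extension modules keep their state across a reload, which matches the post's willingness to make an exception for them.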
> 
> 
> If there's some way to achieve what I want to do - get scripts to
> execute in private or mostly-private environments in the main thread of
> an application - I'd be overjoyed to hear it. I'm very sorry for the
> mammoth message, and hope I've made some sense and provided enough
> information without boring you all to tears. It's clear that there's
> been quite a bit of interest in this topic from my digging through the
> list archives, but I just wasn't able to find a clear, definitive
> answer.
> 
> Phew. To anybody who got this far, thank you very much for your time and
> patience.


