Multiple interpreters retaining huge amounts of memory

"Martin v. Löwis" martin at v.loewis.de
Sun Feb 3 15:13:02 EST 2008


>> - objects can easily get shared across interpreters, and often are.
>>    This is particularly true for static variables that extensions keep,
>>    and for static type objects.
> 
> Yep, but that is basically a problem with how people write C extension
> modules, i.e. they don't write them with multiple interpreters
> in mind.

I still consider it a bug in Python and its multiple-interpreter
feature, not so much in the extension modules. Of course, they
may have bugs on top of that, but in general, they have no way
of cleaning up when an interpreter shuts down (until PEP 3121
gets implemented).

> Some details about this in section 'Multiple Python Sub Interpreters'
> of:
> 
>   http://code.google.com/p/modwsgi/wiki/ApplicationIssues

A common concern is that people think that the multiple-interpreters
feature is a security mechanism, i.e. that it works as a sandbox. Maybe
that's more a communication problem than an actual problem with the
feature. However, it can't be emphasized enough that the feature is
*not* a security mechanism: it is possible to get at all objects, even
those of "other" interpreters.

> You might also read section 'Application Environment Variables' of
> that document. This talks about the problem of leakage of environment
> variables between sub interpreters. There probably isn't much that one
> can do about it as one needs to push changes to os.environ into C
> environment variables so various system library calls will get them,
> but still quite annoying that the variables set in one interpreter
> then show up in interpreters created after that point. It means that
> environment variable separation for changes made unique to a sub
> interpreter is impossible.

That's not really true. You can't use os.environ for that, yes. However,
you can pass explicit environment dictionaries to, say, os.execve. If
some library relies on os.environ, you could hack around this aspect
and do

    os.environ = dict(os.environ)

Then you can customize it. Of course, changes to this dictionary now
won't be reflected into the C library's environ, so you'll have to
use execve now (but you should do so anyway in a multi-threaded
application with changing environments).
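As a rough sketch of the "explicit environment dictionary" approach:
the snippet below copies os.environ, customizes the copy, and hands it
to a child process. It uses subprocess.run(..., env=...) as a stand-in
for calling os.execve directly, and assumes Python 3.7 or later. The
helper name run_with_env and the variable DEMO_APP_MODE are made up
for illustration.

```python
import os
import subprocess
import sys


def run_with_env(overrides):
    """Run a child Python process with a private copy of the environment."""
    # Private copy: changes here are not written to os.environ or to
    # the C library's environ of the current process.
    child_env = dict(os.environ)
    child_env.update(overrides)

    # The child only prints the hypothetical variable DEMO_APP_MODE.
    result = subprocess.run(
        [sys.executable, "-c",
         "import os; print(os.environ.get('DEMO_APP_MODE', ''))"],
        env=child_env,          # explicit environment for the child
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_with_env({"DEMO_APP_MODE": "staging"}))
```

The parent's os.environ is left alone, so other interpreters in the
same process never see the customized value.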

> There is another problem with deleting interpreters and then creating
> new ones. This is where a C extension module doesn't declare reference
> counts to static Python objects it creates.

Right - that's a clear bug in the module, though. If the Python
documentation is not sufficiently clear about the requirement that
_every_ assignment to a PyObject* needs to be accompanied by a
Py_INCREF, feel free to contribute patches to make that clearer.

> I don't know whether it is a fundamental problem with the tool or how
> people use it, but Pyrex generated code seems to also do this.

I've never used Pyrex myself, but I would be surprised if it really
had such a severe refcounting error.

>> - the mechanism of PEP 311 doesn't work for multiple interpreters.
> 
> Yep, and since SWIG defaults to using it, it means that SWIG generated
> code can't be used in anything but the main interpreter. Subversion
> bindings seem to possibly have a lot of issues related to this as
> well.

Please understand that, when this PEP was written, this issue was 
explicitly discussed, and developers explicitly agreed "the multi-
interpreters feature is broken, anyway, so don't let that issue
stop us from providing PEP 311".

> First is that one can't use different versions of a C extension module
> in different sub interpreters. This is because the first one loaded
> effectively gets priority. 

That's not supposed to happen, AFAICT. The interpreter keeps track of
loaded extensions by file name, so if the different version lives in
a different file, that should work fine.

Are you using sys.setdlopenflags by any chance? Setting the flags
to RTLD_GLOBAL could have that effect; you'd always get the init
function of the first module. By default, Python uses RTLD_LOCAL,
so it should be able to keep the different versions apart (on
Unix with libdl; on Windows, symbol resolution is per-DLL anyway).
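A quick way to check this from inside the running interpreter might
look like the following. It is only a sketch: the helper name
dlopen_flags_are_global is invented here, and it assumes Python 3.3 or
later, where the os module exposes the RTLD_* constants. On platforms
without dlopen flag support it returns None.

```python
import os
import sys


def dlopen_flags_are_global():
    """Return True if extension modules would be loaded with RTLD_GLOBAL.

    Returns None where the dlopen flags cannot be inspected
    (for example on Windows).
    """
    if not hasattr(sys, "getdlopenflags") or not hasattr(os, "RTLD_GLOBAL"):
        return None
    # RTLD_GLOBAL is a bit flag; test whether it is set in the flags
    # Python passes to dlopen() when importing an extension.
    return bool(sys.getdlopenflags() & os.RTLD_GLOBAL)


if __name__ == "__main__":
    print("RTLD_GLOBAL in effect:", dlopen_flags_are_global())
```

If this reports True, some code has called sys.setdlopenflags, and the
symbol clash described above becomes plausible.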

Kind regards,
Martin


