Multiple interpreters retaining huge amounts of memory

Graham Dumpleton Graham.Dumpleton at gmail.com
Sun Feb 3 17:24:20 EST 2008


On Feb 4, 7:13 am, "Martin v. Löwis" <mar... at v.loewis.de> wrote:
> > You might also read section 'Application Environment Variables' of
> > that document. This talks about the problem of leakage of environment
> > variables between sub interpreters. There probably isn't much that one
> > can do about it as one needs to push changes to os.environ into C
> > environment variables so various system library calls will get them,
> > but still quite annoying that the variables set in one interpreter
> > then show up in interpreters created after that point. It means that
> > environment variable separation for changes made unique to a sub
> > interpreter is impossible.
>
> That's not really true. You can't use os.environ for that, yes.

Which bit isn't really true? When you do:

  os.environ['XYZ'] = 'ABC'

this results in a corresponding call to:

  putenv('XYZ=ABC')

as well as setting the value in the os.environ dictionary.

  >>> os.environ.__class__
  <class os._Environ at 0x57510>

        class _Environ(UserDict.IterableUserDict):
            def __setitem__(self, key, item):
                putenv(key, item)
                self.data[key] = item

Because os.environ is populated from the current copy of the C environ
at the time the sub interpreter is created, a sub interpreter created
at a later point will have XYZ show up in its os.environ.
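
The write-through to the C environment is easy to observe from plain
Python. What follows is only a rough sketch: it assumes Python 3 and
uses a child process as a stand-in for anything that reads the C
environment, rather than creating a real sub interpreter. The helper
name child_sees is made up for the illustration.

```python
import os
import subprocess
import sys


def child_sees(name):
    """Return the value of `name` as seen by a freshly spawned child process.

    Hypothetical helper: the child inherits the C-level environment of
    this process, so it only sees variables that reached putenv().
    """
    code = "import os; print(os.environ.get(%r, ''))" % name
    out = subprocess.check_output([sys.executable, "-c", code])
    return out.decode().strip()


# Assigning through os.environ updates the dictionary and calls putenv().
os.environ["XYZ"] = "ABC"
value_in_child = child_sees("XYZ")
```

Anything initialised later from the C environ, including the
os.environ of a later sub interpreter, picks the value up the same way.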

> However,
> you can pass explicit environment dictionaries to, say, os.execve. If
> some library relies on os.environ, you could hack around this aspect
> and do
>
>     os.environ = dict(os.environ)
>
> Then you can customize it. Of course, changes to this dictionary now
> won't be reflected into the C library's environ, so you'll have to
> use execve now (but you should do so anyway in a multi-threaded
> application with changing environments).

As a platform provider and not the person writing the application I
can't really do it that way and effectively force people to change
their code to make it work. It also isn't just exec that is the issue,
as there are other system library calls which can rely on the
environment variables.
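
For reference, the approach being suggested looks roughly like the
following in application code. This is a sketch only, assuming Python 3
and the subprocess module in place of a direct os.execve call; the
variable name XYZ_PRIVATE is invented for the example and assumed not
to be set already. The sketch keeps the copy in its own name rather
than rebinding os.environ.

```python
import os
import subprocess
import sys

# A plain dict copy: changes to it never call putenv(), so they stay
# private to whoever holds the dictionary.
private_env = dict(os.environ)
private_env["XYZ_PRIVATE"] = "ABC"

code = "import os; print(os.environ.get('XYZ_PRIVATE', ''))"

# The variable only reaches a child when the dictionary is passed explicitly.
with_env = subprocess.check_output(
    [sys.executable, "-c", code], env=private_env
).decode().strip()

# Without env=..., the child inherits the untouched C environment.
without_env = subprocess.check_output(
    [sys.executable, "-c", code]
).decode().strip()
```

Library code that reads the C environment directly, rather than being
handed a dictionary, never sees XYZ_PRIVATE, which is the limitation
described above.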

The only half reasonable solution I have ever been able to dream up is
to take a snapshot of the C environment just prior to first
initialising Python and, as sub interpreters are created, replace
os.environ with a new instance of the _Environ wrapper which uses the
initial snapshot rather than what the environment is at that time. At
least then each sub interpreter gets a clean copy of what existed when
the process first started.

Even this isn't really a solution though as changes to os.environ by
sub interpreters still end up getting reflected in C environment and
so the C environment becomes an accumulation of settings from
different code sets with a potential for conflict at some point.

Luckily this issue hasn't so far presented itself as a big enough
problem to be really concerned about.

> > First is that one can't use different versions of a C extension module
> > in different sub interpreters. This is because the first one loaded
> > effectively gets priority.
>
> That's not supposed to happen, AFAICT. The interpreter keeps track of
> loaded extensions by file name, so if the different version lives in
> a different file, that should work fine.
>
> Are you using sys.setdlopenflags by any chance? Setting the flags
> to RTLD_GLOBAL could have that effect; you'd get the init function
> of the first module always. By default, Python uses RTLD_LOCAL,
> so it should be able to keep the different versions apart (on
> Unix with libdl; on Windows, symbol resolution is per-DLL anyway).

That may be true, but I have seen enough people raise strange problems
that I at least counsel people not to rely on being able to import
different versions in different sub interpreters.
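
For anyone wanting to rule out the RTLD_GLOBAL theory in their own
setup, a minimal check might look like the following. It assumes
Python 3 on a Unix platform with libdl; the function name
dlopen_flags_are_global is invented for the example.

```python
import os
import sys


def dlopen_flags_are_global():
    """Report whether extension modules would be loaded with RTLD_GLOBAL.

    Hypothetical helper. Returns False where sys.getdlopenflags is not
    available (for example on Windows).
    """
    if not hasattr(sys, "getdlopenflags"):
        return False
    return bool(sys.getdlopenflags() & os.RTLD_GLOBAL)


flags_are_global = dlopen_flags_are_global()
```

This only reports the flags in the interpreter it is run from, and
only at the moment it is called, so it says nothing about libraries
the hosting process loaded before Python was initialised.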

The problems may well just fall into the other categories we have been
discussing. Within Apache at least, another source of problems is that
Apache, or other Apache modules (e.g. PHP), can directly link to
shared libraries, which are then loaded in the global context. Even if
a Python module tries to isolate itself, one can still end up with
conflicts between the version of a shared library that the module may
want to use and what something else has already loaded. The loader
scope doesn't always protect against this.

It is also always hard when you aren't yourself having the problem and
you are relying on others to try and debug their problem for you. More
often than not the amount of information they provide isn't that good
and even when you ask them to try specific things for you to test out
ideas, they don't. So often one can never uncover the true problem,
and it has thus become simpler to limit the source of potential
problems and just tell them to avoid doing it. :-)

Graham


