[Python-Dev] New _Py_InitializeFromConfig() function (PEP 432)

Fri Aug 3 14:23:41 EDT 2018

Before I dive in, I'll say that I'd really like to hear Nick's opinion
on all this. :)

On Thu, Aug 2, 2018 at 9:59 AM Victor Stinner <vstinner at redhat.com> wrote:
> 2018-08-02 1:18 GMT+02:00 Eric Snow <ericsnowcurrently at gmail.com>:
> > The "core" config is basically the config for the runtime.  In fact,
> > PEP 432 renamed "core" to "runtime".  Please keep the firm distinction
> > between the runtime and the (main) interpreter.
>
> There is already something called _PyRuntime but it's shared between
> all interpreters.

_PyRuntime is a static global of type PyRuntimeState.  It is where I
consolidated (nearly) all the global runtime state last September.

> _PyCoreConfig is already *per* interpreter.

This was done as part of the PEP 432 implementation, which I landed
during PyCon 2017.  If PyRuntimeState had existed at the time, I'm sure
the core config would have gone there instead.

> Would you mind to elaborate what you mean by the "main interpreter"? I
> don't see anything obvious in the current code about what is a "main
> interpreter". Technically, I don't see anything like that.

The main interpreter is the first one created (during runtime
initialization).  It is special for a variety of reasons.  Here are
the ones I could think of:

1. the "main" thread will always belong to the main interpreter since
it is the first PyThreadState created
2. runtime initialization uses the main interpreter exclusively
3. the first phase of runtime initialization (pre-initialization) ends
with the main interpreter being *partially* configured
4. during the second phase (initializing), the partially-configured
main interpreter facilitates the use of most of the C-API and may be
used by embedders
  * this is the only time that an interpreter may be used in this way,
and it only happens with the main interpreter
5. runtime finalization takes place using the main interpreter
6. the main interpreter is the last one destroyed during finalization
7. the REPL runs only in the main interpreter
8. the Python CLI is run in the main interpreter (i.e. in its __main__ module)
9. the main interpreter cannot be destroyed (except during finalization)
10. in Python code the main interpreter will always exist
11. it is the parent of all subinterpreters created in Python code (via PEP 554)
12. signals are handled only in the main interpreter
13. all single-threaded Python code is run in the main interpreter
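
Item 12 is one of the few points on this list that can be observed
from pure Python.  A minimal sketch, stdlib only: signal.signal()
refuses to install a handler from any thread other than the main
thread.  This only shows the main-thread half of the restriction; the
main-interpreter half is not visible without subinterpreters.  The
helper name can_install_handler is made up for illustration.

```python
import signal
import threading


def can_install_handler():
    """Report whether this thread is allowed to install a SIGINT handler."""
    try:
        previous = signal.signal(signal.SIGINT, signal.SIG_DFL)
    except ValueError:
        # CPython only allows handlers to be installed from the main thread.
        return False
    if previous is not None:
        # Put the earlier handler back so the sketch has no lasting effect.
        signal.signal(signal.SIGINT, previous)
    return True


results = {}


def worker():
    results["worker"] = can_install_handler()


results["caller"] = can_install_handler()
thread = threading.Thread(target=worker)
thread.start()
thread.join()
print(results)
```

When the script is run directly, the "caller" entry comes from the
main thread and the "worker" entry from a secondary one.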

Note that there isn't anything special about the interpreter itself;
what is special is where and how it's used.  That matters, though, and
the runtime needs to treat it specially.

I expect all this isn't well-documented because it is relevant to very
few people.

> I'm still not convinced that we need _PyMainInterpreterConfig:

Let's step back a moment and consider the course of events:

1. PEP 432 was created nearly 6 years ago to address the tangle that
runtime initialization had become, with the intent of helping both the
CPython maintainers and embedders
2. Nick did some re-organization around then (e.g. factoring out
pylifecycle.c) to facilitate an implementation of the PEP
3. Nick implemented PEP 432, with a plan to merge it as a *private*
API regardless of whether or not the PEP was accepted (with general
consensus that doing so was a good idea)
  * see https://bitbucket.org/ncoghlan/cpython_sandbox/branch/pep432_modular_bootstrap
  * landing the private API would allow us to iron out the details of the PEP
  * work happened in spurts in 2013, 2015, and 2016; I kept poking
Nick because the implementation was a big blocker for my
multi-core/subinterpreters project
4. leading up to (and at) PyCon 2017, I forked Nick's branch, moved it
to github, rebased it onto master, got it working again, created a PR,
and finally landed it
5. since then the implementation has changed a bunch (due to Victor's
much-appreciated efforts) and has diverged from the PEP
  * notably it's unclear whether the code (especially pymain) strictly
conforms to the phases in the PEP

At this point the PEP is out of date.  There have been several mailing
list threads (all python-dev, IIRC) and some BPO issues where Victor
solicited clarification or expressed a desire to change things and
Nick gave feedback.  None of that made it into the PEP. :(
Consequently the PEP is inconsistent with the actual target.
Furthermore, as was intended, we've learned of a few ways that the PEP
could be improved.

We *really* need to get the PEP updated so we can be sure everyone has
all the info.

Regarding the justification for the "main interpreter" config, the
implementation has diverged from the original intent of the PEP:

* the core/runtime config was meant to hold the minimal data needed to
bootstrap/initialize the basic (limited) functionality of the C-API,
including a restricted main interpreter
  + the struct members were strictly C plain-old-types since using
PyObject would require the runtime to already be (partially)
initialized
  + in the last year a lot of data has been added to this config; I
don't know how much is strictly necessary to bootstrap the runtime
(end of phase 1) and how much could be dealt with in phase 2
* the "main interpreter" config was meant to hold all the config
needed to finish initializing the runtime (end of phase 2)
  + the struct members were mostly PyObject* (possible since most
builtin types are available at this point)
  + the PEP proposes a bunch more fields than the implementation has;
we planned on adding them a few at a time
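
To make the distinction concrete, here is a toy model in Python
rather than the real C structs.  It assumes made-up field names
(use_environment, program_name, argv): a ctypes.Structure stands in
for a struct restricted to plain C types, and a dataclass stands in
for a struct of PyObject* fields.

```python
import ctypes
from dataclasses import dataclass, field


class CoreConfig(ctypes.Structure):
    """Stand-in for the core/runtime config: plain C types only."""
    _fields_ = [
        ("use_environment", ctypes.c_int),   # hypothetical field
        ("program_name", ctypes.c_wchar_p),  # hypothetical field
    ]


@dataclass
class MainInterpreterConfig:
    """Stand-in for the main interpreter config: holds Python objects."""
    argv: list = field(default_factory=list)  # hypothetical field


# Phase 1: only C-level values are available.
core = CoreConfig(use_environment=1, program_name="python")

try:
    # A C int field cannot hold an arbitrary Python object.
    core.use_environment = ["not", "a", "C", "int"]
    accepted_object = True
except TypeError:
    accepted_object = False

# Phase 2: builtin types exist, so the config can hold real objects.
main = MainInterpreterConfig(argv=[core.program_name, "-c", "pass"])
print(accepted_object, main.argv)
```

The rejected assignment mirrors why the first struct has to stay
object-free: nothing that needs the object machinery can be stored
before that machinery is up.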

> _PyCoreConfig contains the same information. Is it really worth it to
> duplicate all _PyCoreConfig (more than 36 fields) in
> _PyMainInterpreterConfig? _PyMainInterpreterConfig adds a third copy
> of many parameters: another opportunity to introduce an inconsistency.

TBH, the PEP *should* have a clear answer for your question here,
Victor.  It has some explanation, but clearly it is incomplete (hence
this continuing email thread).

The duplication is partly a consequence of what has happened in the
last year: a bunch of fields were added to the core config that were
not in the PEP.  However, note the key differences between the two
structs:

* core/runtime config
  + minimal
  + simple C fields
  + meant for embedders/pymain to bootstrap a limited runtime
  + not really meant to be used after calling Py_InitializeRuntime
(AKA Py_InitializeCore)
* main interpreter config
  + includes everything needed to finish full runtime initialization
  + has PyObject* fields
  + meant for embedders/pymain to finish initializing the runtime
  + not really meant to be used after calling
Py_ConfigureMainInterpreter (except when initializing a subinterpreter)

Originally there wasn't much overlap.  Furthermore, both of them are
kept around so that, via the C-API (or directly in the CPython impl.),
we could expose what data was used to initialize the runtime.  This
fills much the same role as the existing global Py_* variables.

The duplication is due to there being C and PyObject versions.  It is
for the sake of embedders (and a little bit of sanity).  The big
reason why it shouldn't be a problem is that
PyMainInterpreterConfig is generated directly from PyRuntimeConfig
(AKA PyCoreConfig) and only *after* we've used the runtime config to
bootstrap the limited runtime (after which it shouldn't be modified
ever).  So there's no risk of inconsistency, right?

Perhaps it would make sense to only keep a const copy of both, to
avoid modification?
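
A minimal sketch of that idea, again as a Python model rather than C:
the main config is derived from the runtime config exactly once, and
both are frozen dataclasses so that any later modification fails
loudly.  The field names and the derive_main_config helper are
hypothetical.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class RuntimeConfig:
    program_name: str         # hypothetical field
    ignore_environment: bool  # hypothetical field


@dataclass(frozen=True)
class MainInterpreterConfig:
    program_name: str
    ignore_environment: bool
    argv: tuple               # hypothetical field only the main config has


def derive_main_config(runtime_config, argv):
    """Build the main config from the already-frozen runtime config."""
    return MainInterpreterConfig(
        program_name=runtime_config.program_name,
        ignore_environment=runtime_config.ignore_environment,
        argv=tuple(argv),
    )


runtime_config = RuntimeConfig(program_name="python", ignore_environment=False)
main_config = derive_main_config(runtime_config, ["python", "-c", "pass"])

try:
    # Stands in for code trying to change the config after bootstrap.
    runtime_config.program_name = "other"
    modified = True
except FrozenInstanceError:
    modified = False

print(modified, main_config)
```

Because the shared fields are copied in one place and neither object
can change afterwards, the two copies cannot drift apart in this model.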

> Right now, an interpreter contains both: core and main
> configurations...

As noted above, the core/runtime config should probably be on
PyRuntimeState instead.

Regarding the "main" config, PyMainInterpreterConfig probably makes
more sense as one of the following:

1. on PyRuntimeState, like the core/runtime config (since it's a one-off)
2. on PyInterpreterState, like now, but set to NULL on all but the
main interpreter (which would allow us to distinguish the main
interpreter from the rest)

Both would require PyInterpreterConfig from PEP 432, but expanded to
cover all config that might be unique to an interpreter.

Also, conceptually there's a difference between the
config-used-to-finish-runtime-init and the
config-used-to-initialize-an-interpreter (including the main
interpreter).  In fact, PEP 432 does include a PyInterpreterConfig.
However, in the current implementation, PyMainInterpreterConfig fills
that role exclusively, which is confusing since we use the "main
interpreter" config to initialize all interpreters (not just the main
one).

So here's what might make sense to do:

1. rename "core" to "runtime" (to reduce confusion)
2. move PyInterpreterState.runtime_config to PyRuntimeState.config
  + prevent modification after Py_InitializeRuntime() is called (e.g.
keep a const copy)?
3. move PyInterpreterState.config to PyRuntimeState.main_config
  + prevent modification after Py_ConfigureMainInterpreter() is called
(e.g. keep a const copy)?
  + keep the PyMainInterpreterConfig and Py_ConfigureMainInterpreter names
4. add PyInterpreterConfig with only the parts of
PyMainInterpreterConfig needed to initialize any interpreter
  + add Py_NewInterpreterEx(PyInterpreterConfig) to allow explicitly
passing a config?
5. add PyInterpreterState.config (type PyInterpreterConfig) to record
the config used to initialize that interpreter
  + prevent modification after the interpreter is initialized (e.g.
keep a const copy)?
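
The resulting layout could be modeled roughly like this.  It is a
Python sketch of where each config would live, not the C
declarations.  It assumes main_config sits on the runtime state next
to config; new_interpreter stands in for the proposed
Py_NewInterpreterEx(), and the config fields are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class RuntimeConfig:              # today's "core" config, renamed (step 1)
    program_name: str = "python"  # placeholder field


@dataclass(frozen=True)
class InterpreterConfig:          # what any interpreter needs (step 4)
    isolated: bool = False        # placeholder field


@dataclass(frozen=True)
class MainInterpreterConfig:      # used once, to finish runtime init
    interpreter: InterpreterConfig = field(default_factory=InterpreterConfig)


@dataclass
class InterpreterState:
    config: InterpreterConfig     # step 5: how this interpreter was set up


@dataclass
class RuntimeState:
    config: RuntimeConfig                                # step 2
    main_config: Optional[MainInterpreterConfig] = None  # step 3
    interpreters: List[InterpreterState] = field(default_factory=list)

    def configure_main_interpreter(self, main_config):
        """Model of Py_ConfigureMainInterpreter(): allowed only once."""
        if self.main_config is not None:
            raise RuntimeError("main interpreter already configured")
        self.main_config = main_config
        return self.new_interpreter(main_config.interpreter)

    def new_interpreter(self, config):
        """Model of the proposed Py_NewInterpreterEx(config)."""
        interp = InterpreterState(config=config)
        self.interpreters.append(interp)
        return interp


runtime = RuntimeState(config=RuntimeConfig())
main_interp = runtime.configure_main_interpreter(MainInterpreterConfig())
sub_interp = runtime.new_interpreter(InterpreterConfig(isolated=True))
print(len(runtime.interpreters), sub_interp.config)
```

In this model the one-off configs belong to the runtime, while every
interpreter, main or not, records only its own InterpreterConfig.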

> I propose to *remove* _PyMainInterpreterConfig and rename
> _PyCoreConfig as _PyInterpreterConfig. I would also propose to merge
> again Py_Initialize() to have a single step instead of the current
> core step + main step: 2 steps.

So you are not in favor of PEP 432 then. :)

-eric