[Python-ideas] Updated PEP 432: Simplifying the CPython update sequence

Sat Jan 5 22:42:20 CET 2013

Hi Nick,

PEP 432 is looking very nice.  It'll be fun to watch the implementation come
together. :)

Some comments...

The startup phases:

> * Pre-Initialization - no interpreter available
> * Initialization - interpreter partially available

What about "Initializing"?

> * Initialized - full interpreter available, __main__ related metadata
>   incomplete
> * Main Execution - optional state, __main__ related metadata populated,
>   bytecode executing in the __main__ module namespace 

What is "optional" about this state?  Maybe it should be called "Operational"?

> ... separate system Python (spython) executable ...

I love the idea, but I'm not crazy about the name.  What about
`python-minimal`?  (Yes, it's deliberately longer.  Symlinks ftw. :)

> <TBD: Did I miss anything?>

What about sys.implementation?

> as it failed to be updated for the virtual environment support added in
> Python 3.3 (detailed in PEP 420).

venv is defined in PEP 405 (there are two cases of mis-referencing).

Note that there may be other important build time settings on some platforms.
An example is Debian/Ubuntu, where we define the multiarch triplet in the
configure script, and pass that through Makefile(.pre.in) to sysmodule.c for
exposure as sys.implementation._multiarch.
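
A minimal sketch of how code might read that value, assuming nothing beyond
the attribute name above.  The helper name `multiarch_triplet` is made up for
illustration, and since `_multiarch` is private and platform-specific the
lookup has to tolerate its absence:

```python
import sys


def multiarch_triplet():
    """Return the multiarch triplet if this build exposes one, else None."""
    # _multiarch is a private, build-dependent attribute, so it may be
    # missing entirely on builds that do not define it.
    return getattr(sys.implementation, "_multiarch", None)


print(sys.implementation.name, multiarch_triplet())
```

Any other build-time setting exposed this way would need the same kind of
defensive lookup on the consumer side.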

> For a command executed with -c, it will be the string "-c"
> For explicitly requested input from stdin, it will be the string "-"

Wow, I couldn't believe it but it's true!  That seems crazy useless. :)
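
For anyone who wants to check this, here is a small sketch that launches a
child interpreter and reports what it sees as sys.argv[0].  The helper name
`argv0_for` is hypothetical; everything else is plain subprocess usage:

```python
import subprocess
import sys


def argv0_for(args, stdin_text=None):
    """Run a child interpreter and return what it saw as sys.argv[0]."""
    result = subprocess.run(
        [sys.executable, *args],
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


probe = "import sys; print(sys.argv[0])"

# Command passed with -c: the child sees the literal string "-c".
print(repr(argv0_for(["-c", probe])))

# Script explicitly read from stdin: the child sees the literal string "-".
print(repr(argv0_for(["-"], stdin_text=probe)))
```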

> Embedding applications must call Py_SetArgv themselves. The CPython logic
> for doing so is part of Py_Main() and is not exposed separately. However,
> the runpy module does provide roughly equivalent logic in runpy.run_module
> and runpy.run_path.

As I've mentioned before on the python-porting mailing list, this is actually
more difficult than it seems because main() takes char*s but Py_SetArgv() and
Py_SetProgramName() take wchar_t*s.

Maybe Python's own conversion could be refactored to make this easier either
as part of this PEP or after the PEP is implemented.

> int Py_ReadConfiguration(PyConfig *config);

> The config argument should be a pointer to a Python dictionary. For any
> supported configuration setting already in the dictionary, CPython will
> sanity check the supplied value, but otherwise accept it as correct.

So why not define this to take a PyObject* or a PyDictObject* ?

(also: the Py_Config struct members need the correct concrete type pointers,
e.g. PyDictObject*)

> Alternatively, settings may be overridden after the Py_ReadConfiguration
> call (this can be useful if an embedding application wants to adjust a
> setting rather than replace it completely, such as removing sys.path[0]).

How will setting something after Py_ReadConfiguration() is called change a
value such as sys.path?  Or is this the reason why you pass a Py_Config to
Py_EndInitialization()?

(also, see the type typo <wink> in the definition of Py_EndInitialization())

Also, I suggest taking the opportunity to change the sense of flags such as
no_site and dont_write_bytecode.  I find it much harder to work out that
"dont_write_bytecode = 0" means *do* write bytecode than to read
"write_bytecode = 1".  I.e. positives are better than double-negatives.
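
To illustrate the readability argument, here is a sketch that derives
positive-sense values from the existing negative flags in sys.flags.  The
function name `positive_flags` and the key `import_site` are invented for the
example and are not part of the PEP or of any existing API:

```python
import sys


def positive_flags(flags=sys.flags):
    """Map the negative-sense interpreter flags to positive-sense names."""
    return {
        # -B / PYTHONDONTWRITEBYTECODE sets dont_write_bytecode
        "write_bytecode": not flags.dont_write_bytecode,
        # -S sets no_site; "import_site" is a hypothetical positive name
        "import_site": not flags.no_site,
    }


settings = positive_flags()
if settings["write_bytecode"]:
    print("bytecode files will be written")
```

With the positive names, the check reads the way the behaviour is described.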

> sys.argv[0] may not yet have its final value
> it will be -m when executing a module or package with CPython

Gosh, wouldn't it be nice if this could have a more useful value?

> Initial thought is that hiding the various options behind a single API would
> make that API too complicated, so 3 separate APIs is more likely:

+1

> The interpreter state will be updated to include details of the
> configuration settings supplied during initialization by extending the
> interpreter state object with an embedded copy of the Py_CoreConfig and
> Py_Config structs.

Couldn't it just have a dict with all the values from both structs collapsed
into it?
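
Roughly what I have in mind, sketched in Python rather than C.  The field
names and values below are placeholders standing in for whatever the two
structs end up containing, and `collapse` is a hypothetical helper:

```python
def collapse(*configs):
    """Merge several config mappings into one dict, refusing duplicate keys."""
    merged = {}
    for cfg in configs:
        overlap = merged.keys() & cfg.keys()
        if overlap:
            # A setting defined twice would make the merged view ambiguous.
            raise ValueError(f"duplicate settings: {sorted(overlap)}")
        merged.update(cfg)
    return merged


# Placeholder contents; the real structs are defined by the PEP.
core_config = {"ignore_environment": 0, "use_hash_seed": 0}
main_config = {"no_site": 0, "dont_write_bytecode": 0}

configuration = collapse(core_config, main_config)
print(sorted(configuration))
```

Rejecting duplicate keys is a design choice for the sketch; it only works if
the two structs never share a field name.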

> For debugging purposes, the configuration settings will be exposed as a
> sys._configuration simple namespace

I suggest un-underscoring the name and making it public.  It might be useful
for more than just debugging.

> Is Py_IsRunningMain() worth keeping?

Perhaps.  Does it provide any additional information above Py_IsInitialized()?

> Should the answers to Py_IsInitialized() and Py_RunningMain() be exposed via
> the sys module?

I can't think of a use case.

> Is the Py_Config struct too unwieldy to be practical? Would a Python
> dictionary be a better choice?

Although I see why you've spec'd it this way, I don't like having *two* config
structures (Py_CoreConfig and Py_Config).  Having a dictionary for the latter
would probably be fine, and in fact you could copy the Py_Config values into
it (when possible during the init sequence) and expose it in the sys module.

> Would it be better to manage the flag variables in Py_Config as Python
> integers so the struct can be initialized with a simple memset(&config, 0,
> sizeof(*config))?

Would we even notice the optimization?

> A System Python Executable

This should probably at least mention Christian's idea of the -I flag (which I
think hasn't been PEP'd yet).  We can bikeshed about the name of the
executable later. :)

Cheers,
-Barry