[Python-Dev] PEP 435: pickling enums created with the functional API

Nick Coghlan ncoghlan at gmail.com
Tue May 7 17:03:38 CEST 2013


On Tue, May 7, 2013 at 11:34 PM, Eli Bendersky <eliben at gmail.com> wrote:
> One of the contended issues with PEP 435 on which Guido pronounced was the
> functional API, which allows creating enumerations dynamically in a manner
> similar to namedtuple:
>
>   Color = Enum('Color', 'red blue green')
>
> The biggest complaint reported against this API is interaction with pickle.
> As promised, I want to discuss here how we're going to address this concern.
>
> At this point, the pickle docs say that module-top-level classes can be
> pickled. This obviously works for the normal Enum classes, but is a problem
> with the functional API because the class is created dynamically and has no
> __module__.
>
> To solve this, the reference implementation uses the same approach as
> namedtuple (*). In the metaclass's __new__ (this is an excerpt, the real
> code has some safeguards):
>
>   module_name = sys._getframe(1).f_globals['__name__']
>   enum_class.__module__ = module_name
>
> According to an earlier discussion, this works on CPython, PyPy and
> Jython, but not on IronPython. The alternative that works everywhere is to
> define the Enum like this:
>
>   Color = Enum('the_module.Color', 'red blue green')
>
> The reference implementation supports this as well.
>
> Some points for discussion:
>
> 1) We can say that using the functional API when pickling can happen is not
> recommended, but maybe a better way would be to just explain the way things
> are and let users decide?

It's probably worth creating a section in the pickle docs and
explaining the vagaries of naming things and the dependency on knowing
the module name. The issue comes up with defining classes in __main__
and when implementing pseudo-modules as well (see PEP 395).
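To illustrate the dependency on the module name: pickle stores a class by reference, as a module name plus a class name, and looks the class up again under that pair. The sketch below uses only the stdlib; the `Ghost` class, the `no_such_module` name and the `can_pickle` helper are made up for the demonstration.

```python
import pickle
from collections import OrderedDict

def can_pickle(obj):
    """Return True if obj can be pickled, False if pickle rejects it."""
    try:
        pickle.dumps(obj)
        return True
    except pickle.PicklingError:
        return False

# A class is pickled as a reference: its module name and its own name.
data = pickle.dumps(OrderedDict)
print(b"collections" in data, b"OrderedDict" in data)

# A dynamically created class whose __module__ cannot be imported
# (hypothetical module name) has nothing for pickle to refer to.
Ghost = type("Ghost", (), {})
Ghost.__module__ = "no_such_module"
print(can_pickle(OrderedDict), can_pickle(Ghost))
```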

> 2) namedtuple should also support the fully qualified name syntax. If this
> is agreed upon, I can create an issue.

Yes, I think that part should be done.
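
A rough sketch of what that could look like, written as a wrapper rather than a change to namedtuple itself. The `qualified_namedtuple` helper is hypothetical, and `the_module` is just the placeholder name from the example above:

```python
from collections import namedtuple

def qualified_namedtuple(qualified_name, field_names):
    """Hypothetical wrapper: accept 'the_module.Point' and set __module__ from it."""
    module_name, _, type_name = qualified_name.rpartition(".")
    cls = namedtuple(type_name, field_names)
    if module_name:
        # Override whatever namedtuple guessed from the calling frame.
        cls.__module__ = module_name
    return cls

Point = qualified_namedtuple("the_module.Point", "x y")
print(Point.__module__, Point.__name__)
```

Pickling instances still requires that `the_module.Point` actually resolves to this class at load time; the wrapper only records where to look.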

> 3) Antoine mentioned that work is being done in 3.4 to enable pickling of
> nested classes (http://www.python.org/dev/peps/pep-3154/). If that gets
> implemented, I don't see a reason why Enum and namedtuple can't be adjusted
> to find the __qualname__ of the class they're internal to. Am I missing
> something?

The class-based form should still work (assuming only classes are
involved), but the stack inspection will likely fail.

> 4) Using _getframe(N) here seems like overkill to me.

It's not just overkill, it's fragile - it only works if you call the
constructor directly. If you use a convenience function in a utility
module, it will try to load your pickles from there rather than
wherever you bound the name.
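
A small demonstration of that failure mode, assuming a CPython-style sys._getframe. The two "modules" are simulated with types.ModuleType so the example fits in one file; `make_class`, `make_flag`, `utils` and `app` are all invented names:

```python
import sys
import types

def make_class(name):
    """Hypothetical factory that guesses the caller's module, namedtuple-style."""
    cls = type(name, (), {})
    cls.__module__ = sys._getframe(1).f_globals.get("__name__", "__main__")
    return cls

# Simulated utility module that wraps the factory in a convenience function.
utils = types.ModuleType("utils")
utils.make_class = make_class
exec("def make_flag(name):\n    return make_class(name)\n", utils.__dict__)

# Simulated application module that uses both the factory and the wrapper.
app = types.ModuleType("app")
app.make_class = make_class
app.make_flag = utils.make_flag
exec("Direct = make_class('Direct')\nWrapped = make_flag('Wrapped')\n", app.__dict__)

print(app.Direct.__module__)   # the module that called the factory directly
print(app.Wrapped.__module__)  # the wrapper's module, not where the name was bound
```

The directly created class records `app`, while the one created through the wrapper records `utils`, even though both names are bound in `app`.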

> What we really need
> is just the module in which the current execution currently is (i.e. the
> metaclass's __new__ in our case). Would it make sense to add a new function
> somewhere in the stdlib of 3.4 (in sys or inspect or ...) that just provides
> the current module name? It seems that all Pythons should be able to easily
> provide it, it's certainly a very small subset of the functionality provided
> by walking the callframe stack. This function can then be used to build
> fully qualified names for pickling of Enum and namedtuple. Moreover, it can
> be useful even more widely - dynamic class building is quite common in
> Python code, and as Nick mentioned somewhere earlier, the extra power of
> metaclasses in the recent 3.x's will probably make it even more common.

Yes, I've been thinking along these lines myself, although in a
slightly more expanded form that also touches on the issues that
stalled PEP 406 (the import engine API that tries to better
encapsulate the import state). It may also potentially address some
issues with initialisation of C extensions (I don't remember the exact
details off the top of my head, but there's some info we want to get
from the import machinery to modules initialised from Cython, but the
loader API and the C module initialisation API both get in the way).

Specifically, what I'm talking about is some kind of implicit context
similar to the approach the decimal module uses to control operations
on Decimal instances. In this case, what we're trying to track is the
"active module", either __main__ (if the code has been triggered
directly through an operation in that module), or else the module
currently being imported (if the import machinery has been invoked).

The bare minimum would just be to store the __name__ (using
sys.modules to get access to the full module if needed) in a way that
adequately handles nested, circular and threaded imports, but there
may be a case for tracking a richer ModuleContext object instead.
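
As a very rough sketch of the bare minimum version, assuming the import machinery would do the pushing and popping: a per-thread stack of module names. Nothing like `importing` or `active_module` exists in the stdlib; both names are invented here.

```python
import threading
from contextlib import contextmanager

_state = threading.local()

def _stack():
    # Each thread gets its own stack, starting at __main__.
    if not hasattr(_state, "stack"):
        _state.stack = ["__main__"]
    return _state.stack

@contextmanager
def importing(name):
    """Hypothetical hook: mark `name` as the active module for the duration."""
    stack = _stack()
    stack.append(name)
    try:
        yield
    finally:
        stack.pop()

def active_module():
    """Return the name of the innermost module currently being imported."""
    return _stack()[-1]

# Nested imports unwind in order; other threads are unaffected.
with importing("pkg.a"):
    with importing("pkg.b"):
        print(active_module())
    print(active_module())
print(active_module())
```

A stack handles nested and circular imports (the innermost entry wins), and the thread-local storage keeps concurrent imports from seeing each other's state.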

However, there's also a separate question of whether implicitly
tracking the active module is really what we want. Do we want that, or
is what we actually want the ability to define an arbitrary "naming
context" in order to use functional APIs to construct classes without
losing the pickle integration of class statements?

What if there was a variant of the class statement that bound the
result of a function call rather than using the normal syntax:

    class Animal from enum.Enum(members="dog cat bear")

And it was only class statements in that form which manipulated the
naming context? (you could also use the def keyword rather than class)

Either form would essentially be an ordinary assignment statement,
*except* that they would manipulate the naming context to record the
name being bound *and* relevant details of the active module.

Regardless, I think the question is not really well enough defined to
be a topic for python-dev, even though it came up in a python-dev
discussion - it's more python-ideas territory.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

