[Python-Dev] PEP 435: pickling enums created with the functional API

Eli Bendersky eliben at gmail.com
Tue May 7 17:44:46 CEST 2013


On Tue, May 7, 2013 at 8:03 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On Tue, May 7, 2013 at 11:34 PM, Eli Bendersky <eliben at gmail.com> wrote:
> > One of the contended issues with PEP 435 on which Guido pronounced was the
> > functional API, which allows creating enumerations dynamically in a manner
> > similar to namedtuple:
> >
> >   Color = Enum('Color', 'red blue green')
> >
> > The biggest complaint reported against this API is its interaction with
> > pickle. As promised, I want to discuss here how we're going to address
> > this concern.
> >
> > At this point, the pickle docs say that module-top-level classes can be
> > pickled. This obviously works for the normal Enum classes, but is a problem
> > with the functional API because the class is created dynamically and has no
> > __module__.
> >
> > To solve this, the reference implementation uses the same approach as
> > namedtuple (*). In the metaclass's __new__ (this is an excerpt; the real
> > code has some safeguards):
> >
> >   module_name = sys._getframe(1).f_globals['__name__']
> >   enum_class.__module__ = module_name
> >
> > According to an earlier discussion, this works on CPython, PyPy and
> > Jython, but not on IronPython. The alternative that works everywhere is to
> > define the Enum like this:
> >
> >   Color = Enum('the_module.Color', 'red blue green')
> >
> > The reference implementation supports this as well.
> >
> > Some points for discussion:
> >
> > 1) We can say that using the functional API when pickling can happen is not
> > recommended, but maybe a better way would be to just explain the way things
> > are and let users decide?
>
> It's probably worth creating a section in the pickle docs and
> explaining the vagaries of naming things and the dependency on knowing
> the module name. The issue comes up with defining classes in __main__
> and when implementing pseudo-modules as well (see PEP 395).
>
>
Any pickle-expert volunteers to do this? I guess we can start by creating a
documentation issue.
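
A minimal sketch of the dependency such a docs section would need to explain: pickle stores a class by reference, as a module name plus an attribute name, so the class must be reachable under exactly that name. The module name `demo_enums` and the helper `make_class` are hypothetical and exist only for illustration; a stand-in module is registered so the sketch does not depend on how it is run.

```python
import pickle
import sys
import types

# Hypothetical stand-in module, registered so pickle can import it by name.
demo = types.ModuleType('demo_enums')
sys.modules['demo_enums'] = demo


def make_class(name, module_name):
    """Build a class dynamically and record which module it claims to live in."""
    cls = type(name, (object,), {})
    cls.__module__ = module_name
    return cls


# Case 1: the class is bound in its module under its own __name__.
Color = make_class('Color', 'demo_enums')
demo.Color = Color
data = pickle.dumps(Color)
roundtrip_ok = pickle.loads(data) is Color

# Case 2: the class claims to live in demo_enums but is never bound there,
# so pickle's lookup of demo_enums.Orphan fails.
Orphan = make_class('Orphan', 'demo_enums')
try:
    pickle.dumps(Orphan)
    orphan_pickled = True
except (pickle.PicklingError, AttributeError):
    orphan_pickled = False
```

The second case is the situation a dynamically created class ends up in whenever its recorded module name does not match the place where the name is actually bound.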



> > 2) namedtuple should also support the fully qualified name syntax. If this
> > is agreed upon, I can create an issue.
>
> Yes, I think that part should be done.
>


OK, I'll create an issue.
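
As a rough sketch of what supporting the fully qualified name syntax could involve, assuming the rule is simply "everything before the last dot is the module name": `split_qualified_name` and `make_class` are hypothetical helpers, not code from the reference implementation or from namedtuple.

```python
def split_qualified_name(qualified, default_module):
    """Split 'pkg.mod.Name' into ('pkg.mod', 'Name').

    Hypothetical helper: a name without a dot keeps the default module.
    """
    module_name, dot, class_name = qualified.rpartition('.')
    if not dot:
        return default_module, qualified
    return module_name, class_name


def make_class(qualified, default_module='__main__'):
    """Create a class whose __module__ comes from the dotted name, if any."""
    module_name, class_name = split_qualified_name(qualified, default_module)
    cls = type(class_name, (object,), {})
    cls.__module__ = module_name
    return cls


Color = make_class('the_module.Color')   # explicit module, no frame inspection
Shape = make_class('Shape')              # falls back to the default module
```

With this shape the caller states the module explicitly, so nothing depends on stack inspection, at the cost of repeating the module name by hand.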


>
> > 3) Antoine mentioned that work is being done in 3.4 to enable pickling of
> > nested classes (http://www.python.org/dev/peps/pep-3154/). If that gets
> > implemented, I don't see a reason why Enum and namedtuple can't be adjusted
> > to find the __qualname__ of the class they're internal to. Am I missing
> > something?
>
> The class based form should still work (assuming only classes are
> involved), the stack inspection will likely fail.
>

It can probably be made to work with a bit more effort than the current
"hack", but I don't see why it wouldn't be doable.



> > 4) Using _getframe(N) here seems like overkill to me.
>
> It's not just overkill, it's fragile - it only works if you call the
> constructor directly. If you use a convenience function in a utility
> module, it will try to load your pickles from there rather than
> wherever you bound the name.
>

In theory you can climb the frame stack until you reach the desired place,
but this is specifically what my proposal of adding a function tries to avoid.
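
The fragility Nick describes can be reproduced with a small sketch. It assumes an implementation that provides `sys._getframe` (such as CPython), and it simulates two modules, hypothetically named `app` and `utils`, by running code with `exec` in separate namespaces. `make_class` stands in for the excerpt above and is not the real metaclass code.

```python
import sys


def make_class(name):
    """Record the module of the immediate caller, as in the excerpt."""
    cls = type(name, (object,), {})
    cls.__module__ = sys._getframe(1).f_globals['__name__']
    return cls


# Simulated utility module that wraps the constructor in a helper.
utils_ns = {'__name__': 'utils', 'make_class': make_class}
exec("def helper(name):\n    return make_class(name)\n", utils_ns)

# Simulated application module that creates one class directly and one
# through the helper; both names are bound in 'app'.
app_ns = {
    '__name__': 'app',
    'make_class': make_class,
    'helper': utils_ns['helper'],
}
exec("Direct = make_class('Direct')\nWrapped = helper('Wrapped')\n", app_ns)

direct_module = app_ns['Direct'].__module__    # module of the direct caller
wrapped_module = app_ns['Wrapped'].__module__  # module of the helper instead
```

Both classes are bound in `app`, but the one created through the helper records `utils`, which is where pickle would later look for it.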



>
> > What we really need is just the module in which execution currently is
> > (i.e. the metaclass's __new__ in our case). Would it make sense to add a
> > new function somewhere in the stdlib of 3.4 (in sys or inspect or ...) that
> > just provides the current module name? It seems that all Pythons should be
> > able to easily provide it; it's certainly a very small subset of the
> > functionality provided by walking the callframe stack. This function can
> > then be used to build fully qualified names for pickling of Enum and
> > namedtuple. Moreover, it can be useful even more widely - dynamic class
> > building is quite common in Python code, and as Nick mentioned somewhere
> > earlier, the extra power of metaclasses in the recent 3.x's will probably
> > make it even more common.
>
> Yes, I've been thinking along these lines myself, although in a
> slightly more expanded form that also touches on the issues that
> stalled PEP 406 (the import engine API that tries to better
> encapsulate the import state). It may also potentially address some
> issues with initialisation of C extensions (I don't remember the exact
> details off the top of my head, but there's some info we want to get
> from the import machinery to modules initialised from Cython, but the
> loader API and the C module initialisation API both get in the way).
>
> Specifically, what I'm talking about is some kind of implicit context
> similar to the approach the decimal module uses to control operations
> on Decimal instances. In this case, what we're trying to track is the
> "active module", either __main__ (if the code has been triggered
> directly through an operation in that module), or else the module
> currently being imported (if the import machinery has been invoked).
>
> The bare minimum would just be to store the __name__ (using
> sys.modules to get access to the full module if needed) in a way that
> adequately handles nested, circular and threaded imports, but there
> may be a case for tracking a richer ModuleContext object instead.
>
> However, there's also a separate question of whether implicitly
> tracking the active module is really what we want. Do we want that, or
> is what we actually want the ability to define an arbitrary "naming
> context" in order to use functional APIs to construct classes without
> losing the pickle integration of class statements?
>
> What if there was a variant of the class statement that bound the
> result of a function call rather than using the normal syntax:
>
>     class Animal from enum.Enum(members="dog cat bear")
>
> And it was only class statements in that form which manipulated the
> naming context? (you could also use the def keyword rather than class)
>
> Either form would essentially be an ordinary assignment statement,
> *except* that they would manipulate the naming context to record the
> name being bound *and* relevant details of the active module.
>
> Regardless, I think the question is not really well enough defined to
> be a topic for python-dev, even though it came up in a python-dev
> discussion - it's more python-ideas territory.
>

Wait... I agree that having a special syntax for this is a novel idea
that's not well defined and can be discussed on python-ideas. But the
utility function I was mentioning is a pretty simple idea, and it's well
defined. It can be very useful in contexts where code is created
dynamically, by reducing the number of explicit frame-walking hacks.
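
For illustration only, here is roughly what such a helper might look like if built on today's frame API. The name `current_module_name`, its parameters, and its fallback behaviour are all assumptions; the point of the proposal is that an implementation could provide the same answer without exposing frames at all.

```python
import sys


def current_module_name(depth=1, default=None):
    """Return the __name__ of the module whose code called us.

    Hypothetical sketch: depth=1 means the immediate caller. Returns
    `default` when frames are unavailable or the stack is too shallow.
    """
    try:
        frame = sys._getframe(depth)
    except (AttributeError, ValueError):
        # AttributeError: no sys._getframe on this implementation.
        # ValueError: the call stack is not deep enough.
        return default
    return frame.f_globals.get('__name__', default)


# Simulate a module named 'pkg.mod' calling the helper.
ns = {'__name__': 'pkg.mod', 'current_module_name': current_module_name}
exec("name = current_module_name()", ns)
found = ns['name']

# A depth far beyond the real stack falls back to the default.
fallback = current_module_name(depth=10**6, default='unknown')
```

Callers such as Enum or namedtuple would then ask for the module name through a single well-defined function instead of each carrying its own frame-walking code.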

Eli