From ncoghlan at gmail.com Sun Sep 1 03:53:35 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 1 Sep 2013 11:53:35 +1000 Subject: [Import-SIG] PEP 451 (ModuleSpec) round 3 In-Reply-To: References: Message-ID: On 31 Aug 2013 01:31, "Brett Cannon" wrote: > > > > > On Fri, Aug 30, 2013 at 11:12 AM, Nick Coghlan wrote: >> >> On 31 August 2013 00:57, Brett Cannon wrote: >> >> So perhaps a better name might be "prepare_module" (by analogy to PEP >> >> 3115), and have it accept a "reloading" parameter, which is an >> >> existing module to be reused. >> > >> > >> > Is this to replace create_module() or exec_module()? >> >> It replaces create_module. >> >> >> The signature would be something like: >> >> >> >> def prepare_module(reloading=None): >> >> """Create a module object for execution. Returning None will >> >> created a default module. >> >> Oops, stuffed up the signature. First arg should be the module spec: >> >> def prepare_module(spec, reloading=None): >> ... >> >> > I can't follow that sentence. =) What does returning None represent? >> >> Returning None indicates that the *loader* defines a module creation >> API, but the particular module being loaded doesn't take advantage of >> it. > > > IOW returning None means "I don't have anything special to say here, so do what you want"? > >> >> >> It's a feature I need for the new extension module loader API, where >> the creation hook allows the extension module to build a completely >> custom object (perhaps with additional state). You can request an >> ordinary module just by not defining the creation hook, and only >> defining the execution hook (which accepts an already created module). > > > OK, so this is purely for special-cases and not meant to always return something, just return something when needed. > >> >> >> By switching to a *preparation* hook, rather than creation, I think we >> can make this play more nicely with reloading. 
>> In the reloading case,
>> the preparation hook would be responsible for checking that the
>> existing object was a suitable execution target.
>
> Ah, OK. It's more of a pre-condition check in that case, otherwise it's a
> chance to say "use this rather than whatever you default to".
>
>> >> If *reloading* is set, specifies an existing sys.modules entry
>> >> that is being reloaded.
>> >
>> > As in the key into sys.modules?
>>
>> No, as in the object itself. Technically it doesn't *have* to be in
>> sys.modules, and the loader really shouldn't care if it is or not.
>
> That's what I figured.
>
>> >> Must return None or that
>> >> specific object if reloading is supported.
>> >
>> > What's "that" supposed to represent?
>>
>> s/that specific object/the passed in object/
>>
>> >> Returning a
>> >> different module object or explicitly raising ImportError
>> >> indicates that reloading is not supported. (Or perhaps define
>> >> a "ReloadError" subclass of ImportError?)
>> >> """
>> >
>> > I'm really not following what this method is supposed to do. Is it simply
>> > mucking with sys.modules? Is it creating a module to use? If it's the latter
>> > then how does return None do anything? Are you saying returning None means
>> > "I didn't do anything special, do what you want"?
>>
>> It replaces create_module with something that can also serve as the
>> pre-check for the reloading case.
>
> In ModuleSpec.load():
>
>   module = self.loader.prepare_module(self)
>   if module is None:
>       module = types.ModuleType(self.name)
>
> And in reload():
>
>   module = self.loader.prepare_module(self, module_being_reloaded)
>
> That way some custom object can be used, and in the reload case ImportError
> can just propagate up if it turns out the module can't be reloaded.

Yep, that's exactly what I had in mind, although reload would also have an extra check to ensure the module returned was the same as the one passed in (that way, in-place reloading support for custom loaders that define create_module would always be opt-in rather than opt-out). Talking to Stefan about making this work on the extension module API side has confirmed my belief that this is the way to go, since it also deals nicely with placing custom objects in sys.modules. The one downside is that it means preconditions will be checked twice in the reload case (once in prepare, once in exec), but I can live with that for the likely reliability gains in the reloading API. If it works as well as I hope, I may finally be comfortable with proposing "imp.reload_fresh" for 3.5 :) Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Sun Sep 1 07:36:12 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 31 Aug 2013 23:36:12 -0600 Subject: [Import-SIG] PEP 451 (ModuleSpec) round 3 In-Reply-To: References: Message-ID: On Sat, Aug 31, 2013 at 7:53 PM, Nick Coghlan wrote: > Yep, that's exactly what I had in mind, although reload would also have an > extra check to ensure the module returned was the same as the one passed in > (that way, in-place reloading support for custom loaders that define > create_module would always be opt-in rather than opt-out). > > Talking to Stefan about making this work on the extension module API side > has confirmed my belief that this is the way to go, since it also deals > nicely with placing custom objects in sys.modules. > > The one downside is that it means preconditions will be checked twice in > the reload case (once in prepare, once in exec), but I can live with that > for the likely reliability gains in the reloading API. > > If it works as well as I hope, I may finally be comfortable with proposing > "imp.reload_fresh" for 3.5 :) > > Cheers, > Nick. 
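
The prepare_module() protocol discussed above (return None for a default module, return a custom object to override it, and treat a different object during reload as "reloading not supported") can be sketched roughly as follows. This is only an illustration under stated assumptions: Spec, DefaultLoader, FreshOnlyLoader and CustomModule are hypothetical names, ReloadError is the subclass that was merely floated in the thread, and none of this is the API that PEP 451 finally specifies.

```python
import types


class ReloadError(ImportError):
    """Hypothetical subclass floated in the thread; not an existing exception."""


class CustomModule(types.ModuleType):
    """Stand-in for a custom module object that carries extra state."""


class DefaultLoader:
    """Loader with nothing special to say: None requests a default module."""

    def prepare_module(self, spec, reloading=None):
        return None

    def exec_module(self, module):
        # Count executions so the effect of load and reload is visible.
        module.loaded = getattr(module, "loaded", 0) + 1


class FreshOnlyLoader(DefaultLoader):
    """Always builds a new object, so it never opts in to in-place reload."""

    def prepare_module(self, spec, reloading=None):
        return CustomModule(spec.name)


class Spec:
    """Toy stand-in for the proposed ModuleSpec (name and loader only)."""

    def __init__(self, name, loader):
        self.name = name
        self.loader = loader

    def load(self):
        module = self.loader.prepare_module(self)
        if module is None:
            module = types.ModuleType(self.name)
        self.loader.exec_module(module)
        return module

    def reload(self, module):
        prepared = self.loader.prepare_module(self, module)
        # Extra check from the thread: only None or the very same object
        # means the loader supports reloading this module in place.
        if prepared is not None and prepared is not module:
            raise ReloadError(f"{self.name!r} does not support in-place reload")
        self.loader.exec_module(module)
        return module
```

With these toy classes, Spec("demo", DefaultLoader()) can be loaded and then reloaded into the same object, while a spec using FreshOnlyLoader loads fine but raises the ImportError subclass on reload, which is the opt-in behaviour described in the message.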
> I haven't had a chance to respond to a few comments in detail yet, nor have I been able to read through the extension module API threads, but what I have seen has gotten me thinking about what exactly matters here regarding preparing and executing modules. * There are two kinds of module state: internal (not exposed in Python) and external (module.__dict__). * External state is associated with a module object, but internal state is not--it may be associated with a module name, a location (locatable resource), or something else. * Internal state (if it exists) is not necessarily created at execution (load/reload) time and may be shared between modules. * The internal module state may or may not be managed by the interpreter (regardless of Python implementation). * External module state is established at import execution time (load or reload). * Loading puts the external state into a new module and reloading into an existing one (likely overwriting at least some contents). * During module execution (during load/reload), external module state is copied from internal state, dynamically generated (e.g. .py files), or a mix of both. * Dynamic external state generation is only allowed once for some modules. * Dynamic generation is not necessarily (but sometimes is) an idempotent operation. * Dynamic generation may be associated with a locatable resource. * Non-locatable sources are not necessarily unchanging. * Loaders are in charge of managing module execution and may be involved with managing internal state. The life-cycle of module state, both internal and external, is pretty congruent with objects in Python: 1. create 2. init 3. modify 4. destroy Modules have some special cases that fit in there: 2a. dynamically generate 2b. populate from another namespace 3a/2b. reset * In the generation case, population may happen simultaneously (e.g. .py files). * Resetting a module's state may not be the same operation as init. Am I missing anything in all of this? 
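
The point above that loading puts external state into a new module while reloading writes into an existing one (overwriting only some of its contents) can be shown with a small sketch. The source strings and the execute() helper are made up for illustration; the sketch runs code directly with exec() rather than going through the real import machinery.

```python
import types

# Two hypothetical versions of the same module's source.
SOURCE_V1 = "answer = 41\nlegacy_helper = 'only defined by the first version'\n"
SOURCE_V2 = "answer = 42\n"


def execute(source, module):
    """Run source in the module's namespace (its external state)."""
    exec(compile(source, module.__name__, "exec"), module.__dict__)
    return module


# Load: external state goes into a brand new module object.
module = execute(SOURCE_V1, types.ModuleType("demo"))

# Reload: the same object is reused, so names that the new source no
# longer defines are left behind from the earlier execution.
reloaded = execute(SOURCE_V2, module)

# A fresh load of the new source has no such leftovers.
fresh = execute(SOURCE_V2, types.ModuleType("demo"))
```

After the reload, module.answer reflects the second version, but module.legacy_helper is still present, whereas the fresh module never had it. That leftover state is one reason reset and reload are not automatically the same operation as a first initialisation.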
---

Some questions that come to mind:

* Should loaders cover all the permutations of module state and its
  lifecycle? Our proposed APIs are moving in that direction. Are they
  enough?
* When does internal state get generated and how is it managed? Should
  loaders be the official liaison for the import system? Python
  implementation extension module APIs cover this somewhat (particularly
  for CPython).
* Should the language provide a non-implementation-specific API for
  associating internal APIs with modules? (PEP 3121-ish)
* Does reset deserve its own explicit API?
* How do you keep init from happening more than once? IOW, what happens
  when ModuleSpec.create() is called more than once?

I have more questions but they mostly line up with the details above.

Anyway, these are the things I am mulling over. For PEP 451 I'm not going
to try to accomplish module API perfection, but I do want to make sure we're
on the right track with a more explicit perspective. The confusion with
create_module() and exec_module() made it clear to me that the picture
should be more clear before we start

-eric

p.s. Tangential shot in the dark idea: Optionally cache deep copy of
external state for generate-only-once case?

From ncoghlan at gmail.com Sun Sep 1 14:47:56 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 1 Sep 2013 22:47:56 +1000
Subject: [Import-SIG] PEP 451 (ModuleSpec) round 3
In-Reply-To:
References:
Message-ID:

On 1 September 2013 15:36, Eric Snow wrote:
> On Sat, Aug 31, 2013 at 7:53 PM, Nick Coghlan wrote:
>>
>> Yep, that's exactly what I had in mind, although reload would also have an
>> extra check to ensure the module returned was the same as the one passed in
>> (that way, in-place reloading support for custom loaders that define
>> create_module would always be opt-in rather than opt-out).
>> >> Talking to Stefan about making this work on the extension module API side >> has confirmed my belief that this is the way to go, since it also deals >> nicely with placing custom objects in sys.modules. >> >> The one downside is that it means preconditions will be checked twice in >> the reload case (once in prepare, once in exec), but I can live with that >> for the likely reliability gains in the reloading API. >> >> If it works as well as I hope, I may finally be comfortable with proposing >> "imp.reload_fresh" for 3.5 :) >> >> Cheers, >> Nick. > > I haven't had a chance to respond to a few comments in detail yet, nor have > I been able to read through the extension module API threads, but what I > have seen has gotten me thinking about what exactly matters here regarding > preparing and executing modules. > > * There are two kinds of module state: internal (not exposed in Python) and > external (module.__dict__). > * External state is associated with a module object, but internal state is > not--it may be associated with a module name, a location (locatable > resource), or something else. > > * Internal state (if it exists) is not necessarily created at execution > (load/reload) time and may be shared between modules. > * The internal module state may or may not be managed by the interpreter > (regardless of Python implementation). > > * External module state is established at import execution time (load or > reload). > * Loading puts the external state into a new module and reloading into an > existing one (likely overwriting at least some contents). > * During module execution (during load/reload), external module state is > copied from internal state, dynamically generated (e.g. .py files), or a mix > of both. > * Dynamic external state generation is only allowed once for some modules. > * Dynamic generation is not necessarily (but sometimes is) an idempotent > operation. > * Dynamic generation may be associated with a locatable resource. 
> * Non-locatable sources are not necessarily unchanging. > > * Loaders are in charge of managing module execution and may be involved > with managing internal state. > > The life-cycle of module state, both internal and external, is pretty > congruent with objects in Python: > > 1. create > 2. init > 3. modify > 4. destroy I'd tweak this slightly, and say that modules are more congruent with *class namespaces* than they are with ordinary objects (which is why I chose "prepare" as a suggested alternative to "create"). The main difference is that we don't support reinitialising a class namespace in place, while we do support doing so for modules. That makes the lifecycle: 1. prepare 2. exec 3. modify 4. destroy > Modules have some special cases that fit in there: > > 2a. dynamically generate > 2b. populate from another namespace > 3a/2b. reset > > * In the generation case, population may happen simultaneously (e.g. .py > files). I don't quite understand what you mean by "generate" here, unless you mean the fact that "init" for a module usually involves running arbitrary user provided code in that namespace. If so, well that's why I think "exec" is a better name for it than "init" :) > * Resetting a module's state may not be the same operation as init. > > Am I missing anything in all of this? > > --- > > Some questions that come to mind: > > * Should loaders cover all the permutations of module state and it > lifecycle? Our proposed APIs are moving in that direction. Are they > enough? > * When does internal state get generated and how is it managed? Should > loaders be the official liaison for the import system? Python > implementation extension module APIs cover this somewhat (particularly for > CPython). Take a look at the discussion between Stefan Behnel and I on python-dev. The loaders should really only care about the Python visible state. 
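
The prepare/exec lifecycle suggested above borrows its naming from PEP 3115, where a metaclass's __prepare__ hook supplies the namespace in which a class body is then executed. A rough side-by-side sketch follows. Meta and Example use the real PEP 3115 hook; prepare_module() and exec_module() here are hypothetical free functions written only to show the parallel, not loader methods from any released API.

```python
import types


class Meta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # PEP 3115: supply the namespace the class body will execute in.
        return {"prepared_by": "Meta.__prepare__"}


class Example(metaclass=Meta):
    # The class body is executed in the prepared namespace.
    value = 1


def prepare_module(name):
    """Hypothetical module-side counterpart: build the execution target."""
    module = types.ModuleType(name)
    module.prepared_by = "prepare_module"
    return module


def exec_module(module, source):
    """Run source with the module namespace as both globals and locals."""
    exec(source, module.__dict__)


module = prepare_module("demo")
exec_module(module, "value = 1")
# Unlike a class namespace, a module can be re-executed in place.
exec_module(module, "value = 2")

# A class body, by contrast, runs with separate globals and locals, so its
# assignments land in the locals mapping rather than in the globals.
class_like_locals = {}
exec("value = 3", module.__dict__, class_like_locals)
```

In both halves the "prepare" step decides what object or namespace will be populated, and the "exec" step runs arbitrary user code inside it. The difference highlighted in the message is that only the module side supports running the exec step again on the same namespace.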
PEP 3121 was a useful evolution of the extension module design, but ultimately failed in its aims by adding an additional kind of hidden state, rather than using the existing mechanisms for adding hidden state to extension types and instances. The result is that both Stefan and I now agree that references to hidden state from extension modules should be maintained directly on the objects exposed as the module's externally visible state. This includes exposing already bound instance methods of a hidden state object rather than ordinary functions for any top level callables, as well as including a reference to the hidden state in custom type definitions (which they may then optionally transfer to instances for fewer indirections when accessing the hidden state, at the cost of an extra pointer per instance). The advantage of this approach is that it avoids needing a custom mechanism to allow the module to get access to its hidden state - instead, all modules, including extension modules, are expected to ensure that module level APIs always have direct access to any internal state they need, rather than relying on C static variables or a hidden storage area like that provided by PEP 3121. > * Should the language provide a non-implementation-specific API for > associating internal APIs with modules? (PEP 3121-ish) No. > * Does reset deserve its own explicit API? How does reset differ from reload? > * How do you keep init from happening more than once? IOW, what happens > when ModuleSpec.create() is called more than once? create() should either be idempotent, use the PEP 3121 APIs to implicitly return the same object, or else throw an error if it detects it has already been initialised. This is up to the loader, though, rather than being the responsibility of the import system (although we should document the three options). > I have more questions but they mostly line up with the details above. > > Anyway, there are the things I am mulling over. 
> For PEP 451 I'm not going
> to try to accomplish module API perfection, but I do want to make sure we're
> on the right track with a more explicit perspective. The confusion with
> create_module() and exec_module() made it clear to me that the picture
> should be more clear before we start

I definitely recommend the thread on python-dev :)

The messages from the last couple of days are probably enough (start
with http://mail.python.org/pipermail/python-dev/2013-September/128244.html),
but if you want more context that thread actually started in August.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From solipsis at pitrou.net Mon Sep 2 10:32:10 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 2 Sep 2013 10:32:10 +0200
Subject: [Import-SIG] PEP 451 (ModuleSpec) round 3
References:
Message-ID: <20130902103210.344d5326@pitrou.net>

On Sun, 1 Sep 2013 22:47:56 +1000, Nick Coghlan wrote:
> > The life-cycle of module state, both internal and external, is
> > pretty congruent with objects in Python:
> >
> > 1. create
> > 2. init
> > 3. modify
> > 4. destroy
>
> I'd tweak this slightly, and say that modules are more congruent with
> *class namespaces* than they are with ordinary objects (which is why I
> chose "prepare" as a suggested alternative to "create"). The main
> difference is that we don't support reinitialising a class namespace
> in place, while we do support doing so for modules.

Given that modules are already instances of the module type, and given
most Python users are much more familiar with the semantics of object
namespaces, rather than those of class namespaces, I'd strongly rather
have modules stay instance-alike, rather than type-alike.

(haven't read the latest PEP update, though)

> This
> includes exposing already bound instance methods of a hidden state
> object rather than ordinary functions for any top level callables,

Er... There may (or even will) be compatibility issues with that.
Such as pickling of top level functions, or the various discrepancies of bound methods vs. plain functions. Or, of course, all the changes in introspection results that might disrupt existing code (think the various `inspect` functions). I think it would be much safer to have top level functions remain plain functions, and take the module object as first argument as they already do (the "PyObject *self"). And it would help extension modules be more like normal Python modules. Regards Antoine. From ncoghlan at gmail.com Mon Sep 2 15:09:18 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 2 Sep 2013 23:09:18 +1000 Subject: [Import-SIG] PEP 451 (ModuleSpec) round 3 In-Reply-To: <20130902103210.344d5326@pitrou.net> References: <20130902103210.344d5326@pitrou.net> Message-ID: On 2 September 2013 18:32, Antoine Pitrou wrote: > Le Sun, 1 Sep 2013 22:47:56 +1000, > Nick Coghlan a ?crit : >> > The life-cycle of module state, both internal and external, is >> > pretty congruent with objects in Python: >> > >> > 1. create >> > 2. init >> > 3. modify >> > 4. destroy >> >> I'd tweak this slightly, and say that modules are more congruent with >> *class namespaces* than they are with ordinary objects (which is why I >> chose "prepare" as a suggested alternative to "create"). The main >> difference is that we don't support reinitialising a class namespace >> in place, while we do support doing so for modules. > > Given that modules are already instances of the module type, and given > most Python users are much more familiar with the semantics of object > namespaces, rather than those of class namespaces, I'd strongly rather > have modules stay instance-alike, rather than type-alike. They're not *that* much like either at runtime, since the descriptor machinery is deliberately turned off for module instances. The data model during execution is the same as that of a class body, though. 
It's just that module level corresponds to exec when globals and locals refer to the same namespace, while class bodies use the module globals and their own locals. That's the part that makes me see the parallel with prepare/exec for type instances as more significant than that with new/init for normal class instances. > (haven't read the latest PEP update, though) > >> This >> includes exposing already bound instance methods of a hidden state >> object rather than ordinary functions for any top level callables, > > Er... There may (or even will) be compatibility issues with that. Such > as pickling of top level functions, or the various discrepancies of > bound methods vs. plain functions. Or, of course, all the changes in > introspection results that might disrupt existing code (think > the various `inspect` functions). > > I think it would be much safer to have top level functions remain > plain functions, and take the module object as first argument as they > already do (the "PyObject *self"). And it would help extension modules > be more like normal Python modules. Yeah, I made that suggestion when I was confused between which of PyModule_GetState and PyState_GetModule caused problems. Since it's only the latter, there's no need to muck about with exposing bound methods of objects with hidden state - we already have that by calling "PyModule_GetState" on the module parameter. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Mon Sep 2 15:15:20 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 2 Sep 2013 15:15:20 +0200 Subject: [Import-SIG] PEP 451 (ModuleSpec) round 3 References: <20130902103210.344d5326@pitrou.net> Message-ID: <20130902151520.7e61422f@pitrou.net> Le Mon, 2 Sep 2013 23:09:18 +1000, Nick Coghlan a ?crit : > > Er... There may (or even will) be compatibility issues with that. > > Such as pickling of top level functions, or the various > > discrepancies of bound methods vs. 
plain functions. Or, of course, > > all the changes in introspection results that might disrupt > > existing code (think the various `inspect` functions). > > > > I think it would be much safer to have top level functions remain > > plain functions, and take the module object as first argument as > > they already do (the "PyObject *self"). And it would help extension > > modules be more like normal Python modules. > > Yeah, I made that suggestion when I was confused between which of > PyModule_GetState and PyState_GetModule caused problems. Since it's > only the latter, there's no need to muck about with exposing bound > methods of objects with hidden state - we already have that by calling > "PyModule_GetState" on the module parameter. But even PyModule_GetState isn't necessary anymore, if module objects can have custom C fields. Regards Antoine. From ncoghlan at gmail.com Mon Sep 2 15:53:44 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 2 Sep 2013 23:53:44 +1000 Subject: [Import-SIG] Thoughts on cleaner reloading support Message-ID: The extension module discussion on python-dev got me thinking about the different ways in which the "singleton" assumption for modules can be broken, and how to ensure that extension modules play nicely in that environment. As I see it, there are 4 ways the "singleton that survives for the lifetime of the process following initial import" assumption regarding modules can turn out to be wrong: 1. In-place reload, overwriting the existing contents of a namespace. imp.reload() does this. We sort of do it for __main__, except we usually keep re-using that namespace to run *different* things, rather than rerunning the same code. 2. Parallel loading. We remove the existing module from sys.modules (keeping a reference to it alive), and load a second copy. Alternatively, we call the loader APIs directly. 
Either way, we end up with two independent copies of the "same" module,
potentially reflecting different system states at the time of execution.

3. Subinterpreter support. Quite similar to parallel loading, but we're
loading the second copy because we're in a subinterpreter and can't see
the original.

4. Unloading. We remove the existing module from sys.modules and drop
all other references to it. The module gets destroyed, and we later
import a completely fresh copy.

Even pure Python modules may not support these, since they may have
side effects, or assume they're in the main interpreter, or other
things. Currently, there is no way to signal this to the import
system, so we're left with implicit misbehaviour when we attempt to
reload the modules with global side effects.

For a while, I was thinking we could design the import system to "just
figure it out", but now I'm thinking a selection of read/write
properties on spec objects may make more sense:

    allow_reload
    allow_unload
    allow_reimport
    allow_subinterpreter_import

These would all default to True, but loaders and modules could
selectively turn them off.

They would also be advisory rather than enforced via all possible
import state manipulation mechanisms. New functions in importlib.util
could provide easier alternatives to directly manipulating
sys.modules:

- importlib.util.reload (replacement for imp.reload that checks the
  spec allows reloading)
- importlib.util.unload (replacement for "del
  sys.modules[module.__name__]" that checks the spec allows unloading,
  and also unloads all child modules)
- importlib.util.reimport (replacement for
  test.support.import_fresh_module that checks the spec of any existing
  sys.modules entry allows reimporting a parallel copy)

One of these is not like the others... aside from the existing
extension module specific mechanism defined in PEP 3121, I'm not sure
we can devise a general *loader* level API to force imports for a
particular name to fail in a subinterpreter.
So this concern probably needs to be ignored in favour of a possible future C API level solution. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Sep 2 15:57:22 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 2 Sep 2013 23:57:22 +1000 Subject: [Import-SIG] PEP 451 (ModuleSpec) round 3 In-Reply-To: <20130902151520.7e61422f@pitrou.net> References: <20130902103210.344d5326@pitrou.net> <20130902151520.7e61422f@pitrou.net> Message-ID: On 2 September 2013 23:15, Antoine Pitrou wrote: > Le Mon, 2 Sep 2013 23:09:18 +1000, > Nick Coghlan a ?crit : >> > Er... There may (or even will) be compatibility issues with that. >> > Such as pickling of top level functions, or the various >> > discrepancies of bound methods vs. plain functions. Or, of course, >> > all the changes in introspection results that might disrupt >> > existing code (think the various `inspect` functions). >> > >> > I think it would be much safer to have top level functions remain >> > plain functions, and take the module object as first argument as >> > they already do (the "PyObject *self"). And it would help extension >> > modules be more like normal Python modules. >> >> Yeah, I made that suggestion when I was confused between which of >> PyModule_GetState and PyState_GetModule caused problems. Since it's >> only the latter, there's no need to muck about with exposing bound >> methods of objects with hidden state - we already have that by calling >> "PyModule_GetState" on the module parameter. > > But even PyModule_GetState isn't necessary anymore, if module objects > can have custom C fields. You're giving me too much credit here - I *really* wasn't thinking clearly about the problem, and had managed to forget that module level functions still receive the module as their first argument, so they can already get access to custom fields if the module is an instance of a custom type. 
Just a mindset problem caused by the fact we'd started the discussion
based on the idea of returning an instance of a completely custom type,
but it still led to some remarkably erroneous suggestions :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From andrew.svetlov at gmail.com Mon Sep 2 16:30:17 2013
From: andrew.svetlov at gmail.com (Andrew Svetlov)
Date: Mon, 2 Sep 2013 17:30:17 +0300
Subject: [Import-SIG] Thoughts on cleaner reloading support
In-Reply-To:
References:
Message-ID:

On Mon, Sep 2, 2013 at 4:53 PM, Nick Coghlan wrote:
> The extension module discussion on python-dev got me thinking about
> the different ways in which the "singleton" assumption for modules can
> be broken, and how to ensure that extension modules play nicely in
> that environment.
>
> As I see it, there are 4 ways the "singleton that survives for the
> lifetime of the process following initial import" assumption regarding
> modules can turn out to be wrong:
>
> 1. In-place reload, overwriting the existing contents of a namespace.
> imp.reload() does this. We sort of do it for __main__, except we
> usually keep re-using that namespace to run *different* things, rather
> than rerunning the same code.
>
> 2. Parallel loading. We remove the existing module from sys.modules
> (keeping a reference to it alive), and load a second copy.
> Alternatively, we call the loader APIs directly. Either way, we end up
> with two independent copies of the "same" module, potentially
> reflecting different system states at the time of execution.
>
> 3. Subinterpreter support. Quite similar to parallel loading, but
> we're loading the second copy because we're in a subinterpreter and
> can't see the original.
>
> 4. Unloading. We remove the existing module from sys.modules and drop
> all other references to it. The module gets destroyed, and we later
> import a completely fresh copy.
> > Even pure Python modules may not support these, since they may have > side effects, or assume they're in the main interpreter, or other > things. Currently, there is no way to signal this to the import > system, so we're left with implicit misbehaviour when we attempt to > reload the modules with global side effects. > > For a while, I was thinking we could design the import system to "just > figure it out", but now I'm thinking a selection of read/write > properties on spec objects may make more sense: > > allow_reload > allow_unload > allow_reimport > allow_subinterpreter_import > > These would all default to True, but loaders and modules could > selectively turn them off. > > They would also be advisory rather than enforced via all possible > import state manipulation mechanisms. New functions in importlib.util > could provide easier alternatives to directly manipulating > sys.modules: > > - importlib.util.reload (replacement for imp.reload that checks the > spec allows reloading) > - importlib.util.unload (replacement for "del > sys.modules[module.__name__]" that checks the spec allows unloading, > and also unloads all child modules) What do you mean by child modules? > - importlib.util.reimport (replacement for > test.support.import_fresh_module that checks the spec of any existing > sys.module entry allows reimporting a parallel copy) > > One of these is not like the others... aside from the existing > extension module specific mechanism defined in PEP 3121, I'm not sure > we can devise a general *loader* level API to force imports for a > particular name to fail in a subinterpreter. So this concern probably > needs to be ignored in favour of a possible future C API level > solution. > > Cheers, > Nick. 
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Import-SIG mailing list > Import-SIG at python.org > http://mail.python.org/mailman/listinfo/import-sig -- Thanks, Andrew Svetlov From ericsnowcurrently at gmail.com Wed Sep 18 10:14:35 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 18 Sep 2013 02:14:35 -0600 Subject: [Import-SIG] Thoughts on cleaner reloading support In-Reply-To: References: Message-ID: On Mon, Sep 2, 2013 at 7:53 AM, Nick Coghlan wrote: > The extension module discussion on python-dev got me thinking about > the different ways in which the "singleton" assumption for modules can > be broken, and how to ensure that extension modules play nicely in > that environment. > > As I see it, there are 4 ways the "singleton that survives for the > lifetime of the process following initial import" assumption regarding > modules can turn out to be wrong: > > 1. In-place reload, overwriting the existing contents of a namespace. > imp.reload() does this. We sort of do it for __main__, except we > usually keep re-using that namespace to run *different* things, rather > than rerunning the same code. > > 2. Parallel loading. We remove the existing module from sys.modules > (keeping a reference to it alive), and load a second copy. > Alternatively, we call the loader APIs directly. Either way, we end up > with two independent copies of the "same" module, potentially > reflecting difference system states at the time of execution. > > 3. Subinterpreter support. Quite similar to parallel loading, but > we're loading the second copy because we're in a subinterpreter and > can't see the original. > > 4. Unloading. We remove the existing module from sys.modules and drop > all other references to it. The module gets destroyed, and we later > import a completely fresh copy. 
> > Even pure Python modules may not support these, since they may have > side effects, or assume they're in the main interpreter, or other > things. Currently, there is no way to signal this to the import > system, so we're left with implicit misbehaviour when we attempt to > reload the modules with global side effects. > > For a while, I was thinking we could design the import system to "just > figure it out", but now I'm thinking a selection of read/write > properties on spec objects may make more sense: > > allow_reload > allow_unload > allow_reimport > allow_subinterpreter_import > > These would all default to True, but loaders and modules could > selectively turn them off. > > They would also be advisory rather than enforced via all possible > import state manipulation mechanisms. New functions in importlib.util > could provide easier alternatives to directly manipulating > sys.modules: > > - importlib.util.reload (replacement for imp.reload that checks the > spec allows reloading) > - importlib.util.unload (replacement for "del > sys.modules[module.__name__]" that checks the spec allows unloading, > and also unloads all child modules) > - importlib.util.reimport (replacement for > test.support.import_fresh_module that checks the spec of any existing > sys.module entry allows reimporting a parallel copy) > > One of these is not like the others... aside from the existing > extension module specific mechanism defined in PEP 3121, I'm not sure > we can devise a general *loader* level API to force imports for a > particular name to fail in a subinterpreter. So this concern probably > needs to be ignored in favour of a possible future C API level > solution. Interesting stuff. While I think this is big enough to be tackled separately from PEP 451, I'll add a note there. -eric -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ericsnowcurrently at gmail.com Wed Sep 18 11:51:22 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 18 Sep 2013 03:51:22 -0600 Subject: [Import-SIG] PEP 451: Big update. Message-ID: Hi all, I finally got some time to update the PEP. I've simplified a few things, most notably by making the 4 ModuleSpec methods (create, exec, load, reload) "private". Also notable is that the new loader method is still create_module() and there is still no flag for is_reload on either of the loader methods. I'm still not clear on what the flag buys us and on why anything we'd do in a prepare_module() we couldn't do in exec_module(). I'm trying to keep this simple. :) Anyway, I still need to take some time to clean up the PEP formatting and run a spell checker. I probably also missed some artifact of an older version of the API. Otherwise I think it's in a good spot. Comments welcome. -eric p.s. I also plan on getting the implementation up one of these days. :P =============================================================== PEP: 451 Title: A ModuleSpec Type for the Import System Version: $Revision$ Last-Modified: $Date$ Author: Eric Snow Discussions-To: import-sig at python.org Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 8-Aug-2013 Python-Version: 3.4 Post-History: 8-Aug-2013, 28-Aug-2013, 18-Sep-2013 Resolution: Abstract ======== This PEP proposes to add a new class to ``importlib.machinery`` called ``ModuleSpec``. It will be authoritative for all the import-related information about a module, and will be available without needing to load the module first. Finders will directly provide a module's spec instead of a loader (which they will continue to provide indirectly). The import machinery will be adjusted to take advantage of module specs, including using them to load modules. Motivation ========== The import system has evolved over the lifetime of Python. 
In late 2002 PEP 302 introduced standardized import hooks via ``finders`` and ``loaders`` and ``sys.meta_path``. The ``importlib`` module, introduced with Python 3.1, now exposes a pure Python implementation of the APIs described by PEP 302, as well as of the full import system. It is now much easier to understand and extend the import system. While a benefit to the Python community, this greater accessibility also presents a challenge. As more developers come to understand and customize the import system, any weaknesses in the finder and loader APIs will be more impactful. So the sooner we can address any such weaknesses in the import system, the better...and there are a couple we can take care of with this proposal. Firstly, any time the import system needs to save information about a module we end up with more attributes on module objects that are generally only meaningful to the import system. It would be nice to have a per-module namespace in which to put future import-related information and to pass around within the import system. Secondly, there's an API void between finders and loaders that causes undue complexity when encountered. Currently finders are strictly responsible for providing the loader, through their find_module() method, which the import system will use to load the module. The loader is then responsible for doing some checks, creating the module object, setting import-related attributes, "installing" the module to ``sys.modules``, and loading the module, along with some cleanup. This all takes place during the import system's call to ``Loader.load_module()``. Loaders also provide some APIs for accessing data associated with a module. Loaders are not required to provide any of the functionality of ``load_module()`` through other methods. Thus, though the import-related information about a module is likely available without loading the module, it is not otherwise exposed.
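To make the list of loader responsibilities concrete, here is a rough sketch of what a PEP 302-style ``load_module()`` typically has to do by hand. ``LegacyLoader``, its in-memory source string, and the module name ``legacy_demo`` are hypothetical names invented for this illustration; a real loader would also need to deal with packages and caching.

```python
import sys
import types


class LegacyLoader:
    """Hypothetical PEP 302-style loader that runs source held in memory."""

    def __init__(self, source, filename="<legacy>"):
        self.source = source
        self.filename = filename

    def load_module(self, fullname):
        # Reuse an existing module so that a reload happens in place.
        module = sys.modules.get(fullname)
        is_reload = module is not None
        if module is None:
            module = types.ModuleType(fullname)
        # Import-related attributes are set by hand.
        module.__file__ = self.filename
        module.__loader__ = self
        module.__package__ = fullname.rpartition(".")[0]
        # The module must be in sys.modules before its code runs.
        sys.modules[fullname] = module
        try:
            exec(compile(self.source, self.filename, "exec"), module.__dict__)
        except BaseException:
            # Clean up a half-initialized module, but keep one being reloaded.
            if not is_reload:
                del sys.modules[fullname]
            raise
        return module


legacy = LegacyLoader("value = 'loaded'").load_module("legacy_demo")
print(legacy.value, legacy.__loader__.filename)
```

Only the ``exec`` line is specific to this loader; everything around it is the shared boilerplate that the motivation text is talking about.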
Furthermore, the requirements associated with ``load_module()`` are common to all loaders and are mostly implemented in exactly the same way. This means every loader has to duplicate the same boilerplate code. ``importlib.util`` provides some tools that help with this, but it would be more helpful if the import system simply took charge of these responsibilities. The trouble is that this would limit the degree of customization that ``load_module()`` facilitates. This is a gap between finders and loaders which this proposal aims to fill. Finally, when the import system calls a finder's ``find_module()``, the finder makes use of a variety of information about the module that is useful outside the context of the method. Currently the options are limited for persisting that per-module information past the method call, since it only returns the loader. Popular workarounds for this limitation are to store the information in a module-to-info mapping somewhere on the finder itself, or store it on the loader. Unfortunately, loaders are not required to be module-specific. On top of that, some of the useful information finders could provide is common to all finders, so ideally the import system could take care of those details. This is the same gap as before between finders and loaders. As an example of complexity attributable to this flaw, the implementation of namespace packages in Python 3.3 (see PEP 420) added ``FileFinder.find_loader()`` because there was no good way for ``find_module()`` to provide the namespace search locations. The answer to this gap is a ``ModuleSpec`` object that contains the per-module information and takes care of the boilerplate functionality involved with loading the module. (The idea gained momentum during discussions related to another PEP.[1]) Specification ============= The goal is to address the gap between finders and loaders while changing as little of their semantics as possible.
Though some functionality and information is moved to the new ``ModuleSpec`` type, their behavior should remain the same. However, for the sake of clarity the finder and loader semantics will be explicitly identified. This is a high-level summary of the changes described by this PEP. More detail is available in later sections. importlib.machinery.ModuleSpec (new) ------------------------------------ A specification for a module's import-system-related state. * ModuleSpec(name, loader, \*, origin=None, loading_info=None, is_package=None) Attributes: * name - a string for the name of the module. * loader - the loader to use for loading and for module data. * origin - a string for the location from which the module is loaded, e.g. "builtin" for built-in modules and the filename for modules loaded from source. * submodule_search_locations - strings for where to find submodules, if a package. * loading_info - a container of extra data for use during loading. * cached (property) - a string for where the compiled module will be stored (see PEP 3147). * package (RO-property) - the name of the module's parent (or None). * has_location (RO-property) - the module's origin refers to a location. Instance Methods: * module_repr() - provide a repr string for the spec'ed module. * init_module_attrs(module) - set any of a module's import-related attributes that aren't already set. importlib.util Additions ------------------------ * spec_from_file_location(name, location, \*, loader=None, submodule_search_locations=None) - factory for file-based module specs. * from_loader(name, loader, \*, origin=None, is_package=None) - factory based on information provided by loaders. * spec_from_module(module, loader=None) - factory based on existing import-related module attributes. This function is expected to be used only in some backward-compatibility situations. Other API Additions ------------------- * importlib.abc.Loader.exec_module(module) will execute a module in its own namespace. 
It replaces ``importlib.abc.Loader.load_module()``. * importlib.abc.Loader.create_module(spec) (optional) will return a new module to use for loading. * Module objects will have a new attribute: ``__spec__``. * importlib.find_spec(name, path=None) will return the spec for a module. exec_module() and create_module() should not set any import-related module attributes. The fact that load_module() does is a design flaw that this proposal aims to correct. API Changes ----------- * ``InspectLoader.is_package()`` will become optional. Deprecations ------------ * importlib.abc.MetaPathFinder.find_module() * importlib.abc.PathEntryFinder.find_module() * importlib.abc.PathEntryFinder.find_loader() * importlib.abc.Loader.load_module() * importlib.abc.Loader.module_repr() * The parameters and attributes of the various loaders in importlib.machinery * importlib.util.set_package() * importlib.util.set_loader() * importlib.find_loader() Removals -------- These were introduced prior to Python 3.4's release. * importlib.abc.Loader.init_module_attrs() * importlib.util.module_to_load() Other Changes ------------- * The import system implementation in importlib will be changed to make use of ModuleSpec. * Import-related module attributes (other than ``__spec__``) will no longer be used directly by the import system. * Import-related attributes should no longer be added to modules directly. * The module type's ``__repr__()`` will be thin wrapper around a pure Python implementation which will leverage ModuleSpec. * The spec for the ``__main__`` module will reflect the appropriate name and origin. Backward-Compatibility ---------------------- * If a finder does not define find_spec(), a spec is derived from the loader returned by find_module(). * PathEntryFinder.find_loader() still takes priority over find_module(). * Loader.load_module() is used if exec_module() is not defined. What Will not Change? --------------------- * The syntax and semantics of the import statement. 
* Existing finders and loaders will continue to work normally. * The import-related module attributes will still be initialized with the same information. * Finders will still create loaders (now storing them in specs). * Loader.load_module(), if a loader defines it, will have all the same requirements and may still be called directly. * Loaders will still be responsible for module data APIs. * importlib.reload() will still overwrite the import-related attributes. What Will Existing Finders and Loaders Have to Do Differently? ============================================================== Immediately? Nothing. The status quo will be deprecated, but will continue working. However, here are the things that the authors of finders and loaders should change relative to this PEP: * Implement ``find_spec()`` on finders. * Implement ``exec_module()`` on loaders, if possible. The ModuleSpec factory functions in importlib.util are intended to be helpful for converting existing finders. ``from_loader()`` and ``spec_from_file_location()`` are both straightforward utilities in this regard. In the case where loaders already expose methods for creating and preparing modules, ``spec_from_module()`` may be useful to the corresponding finder. For existing loaders, exec_module() should be a relatively direct conversion from the non-boilerplate portion of load_module(). In some uncommon cases the loader should also implement create_module(). ModuleSpec Users ================ ``ModuleSpec`` objects have 3 distinct target audiences: Python itself, import hooks, and normal Python users. Python will use specs in the import machinery, in interpreter startup, and in various standard library modules. Some modules are import-oriented, like pkgutil, and others are not, like pickle and pydoc. In all cases, the full ``ModuleSpec`` API will get used. Import hooks (finders and loaders) will make use of the spec in specific ways.
First of all, finders may use the spec factory functions in importlib.util to create spec objects. They may also directly adjust the spec attributes after the spec is created. Secondly, the finder may bind additional information to the spec (in loading_info) for the loader to consume during module creation/execution. Finally, loaders will make use of the attributes on a spec when creating and/or executing a module.

Python users will be able to inspect a module's ``__spec__`` to get import-related information about the object. Generally, Python applications and interactive users will not be using the ``ModuleSpec`` factory functions nor any of the instance methods.

How Loading Will Work
=====================

This is an outline of what happens in ModuleSpec's loading functionality::

    import sys
    from types import ModuleType

    def load(spec):
        # Legacy loaders: fall back to load_module(), which does everything.
        if not hasattr(spec.loader, 'exec_module'):
            module = spec.loader.load_module(spec.name)
            spec.init_module_attrs(module)
            return sys.modules[spec.name]

        # Let the loader supply a custom module object, else use the default.
        module = None
        if hasattr(spec.loader, 'create_module'):
            module = spec.loader.create_module(spec)
        if module is None:
            module = ModuleType(spec.name)
        spec.init_module_attrs(module)

        spec._initializing = True
        sys.modules[spec.name] = module
        try:
            spec.loader.exec_module(module)
        except Exception:
            # Do not leave a half-initialized module behind.
            del sys.modules[spec.name]
            raise
        finally:
            spec._initializing = False
        return sys.modules[spec.name]

These steps are exactly what ``Loader.load_module()`` is already expected to do. Loaders will thus be simplified since they will only need to implement exec_module().

Note that we must return the module from sys.modules. During loading the module may have replaced itself in sys.modules. Since we don't have a post-import hook API to accommodate the use case, we have to deal with it. However, in the replacement case we do not worry about setting the import-related module attributes on the object. The module writer is on their own if they are doing this.
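As a hedged illustration of the outline above, the following sketch shows a loader that only populates a namespace while a small driver does the create/register/exec/cleanup steps. It relies on ``importlib.util.spec_from_loader()`` and ``importlib.util.module_from_spec()``, which are the names that later shipped in the standard library and differ from the draft names used in this message. ``StringLoader``, ``load()`` and ``demo_mod`` are hypothetical.

```python
import sys
import importlib.util


class StringLoader:
    """Hypothetical loader that executes source held in memory."""

    def __init__(self, source):
        self.source = source

    def create_module(self, spec):
        # None asks the import machinery for a default module object.
        return None

    def exec_module(self, module):
        # Only populate the namespace; no sys.modules or attribute handling.
        exec(self.source, module.__dict__)


def load(spec):
    """Rough equivalent of the outline: create, register, exec, clean up."""
    # Creates the module (via create_module) and sets import-related attributes.
    module = importlib.util.module_from_spec(spec)
    sys.modules[spec.name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        del sys.modules[spec.name]
        raise
    # The module may have replaced itself, so return what sys.modules holds.
    return sys.modules[spec.name]


spec = importlib.util.spec_from_loader("demo_mod", StringLoader("x = 6 * 7"))
demo = load(spec)
print(demo.x, demo.__spec__ is spec, demo.__loader__ is spec.loader)
```

The loader itself never touches ``sys.modules`` or ``__loader__``; that is the simplification the section describes.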
ModuleSpec
==========

Attributes
----------

Each of the following names is an attribute on ModuleSpec objects. A value of ``None`` indicates "not set". This contrasts with module objects where the attribute simply doesn't exist.

Most of the attributes correspond to the import-related attributes of modules. Here is the mapping. The reverse of this mapping is used by ModuleSpec.init_module_attrs().

========================== ==============
On ModuleSpec              On Modules
========================== ==============
name                       __name__
loader                     __loader__
package                    __package__
origin                     __file__*
cached                     __cached__*,**
submodule_search_locations __path__**
loading_info               \-
has_location               \-
========================== ==============

\* Set only if has_location is true.
\*\* Set only if the spec attribute is not None.

While package and has_location are read-only properties, the remaining attributes can be replaced after the module spec is created and even after import is complete. This allows for unusual cases where directly modifying the spec is the best option. However, typical use should not involve changing the state of a module's spec.

**origin**

origin is a string for the place from which the module originates. Aside from the informational value, it is also used in module_repr().

The module attribute ``__file__`` has a similar but more restricted meaning. Not all modules have it set (e.g. built-in modules). However, ``origin`` is applicable to all modules. For built-in modules it would be set to "built-in".

**has_location**

Some modules can be loaded by reference to a location, e.g. a filesystem path or a URL or something of the sort. Having the location lets you load the module, but in theory you could load that module under various names.

In contrast, non-located modules can't be loaded in this fashion, e.g. builtin modules and modules dynamically created in code. For these, the name is the only way to access them, so they have an "origin" but not a "location".
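For a quick look at the origin/location distinction in practice, this small sketch inspects two standard library specs. It assumes a regular CPython install and uses ``importlib.util.find_spec()``, the location where the lookup function eventually shipped (the draft above places it at ``importlib.find_spec()``).

```python
import importlib.util

# A built-in module has an origin but no location.
builtin_spec = importlib.util.find_spec("sys")
# A package loaded from the filesystem has both.
located_spec = importlib.util.find_spec("json")

for spec in (builtin_spec, located_spec):
    print(spec.name, spec.origin, spec.has_location,
          spec.submodule_search_locations)
```

In both cases has_location is what tells the two kinds of origin apart.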
This attribute reflects whether or not the module is locatable. If it is, origin must be set to the module's location and ``__file__`` will be set on the module. Not all locatable modules will be cacheable, but most will. The corresponding module attribute name, ``__file__``, is somewhat inaccurate and potentially confusing, so we will use a more explicit combination of origin and has_location to represent the same information. Having a separate filename is unnecessary since we have origin. **submodule_search_locations** The list of location strings, typically directory paths, in which to search for submodules. If the module is a package this will be set to a list (even an empty one). Otherwise it is ``None``. The corresponding module attribute's name, ``__path__``, is relatively ambiguous. Instead of mirroring it, we use a more explicit name that makes the purpose clear. **loading_info** A finder may set loading_info to any value to provide additional data for the loader to use during loading. A value of None is the default and indicates that there is no additional data. Otherwise it can be set to any object, such as a dict, list, or types.SimpleNamespace, containing the relevant extra information. For example, zipimporter could use it to pass the zip archive name to the loader directly, rather than needing to derive it from origin or create a custom loader for each find operation. loading_info is meant for use by the finder and corresponding loader. It is not guaranteed to be a stable resource for any other use. Omitted Attributes and Methods ------------------------------ The following ModuleSpec methods are not part of the public API since it is easy to use them incorrectly and only the import system really needs them (i.e. they would be an attractive nuisance). * create() - provide a new module to use for loading. * exec(module) - execute the spec into a module namespace. * load() - prepare a module and execute it in a protected way.
* reload(module) - re-execute a module in a protected way. Here are other omissions: There is no PathModuleSpec subclass of ModuleSpec that separates out has_location, cached, and submodule_search_locations. While that might make the separation cleaner, module objects don't have that distinction. ModuleSpec will support both cases equally well. While is_package would be a simple additional attribute (aliasing ``self.submodule_search_locations is not None``), it perpetuates the artificial (and mostly erroneous) distinction between modules and packages. Conceivably, a ModuleSpec.load() method could optionally take a list of modules with which to interact instead of sys.modules. That capability is left out of this PEP, but may be pursued separately at some other time, including relative to PEP 406 (import engine). Likewise load() could be leveraged to implement multi-version imports. While interesting, doing so is outside the scope of this proposal. Others: * Add ModuleSpec.submodules (RO-property) - returns possible submodules relative to the spec. * Add ModuleSpec.loaded (RO-property) - the module in sys.modules, if any. * Add ModuleSpec.data - a descriptor that wraps the data API of the spec's loader. * Also see [3]. Backward Compatibility ---------------------- ModuleSpec doesn't have any. This would be a different story if Finder.find_module() were to return a module spec instead of loader. In that case, specs would have to act like the loader that would have been returned instead. Doing so would be relatively simple, but is an unnecessary complication. It was part of earlier versions of this PEP. Subclassing ----------- Subclasses of ModuleSpec are allowed, but should not be necessary. Simply setting loading_info or adding functionality to a custom finder or loader will likely be a better fit and should be tried first.
However, as long as a subclass still fulfills the requirements of the import system, objects of that type are completely fine as the return value of Finder.find_spec(). Existing Types ============== Module Objects -------------- Other than adding ``__spec__``, none of the import-related module attributes will be changed or deprecated, though some of them could be; any such deprecation can wait until Python 4. A module's spec will not be kept in sync with the corresponding import-related attributes. Though they may differ, in practice they will typically be the same. One notable exception is the case where a module is run as a script by using the ``-m`` flag. In that case ``module.__spec__.name`` will reflect the actual module name while ``module.__name__`` will be ``__main__``. Notably, the spec for each module instance will be unique to that instance even if the information is identical to that of another spec. This won't happen in general. Finders ------- Finders are still responsible for creating the loader. That loader will now be stored in the module spec returned by ``find_spec()`` rather than returned directly. As is currently the case without the PEP, if a loader would be costly to create, that loader can be designed to defer the cost until later. **MetaPathFinder.find_spec(name, path=None)** **PathEntryFinder.find_spec(name)** Finders will return ModuleSpec objects when ``find_spec()`` is called. This new method replaces ``find_module()`` and ``find_loader()`` (in the ``PathEntryFinder`` case). If a finder does not have ``find_spec()``, ``find_module()`` and ``find_loader()`` are used instead, for backward-compatibility. Adding yet another similar method to finders is a case of practicality. ``find_module()`` could be changed to return specs instead of loaders. This is tempting because the import APIs have suffered enough, especially considering ``PathEntryFinder.find_loader()`` was just added in Python 3.3.
However, the extra complexity and a less-than-explicit method name aren't worth it. Loaders ------- **Loader.exec_module(module)** Loaders will have a new method, exec_module(). Its only job is to "exec" the module and consequently populate the module's namespace. It is not responsible for creating or preparing the module object, nor for any cleanup afterward. It has no return value. exec_module() should properly handle the case where it is called more than once. For some kinds of modules this may mean raising ImportError every time after the first time the method is called. This is particularly relevant for reloading, where some kinds of modules do not support in-place reloading. **Loader.create_module(spec)** Loaders may also implement create_module() that will return a new module to exec. It may return None to indicate that the default module creation code should be used. One use case for create_module() is to provide a module that is a subclass of the builtin module type. Most loaders will not need to implement create_module(). create_module() should properly handle the case where it is called more than once for the same spec/module. This may include returning None or raising ImportError. Other changes: PEP 420 introduced the optional ``module_repr()`` loader method to limit the amount of special-casing in the module type's ``__repr__()``. Since this method is part of ``ModuleSpec``, it will be deprecated on loaders. However, if it exists on a loader it will be used exclusively. The ``Loader.init_module_attrs()`` method, added prior to Python 3.4's release, will be removed in favor of the same method on ``ModuleSpec``. However, ``InspectLoader.is_package()`` will not be deprecated even though the same information is found on ``ModuleSpec``. ``ModuleSpec`` can use it to populate its own ``is_package`` if that information is not otherwise available. Still, it will be made optional.
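The two loader methods described above can be sketched roughly as follows: create_module() supplies a module subclass, and exec_module() raises ImportError when asked to run a second time. ``CountingModule``, ``OnceLoader`` and ``once_mod`` are hypothetical names, and the helpers ``importlib.util.spec_from_loader()`` and ``importlib.util.module_from_spec()`` are the ones that later shipped in the standard library rather than the draft names in this message.

```python
import importlib.util
import types


class CountingModule(types.ModuleType):
    """Hypothetical module subclass carrying extra state."""
    exec_count = 0


class OnceLoader:
    """Hypothetical loader: custom module type, refuses re-execution."""

    def create_module(self, spec):
        # Return a custom object instead of the default module type.
        return CountingModule(spec.name)

    def exec_module(self, module):
        # Signal that in-place reloading is not supported.
        if module.exec_count:
            raise ImportError("in-place reload not supported",
                              name=module.__name__)
        module.exec_count += 1
        module.answer = 42


spec = importlib.util.spec_from_loader("once_mod", OnceLoader())
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
print(type(mod).__name__, mod.answer)
```

Calling ``spec.loader.exec_module(mod)`` again raises ImportError, which is one way for a loader to decline a reload under the rules sketched in this section.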
One consequence of ModuleSpec is that loader ``__init__`` methods will no longer need to accommodate per-module state. The path-based loaders in ``importlib`` take arguments in their ``__init__()`` and have corresponding attributes. However, the need for those values is eliminated by module specs. In addition to executing a module during loading, loaders will still be directly responsible for providing APIs concerning module-related data. Other Changes ============= * The various finders and loaders provided by importlib will be updated to comply with this proposal. * The spec for the ``__main__`` module will reflect how the interpreter was started. For instance, with ``-m`` the spec's name will be that of the run module, while ``__main__.__name__`` will still be "__main__". * We add ``importlib.find_spec()`` to mirror ``importlib.find_loader()`` (which becomes deprecated). * ``importlib.reload()`` is changed to use ``ModuleSpec.load()``. * ``importlib.reload()`` will now make use of the per-module import lock. Reference Implementation ======================== A reference implementation will be available at http://bugs.python.org/issue18864. Open Issues ============== \* The impact of this change on pkgutil (and setuptools) needs looking into. It has some generic function-based extensions to PEP 302. These may break if importlib starts wrapping loaders without the tools' knowledge. \* Other modules to look at: runpy (and pythonrun.c), pickle, pydoc, inspect. For instance, pickle should be updated in the __main__ case to look at ``module.__spec__.name``. \* Impact on some kinds of lazy loading modules. See [3]. \* Find a better name than loading_info? Perhaps loading_data, loader_state, or loader_info. \* Change loader.create_module() to prepare_module()? \* Add more explicit reloading support to exec_module() (and prepare_module())? 
References ========== [1] http://mail.python.org/pipermail/import-sig/2013-August/000658.html [2] https://mail.python.org/pipermail/import-sig/2013-September/000735.html [3] https://mail.python.org/pipermail/python-dev/2013-August/128129.html Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Wed Sep 18 16:57:44 2013 From: brett at python.org (Brett Cannon) Date: Wed, 18 Sep 2013 10:57:44 -0400 Subject: [Import-SIG] PEP 451: Big update. In-Reply-To: References: Message-ID: Looking good! Comments inline. On Wed, Sep 18, 2013 at 5:51 AM, Eric Snow wrote: > Hi all, > > I finally got some time to update the PEP. I've simplified a few things, > most notably by making the 4 ModuleSpec methods (create, exec, load, > reload) "private". > > Also notable is that the new loader method is still create_module() and > there is still no flag for is_reload on either of the loader methods. I'm > still not clear on what the flag buys us and on why anything we'd do in a > prepare_module() we couldn't do in exec_module(). I'm trying to keep this > simple. :) > > Anyway, I still need to take some time to clean up the PEP formatting and > run a spell checker. I probably also missed some artifact of an older > version of the API. Otherwise I think it's in a good spot. Comments > welcome. > > -eric > > p.s. I also plan on getting the implementation up one of these days. 
:P > > =============================================================== > > PEP: 451 > Title: A ModuleSpec Type for the Import System > Version: $Revision$ > Last-Modified: $Date$ > Author: Eric Snow > Discussions-To: import-sig at python.org > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 8-Aug-2013 > Python-Version: 3.4 > Post-History: 8-Aug-2013, 28-Aug-2013, 18-Sep-2013 > Resolution: > > [SNIP] > Specification > ============= > > The goal is to address the gap between finders and loaders while > changing as little of their semantics as possible. Though some > functionality and information is moved to the new ``ModuleSpec`` type, > their behavior should remain the same. However, for the sake of clarity > the finder and loader semantics will be explicitly identified. > > This is a high-level summary of the changes described by this PEP. More > detail is available in later sections. > > importlib.machinery.ModuleSpec (new) > ------------------------------------ > > A specification for a module's import-system-related state. > > * ModuleSpec(name, loader, \*, origin=None, loading_info=None, > is_package=None) > > Attributes: > > * name - a string for the name of the module. > * loader - the loader to use for loading and for module data. > Just drop the "and for module data"; sentence is awkward with it and is a margin use-case. > * origin - a string for the location from which the module is loaded, > e.g. "builtin" for built-in modules and the filename for modules > loaded from source. > * submodule_search_locations - strings for where to find submodules, > if a package. > Very subtle hint that it's a sequence of of strings; might want to make it more explicit that it's a list. > * loading_info - a container of extra data for use during loading. > * cached (property) - a string for where the compiled module will be > stored (see PEP 3147). > * package (RO-property) - the name of the module's parent (or None). 
> * has_location (RO-property) - the module's origin refers to a location. > > Instance Methods: > > * module_repr() - provide a repr string for the spec'ed module. > * init_module_attrs(module) - set any of a module's import-related > attributes that aren't already set. > > importlib.util Additions > ------------------------ > > * spec_from_file_location(name, location, \*, loader=None, > submodule_search_locations=None) > - factory for file-based module specs. > * from_loader(name, loader, \*, origin=None, is_package=None) - factory > based on information provided by loaders. > * spec_from_module(module, loader=None) - factory based on existing > import-related module attributes. This function is expected to be > used only in some backward-compatibility situations. > > Other API Additions > ------------------- > > * importlib.abc.Loader.exec_module(module) will execute a module in its > own namespace. It replaces ``importlib.abc.Loader.load_module()``. > * importlib.abc.Loader.create_module(spec) (optional) will return a new > module to use for loading. > * Module objects will have a new attribute: ``__spec__``. > * importlib.find_spec(name, path=None) will return the spec for a > module. > > exec_module() and create_module() should not set any import-related > module attributes. The fact that load_module() does is a design flaw > that this proposal aims to correct. > This is a rather jarring place to make this statement since you're just outlining API additions, not design decisions. > > API Changes > ----------- > > * ``InspectLoader.is_package()`` will become optional. 
> > Deprecations > ------------ > > * importlib.abc.MetaPathFinder.find_module() > * importlib.abc.PathEntryFinder.find_module() > * importlib.abc.PathEntryFinder.find_loader() > * importlib.abc.Loader.load_module() > * importlib.abc.Loader.module_repr() > * The parameters and attributes of the various loaders in > importlib.machinery > * importlib.util.set_package() > * importlib.util.set_loader() > * importlib.find_loader() > Yay to all of this! =) > > Removals > -------- > > These were introduced prior to Python 3.4's release. > > * importlib.abc.Loader.init_module_attrs() > * importlib.util.module_to_load() > > Other Changes > ------------- > > * The import system implementation in importlib will be changed to make > use of ModuleSpec. > * Import-related module attributes (other than ``__spec__``) will no > longer be used directly by the import system. > * Import-related attributes should no longer be added to modules > directly. > * The module type's ``__repr__()`` will be thin wrapper around a pure > Python implementation which will leverage ModuleSpec. > "be a thin" > * The spec for the ``__main__`` module will reflect the appropriate > name and origin. > > Backward-Compatibility > ---------------------- > > * If a finder does not define find_spec(), a spec is derived from > the loader returned by find_module(). > * PathEntryFinder.find_loader() still takes priority over > find_module(). > * Loader.load_module() is used if exec_module() is not defined. > > What Will not Change? > --------------------- > > * The syntax and semantics of the import statement. > * Existing finders and loaders will continue to work normally. > * The import-related module attributes will still be initialized with > the same information. > * Finders will still create loaders (now storing them in specs). > * Loader.load_module(), if a module defines it, will have all the > same requirements and may still be called directly. > * Loaders will still be responsible for module data APIs. 
> * importlib.reload() will still overwrite the import-related attributes. > > > What Will Existing Finders and Loaders Have to Do Differently? > ============================================================== > > Immediately? Nothing. The status quo will be deprecated, but will > continue working. However, here are the things that the authors of > finders and loaders should change relative to this PEP: > > * Implement ``find_spec()`` on finders. > * Implement ``exec_module()`` on loaders, if possible. > > The ModuleSpec factory functions in importlib.util are intended to be > helpful for converting existing finders. ``from_loader()`` and > ``from_file_location()`` are both straight-forward utilities in this > regard. In the case where loaders already expose methods for creating > and preparing modules, ``ModuleSpec.from_module()`` may be useful to > the corresponding finder. > > For existing loaders, exec_module() should be a relatively direct > conversion from the non-boilerplate portion of load_module(). In some > uncommon cases the loader should also implement create_module(). > > > ModuleSpec Users > ================ > > ``ModuleSpec`` objects has 3 distinct target audiences: Python itself, > import hooks, and normal Python users. > "has" -> "have" > > Python will use specs in the import machinery, in interpreter startup, > and in various standard library modules. Some modules are > import-oriented, like pkgutil, and others are not, like pickle and > pydoc. In all cases, the full ``ModuleSpec`` API will get used. > > Import hooks (finders and loaders) will make use of the spec in specific > ways. First of all, finders may use the spec factory functions in > importlib.util to create spec objects. They may also directly adjust > the spec attributes after the spec is created. Secondly, the finder may > bind additional information to the spec (in finder_extras) for the > loader to consume during module creation/execution. 
Finally, loaders
> will make use of the attributes on a spec when creating and/or executing
> a module.
>
> Python users will be able to inspect a module's ``__spec__`` to get
> import-related information about the object.  Generally, Python
> applications and interactive users will not be using the ``ModuleSpec``
> factory functions nor any of the instance methods.
>
>
> How Loading Will Work
> =====================
>
> This is an outline of what happens in ModuleSpec's loading
> functionality::
>
>     def load(spec):
>         if not hasattr(spec.loader, 'exec_module'):
>             module = spec.loader.load_module(spec.name)
>             spec.init_module_attrs(module)
>             return sys.modules[spec.name]
>
>         module = None
>         if hasattr(spec.loader, 'create_module'):
>             module = spec.loader.create_module(spec)
>         if module is None:
>             module = ModuleType(spec.name)
>         spec.init_module_attrs(module)
>
>         spec._initializing = True
>         sys.modules[spec.name] = module
>         try:
>             spec.loader.exec_module(module)
>         except Exception:
>             del sys.modules[spec.name]
>         finally:
>             spec._initializing = False
>         return sys.modules[spec.name]
>
> These steps are exactly what ``Loader.load_module()`` is already
> expected to do.  Loaders will thus be simplified since they will only
> need to implement exec_module().
>

Two things. One, it's not exactly what loaders do, as that _initializing is
done by import itself. Any specific reason you added it here? Two, you
forgot to re-raise the exception in the except clause.

>
> Note that we must return the module from sys.modules.  During loading
> the module may have replaced itself in sys.modules.  Since we don't have
> a post-import hook API to accommodate the use case, we have to deal with
> it.  However, in the replacement case we do not worry about setting the
> import-related module attributes on the object.  The module writer is on
> their own if they are doing this.
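For comparison, here is a minimal sketch of the same flow with the exception
re-raised. It is only an illustration, not the PEP's implementation: it uses
importlib.util.module_from_spec() from later Python releases as a stand-in
for create_module() plus the draft's init_module_attrs(), it leaves out the
_initializing bookkeeping, and ConstantLoader and load() are hypothetical
names.

```python
import importlib.machinery
import importlib.util
import sys


class ConstantLoader:
    """Hypothetical loader: fills the module with a fixed value."""

    def __init__(self, fail=False):
        self.fail = fail

    def create_module(self, spec):
        # Returning None asks for an ordinary module object.
        return None

    def exec_module(self, module):
        if self.fail:
            raise RuntimeError("simulated failure while executing the module")
        module.answer = 42


def load(spec):
    # Assumption: module_from_spec() stands in for create_module() plus
    # the draft's spec.init_module_attrs(module).
    module = importlib.util.module_from_spec(spec)
    sys.modules[spec.name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # Drop the half-initialized module, then let the error propagate.
        sys.modules.pop(spec.name, None)
        raise
    # The module may have replaced itself in sys.modules while executing.
    return sys.modules[spec.name]
```

With the bare "raise" in place, a failing exec_module() leaves no entry
behind in sys.modules and the caller still sees the original error.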
> > > ModuleSpec > ========== > > Attributes > ---------- > > Each of the following names is an attribute on ModuleSpec objects. A > value of ``None`` indicates "not set". This contrasts with module > objects where the attribute simply doesn't exist. Most of the > attributes correspond to the import-related attributes of modules. Here > is the mapping. The reverse of this mapping is used by > ModuleSpec.init_module_attrs(). > > ========================== ============== > On ModuleSpec On Modules > ========================== ============== > name __name__ > loader __loader__ > package __package__ > origin __file__* > cached __cached__*,** > submodule_search_locations __path__** > loading_info \- > has_location \- > ========================== ============== > > \* Set only if has_location is true. > \*\* Set only if the spec attribute is not None. > "Set on the module if the spec" > > While package and has_location are read-only properties, the remaining > attributes can be replaced after the module spec is created and even > after import is complete. This allows for unusual cases where directly > modifying the spec is the best option. However, typical use should not > involve changing the state of a module's spec. > > **origin** > > origin is a string for the place from which the module originates. > Aside from the informational value, it is also used in module_repr(). > > The module attribute ``__file__`` has a similar but more restricted > meaning. Not all modules have it set (e.g. built-in modules). However, > ``origin`` is applicable to all modules. For built-in modules it would > be set to "built-in". > > **has_location** > > Some modules can be loaded by reference to a location, e.g. a filesystem > path or a URL or something of the sort. Having the location lets you > load the module, but in theory you could load that module under various > names. > > In contrast, non-located modules can't be loaded in this fashion, e.g. 
> builtin modules and modules dynamically created in code. For these, the > name is the only way to access them, so they have an "origin" but not a > "location". > > This attribute reflects whether or not the module is locatable. If it > is, origin must be set to the module's location and ``__file__`` will be > set on the module. Not all locatable modules will be cachable, but most > will. > > The corresponding module attribute name, ``__file__``, is somewhat > inaccurate and potentially confusion, > "confusion" -> "confusing" > so we will use a more explicit > combination of origin and has_location to represent the same > information. Having a separate filename is unnecessary since we have > origin. > Quote 'origin' so you don't read it like it should have been written "we have an origin". > > **submodule_search_locations** > > The list of location strings, typically directory paths, in which to > search for submodules. If the module is a package this will be set to > a list (even an empty one). Otherwise it is ``None``. > > The corresponding module attribute's name, ``__path__``, is relatively > ambiguous. Instead of mirroring it, we use a more explicit name that > makes the purpose clear. > > **loading_info** > > A finder may set loading_info to any value to provide additional > data for the loader to use during loading. A value of None is the > default and indicates that there is no additional data. Otherwise it > can be set to any object, such as a dict, list, or > types.SimpleNamespace, containing the relevant extra information. > > For example, zipimporter could use it to pass the zip archive name > to the loader directly, rather than needing to derive it from origin > or create a custom loader for each find operation. > > loading_info is meant for use by the finder and corresponding loader. > It is not guaranteed to be a stable resource for any other use.
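The zipimporter idea above could look roughly like the sketch below. It is
written against ModuleSpec as it later shipped, where this attribute is
named loader_state rather than loading_info. ArchiveFinder, ArchiveLoader,
and the archive name are made up for illustration, and nothing here reads a
real archive.

```python
import importlib.machinery
import importlib.util


class ArchiveLoader:
    """Hypothetical loader that reads its settings from the spec."""

    def create_module(self, spec):
        return None  # use the default module type

    def exec_module(self, module):
        # The finder's extra data travels on the spec, so the loader
        # itself needs no per-module state.
        state = module.__spec__.loader_state
        module.archive = state["archive"]


class ArchiveFinder:
    """Hypothetical finder that serves a fixed set of names."""

    def __init__(self, archive, names):
        self.archive = archive
        self.names = set(names)
        self.loader = ArchiveLoader()  # one loader shared by every spec

    def find_spec(self, name, path=None, target=None):
        if name not in self.names:
            return None
        return importlib.machinery.ModuleSpec(
            name,
            self.loader,
            origin=self.archive,
            loader_state={"archive": self.archive},
        )


finder = ArchiveFinder("bundle.zip", ["plugin"])
spec = finder.find_spec("plugin")
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
```

One loader instance can then serve every module the finder knows about,
which is the "no custom loader for each find operation" point made above.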
> > Omitted Attributes and Methods > ------------------------------ > > The following ModuleSpec methods are not part of the public API since > it is easy to use them incorrectly and only the import system really > needs them (i.e. they would be an attractive nuisance). > > * create() - provide a new module to use for loading. > * exec(module) - execute the spec into a module namespace. > * load() - prepare a module and execute it in a protected way. > * reload(module) - re-execute a module in a protected way. > If they are not part of the public API they should have a leading underscore. > > Here are other omissions: > > There is no PathModuleSpec subclass of ModuleSpec that separates out > has_location, cached, and submodule_search_locations. While that might > make the separation cleaner, module objects don't have that distinction. > ModuleSpec will support both cases equally well. > > While is_package would be a simple additional attribute (aliasing > ``self.submodule_search_locations is not None``), it perpetuates the > artificial (and mostly erroneous) distinction between modules and > packages. > > Conceivably, a ModuleSpec.load() method could optionally take a list of > modules with which to interact instead of sys.modules. That > capability is left out of this PEP, but may be pursued separately at > some other time, including relative to PEP 406 (import engine). > > Likewise load() could be leveraged to implement multi-version > imports. While interesting, doing so is outside the scope of this > proposal. > > Others: > > * Add ModuleSpec.submodules (RO-property) - returns possible submodules > relative to the spec. > * Add ModuleSpec.loaded (RO-property) - the module in sys.module, if > any. > * Add ModuleSpec.data - a descriptor that wraps the data API of the > spec's loader. > * Also see [3]. > > > Backward Compatibility > ---------------------- > > ModuleSpec doesn't have any. 
This would be a different story if > Finder.find_module() were to return a module spec instead of loader. > In that case, specs would have to act like the loader that would have > been returned instead. Doing so would be relatively simple, but is an > unnecessary complication. It was part of earlier versions of this PEP. > > Subclassing > ----------- > > Subclasses of ModuleSpec are allowed, but should not be necessary. > Simply setting loading_info or adding functionality to a custom > finder or loader will likely be a better fit and should be tried first. > However, as long as a subclass still fulfills the requirements of the > import system, objects of that type are completely fine as the return > value of Finder.find_spec(). > > > > [SNIP] > > > Open Issues > ============== > > \* The impact of this change on pkgutil (and setuptools) needs looking > into. It has some generic function-based extensions to PEP 302. These > may break if importlib starts wrapping loaders without the tools' > knowledge. > > \* Other modules to look at: runpy (and pythonrun.c), pickle, pydoc, > inspect. > > For instance, pickle should be updated in the __main__ case to look at > ``module.__spec__.name``. > > \* Impact on some kinds of lazy loading modules. See [3]. > > \* Find a better name than loading_info? Perhaps loading_data, > loader_state, or loader_info. > loader_state or loader_data get my vote. > > \* Change loader.create_module() to prepare_module()? > -0 from me. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Wed Sep 18 18:08:57 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 19 Sep 2013 02:08:57 +1000 Subject: [Import-SIG] PEP 451: Big update. In-Reply-To: References: Message-ID: On 18 September 2013 19:51, Eric Snow wrote: > Hi all, > > I finally got some time to update the PEP. 
I've simplified a few things, > most notably by making the 4 ModuleSpec methods (create, exec, load, reload) > "private". > > Also notable is that the new loader method is still create_module() and > there is still no flag for is_reload on either of the loader methods. I'm > still not clear on what the flag buys us and on why anything we'd do in a > prepare_module() we couldn't do in exec_module(). I'm trying to keep this > simple. :) The point is to give the invoker of the loader a chance to muck about with the module state before actually executing the module. For example, runpy and the updated extension loader API could use this to support execution of compiled Cython modules with -m. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ericsnowcurrently at gmail.com Thu Sep 19 00:14:13 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 18 Sep 2013 16:14:13 -0600 Subject: [Import-SIG] PEP 451: Big update. In-Reply-To: References: Message-ID: On Wed, Sep 18, 2013 at 10:08 AM, Nick Coghlan wrote: > On 18 September 2013 19:51, Eric Snow wrote: > > Hi all, > > > > I finally got some time to update the PEP. I've simplified a few things, > > most notably by making the 4 ModuleSpec methods (create, exec, load, > reload) > > "private". > > > > Also notable is that the new loader method is still create_module() and > > there is still no flag for is_reload on either of the loader methods. > I'm > > still not clear on what the flag buys us and on why anything we'd do in a > > prepare_module() we couldn't do in exec_module(). I'm trying to keep > this > > simple. :) > > The point is to give the invoker of the loader a chance to muck about > with the module state before actually executing the module. For > example, runpy and the updated extension loader API could use this to > support execution of compiled Cython modules with -m. > That makes sense. A loader.create_module() method (not called during reload) gives you that. 
I'm all for that.  I'm just not clear on why it
needs to be more than that.

My understanding of the proposed prepare_module() is it would always be
called right before exec_module(), whether it be load or reload (there
would be no create_module()).  Then in that case, can't loaders just roll
their prepare_module() implementation into the beginning of exec_module()
(even call spec.init_module_attrs() directly)?  What's the advantage to
splitting that out in the Loader API?  I know I'm missing something here.
(Maybe I shouldn't try to work on the PEP so late at night!)

...after further consideration...

I expect it's so that during reload the loader can indicate "don't reload
in-place, load into this module instead!"  So the module passed in to
exec_module() would end up being different from the existing module in
sys.modules.  However, can't exec_module() simply exec into the module that
it would have returned from prepare_module() and then directly stick it
into sys.modules?

...after further consideration...

Okay, maybe I'm seeing it.  Would it be something like the following?

#-- start prepare_module() example --

class ModuleSpec:
    ...
    def _load(self):
        # This is basically the same as the PEP currently defines it.
        module = self.loader.prepare_module(self)  # I prefer create_module for this.
        if module is None:
            module = ModuleType(self.name)
        self.init_module_attrs(module)
        # skipping some boilerplate
        sys.modules[self.name] = module
        self.loader.exec_module(module)
        return sys.modules[self.name]

    def _reload(self, module):
        # This is where it gets different.
        prepared = self.loader.prepare_module(self, module)
        if prepared is not None:
            self.init_module_attrs(prepared)
            module = prepared
            sys.modules[self.name] = module
        self.loader.exec_module(module)
        return sys.modules[self.name]

class SomeLoader:

    def prepare_module(self, spec, module=None):
        if self.never_ever_been_loaded_before_not_even_in_subinterpreters(
                spec.name):
            self.initialize_stuff(spec)
        return MyCustomModule(spec.name)

    def exec_module(self, module):
        # Do exec stuff here.

#-- end prepare_module() example --

(Note that _load() and _reload() could share more code than they do, but
regardless...)

Contrast that with what the PEP specifies currently.

#-- start current PEP example --

class ModuleSpec:
    ...
    def _create(self):
        module = self.loader.create_module(self)
        if module is None:
            module = ModuleType(self.name)
        self.init_module_attrs(module)
        return module

    def _load(self):
        module = self._create()
        # skipping boilerplate
        self.loader.exec_module(module)
        return sys.modules[self.name]

    def _reload(self, module):
        self.loader.exec_module(module)
        return sys.modules[self.name]

class SomeLoader:

    def create_module(self, spec):
        if self.never_ever_been_loaded_before_not_even_in_subinterpreters(
                spec.name):
            self.initialize_stuff(spec)
        return MyCustomModule(spec.name)

    def exec_module(self, module):
        spec = module.__spec__
        if not self.never_ever_been_loaded_before_not_even_in_subinterpreters(
                spec.name):
            module = spec._create()
            # or module = self.create_module(spec);
            #    spec.init_module_attrs(module)
            sys.modules[module.__name__] = module
        # Do exec stuff here.

#-- end current PEP example --

The way I see it, in the latter example the ModuleSpec is easier to follow,
without making exec_module() that much more complicated.

Regardless, at this point I'm seeing prepare_module() as a formal API for
"use *this* module instead of what you would use by default."  While
create_module() provides that for the loading case, prepare_module() also
provides it explicitly for the reloading case.  Consequently, in the reload
case prepare_module() does eliminate the boilerplate that exec_module()
otherwise must accommodate.  That's probably the biggest reason to go there.

I wonder if we could instead wrap that bit in a ModuleSpec helper method
that loaders can call in exec_module():

    def _new_module_for_reload(self):
        module = self._create()
        sys.modules[self.name] = module

FWIW, I think create_module() is still an appropriate (and better) name
regardless of where it's used.

At this point I still would rather stick with what the PEP currently
specifies, but I'm going to ruminate on the reload case, e.g. re-read your
message about reload strategies as well as your response to my message
about module lifecycles.  I think I have more context to fit them into
the big picture here.

Not to leave anything out, is there any reason we shouldn't punt right now
on the whole reload mechanics issue and bundle it with the PEP on improving
extension modules?  I'd like to wrap up ModuleSpec and see about the .ref
PEP that started all this.  Plus I think this PEP is hitting the limit of a
mentally bite-size proposal.  I've been lamentably busy of late so I'm
worried about expanding the PEP.  However, I'm open to more discussion on
supporting other reload strategies, particularly if you think this PEP
should not move forward without having settled the issue.

BTW, thanks for diving into the extension module questions (you and
Stefan).  Those discussions have helped improve this PEP. :)

-eric

From ncoghlan at gmail.com  Thu Sep 19 03:01:19 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 19 Sep 2013 11:01:19 +1000
Subject: [Import-SIG] PEP 451: Big update.
In-Reply-To:
References:
Message-ID:

Yeah, I preferred the "prepare_module" name when I thought the extension
loader returned the cached module object directly. It doesn't, it returns a
copy, so "create_module" is fine.
Also agreed on deferring reload behavioural improvements to a separate PEP.

As noted in my other email, I think an advisory "this isn't going to work"
API is a better idea now, since even pure Python modules don't always
support reloading.

And +1 to "loader_state" as the helper attribute name.

Cheers,
Nick.

From ericsnowcurrently at gmail.com  Thu Sep 19 07:06:27 2013
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 18 Sep 2013 23:06:27 -0600
Subject: [Import-SIG] PEP 451: Big update.
In-Reply-To:
References:
Message-ID:

On Wed, Sep 18, 2013 at 8:57 AM, Brett Cannon wrote:

> Looking good! Comments inline.

Thanks for the feedback, Brett.  I fixed everything you pointed out.  Also,
I'm going with loader_state. :)

-eric

From ericsnowcurrently at gmail.com  Thu Sep 19 07:13:02 2013
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 18 Sep 2013 23:13:02 -0600
Subject: [Import-SIG] PEP 451: Big update.
In-Reply-To: References: Message-ID: On Wed, Sep 18, 2013 at 7:01 PM, Nick Coghlan wrote: > Yeah, I preferred the "prepare_module" name when I thought the extension > loader returned the cached module object directly. It doesn't, it returns a > copy, so "create_module" is fine. > Cool. > Also agreed on deferring reload behavioural improvements to a separate PEP. > Sounds good. > As noted in my other email, I think an advisory "this isn't going to work" > API is a better idea now, since even pure Python modules don't always > support reloading. > What do you mean by "advisory" API? > And +1 to "loader_state" as the helper attribute name. > That's settled then! Thanks for the feedback. -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Thu Sep 19 07:34:53 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 18 Sep 2013 23:34:53 -0600 Subject: [Import-SIG] Thoughts on cleaner reloading support In-Reply-To: References: Message-ID: On Mon, Sep 2, 2013 at 7:53 AM, Nick Coghlan wrote: > The extension module discussion on python-dev got me thinking about > the different ways in which the "singleton" assumption for modules can > be broken, and how to ensure that extension modules play nicely in > that environment. > > As I see it, there are 4 ways the "singleton that survives for the > lifetime of the process following initial import" assumption regarding > modules can turn out to be wrong: > > 1. In-place reload, overwriting the existing contents of a namespace. > imp.reload() does this. We sort of do it for __main__, except we > usually keep re-using that namespace to run *different* things, rather > than rerunning the same code. > > 2. Parallel loading. We remove the existing module from sys.modules > (keeping a reference to it alive), and load a second copy. > Alternatively, we call the loader APIs directly. 
Either way, we end up > with two independent copies of the "same" module, potentially > reflecting different system states at the time of execution. > > 3. Subinterpreter support. Quite similar to parallel loading, but > we're loading the second copy because we're in a subinterpreter and > can't see the original. > > 4. Unloading. We remove the existing module from sys.modules and drop > all other references to it. The module gets destroyed, and we later > import a completely fresh copy. > > Even pure Python modules may not support these, since they may have > side effects, or assume they're in the main interpreter, or other > things. Currently, there is no way to signal this to the import > system, so we're left with implicit misbehaviour when we attempt to > reload the modules with global side effects. > This is a great summary. It got me thinking big (from which I usually pare my ideas down to something sane-ish). > > For a while, I was thinking we could design the import system to "just > figure it out", but now I'm thinking a selection of read/write > properties on spec objects may make more sense: > > allow_reload > allow_unload > allow_reimport > allow_subinterpreter_import > > These would all default to True, but loaders and modules could > selectively turn them off. > 2 things: 1. These make more sense to me on the loader (though perhaps exposed on the spec). 2. These (and other related attributes) may be easier to digest if bundled into a LoaderCapabilities named tuple. For these attributes to be useful to the import system we would have to have several other module registries akin to sys.modules (or one registry that manages the extra info). Brainstorming on this I came up with an expanded list: allow_raw (basically create_module() called directly) allow_unregister (a.k.a.
allow_unload) shared_globally (the module should be shared across all subinterpreters) allow_recreate (create again after unregister/unload) allow_create_parallel allow_exec_in_place allow_reexec (create again after unregister/unload) allow_exec_parallel For the sake of global (across subinterpreters) module registries, the same attributes could either be stored in a nested version of the named tuple or as more attributes, like this: allow_recreate_global allow_create_parallel_global allow_exec_in_place_global allow_reexec_global allow_exec_parallel_global I also realized that ImportEngine/ImportSystem (PEP 406) has a lot of the same trickiness as subinterpreters regarding reloading. > They would also be advisory Ah, here's where you were talking about "advisory" APIs. :) rather than enforced via all possible > import state manipulation mechanisms. New functions in importlib.util > could provide easier alternatives to directly manipulating > sys.modules: > > - importlib.util.reload (replacement for imp.reload that checks the > spec allows reloading) > - importlib.util.unload (replacement for "del > sys.modules[module.__name__]" that checks the spec allows unloading, > and also unloads all child modules) > - importlib.util.reimport (replacement for > test.support.import_fresh_module that checks the spec of any existing > sys.module entry allows reimporting a parallel copy) > At moments like this I keep thinking about PEP 406... :) > > One of these is not like the others... aside from the existing > extension module specific mechanism defined in PEP 3121, I'm not sure > we can devise a general *loader* level API to force imports for a > particular name to fail in a subinterpreter. So this concern probably > needs to be ignored in favour of a possible future C API level > solution. > Again, great write-up. I think you nailed it. -eric -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ericsnowcurrently at gmail.com Thu Sep 19 07:38:23 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 18 Sep 2013 23:38:23 -0600 Subject: [Import-SIG] PEP 451: Big update. In-Reply-To: References: Message-ID: I'm thinking that it may be useful to have ModuleSpec inherit from str and set it to the module name. Then the spec could be passed directly to those loader APIs that take the module name. Thoughts? -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Sep 19 10:14:34 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 19 Sep 2013 18:14:34 +1000 Subject: [Import-SIG] Thoughts on cleaner reloading support In-Reply-To: References: Message-ID: On 19 September 2013 15:34, Eric Snow wrote: > On Mon, Sep 2, 2013 at 7:53 AM, Nick Coghlan wrote: >> For a while, I was thinking we could design the import system to "just >> figure it out", but now I'm thinking a selection of read/write >> properties on spec objects may make more sense: >> >> allow_reload >> allow_unload >> allow_reimport >> allow_subinterpreter_import >> >> These would all default to True, but loaders and modules could >> selectively turn them off. > > 2 things: > > 1. These make more sense to me on the loader (though perhaps exposed on the > spec). They can't go on the loader, since even pure Python modules can violate them. For example, if your Python module does the following unconditionally, then it no longer supports in-place reloading:

    import sys
    sys.modules[__name__] = obj

So "allow" may be the wrong prefix. "handles" is probably better:

    handles_reload
    handles_unload
    handles_reimport
    handles_subinterpreter_import

For backwards compatibility, these would all default to True, but loaders and modules would now have the ability to opt-out.
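A minimal sketch of how such an advisory flag might be consulted, assuming a hypothetical ``handles_reload`` attribute on the spec (nothing in importlib defines it) and an equally hypothetical helper name, ``advisory_reload``. It only warns; the reload is still attempted, in keeping with the "advisory rather than enforced" idea:

```python
import importlib
import warnings


def advisory_reload(module):
    """Reload *module*, warning first if its spec opts out of reloading.

    ``handles_reload`` is the hypothetical advisory flag discussed in this
    thread, not a real ModuleSpec attribute. A missing flag is treated as
    True, matching the proposed backwards-compatible default.
    """
    spec = getattr(module, "__spec__", None)
    if not getattr(spec, "handles_reload", True):
        warnings.warn(
            f"{module.__name__} does not claim to handle in-place reloading",
            ImportWarning,
            stacklevel=2,
        )
    # Advisory only: the reload is still attempted after the warning.
    return importlib.reload(module)
```

Since ImportWarning is ignored by default, a caller would need to enable it (for example with ``-W always``) to see the advisory message.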
This should probably go hand-in-hand with a new "post import hooks" PEP, since an "atimport" hook with the following signature might be useful to allow other modules to properly handle reloading of dependencies:

    @importlib.atimport("foo")
    def handle_foo_import(mod, reloaded):
        # If you need to do something when the module is destroyed:
        # weakref.finalize(mod, my_callback)
        pass

Also, something neat about weakref.finalize is that it means Python effectively supports module destructors!

    import weakref, sys
    mod = sys.modules[__name__]
    def del_this():
        # implicit reference to the module globals from the function body
        pass
    weakref.finalize(mod, del_this)

Modules already support weak references in 3.4, and weakref.finalize is new in 3.4 as well. > 2. These (and other related attributes) may be easier to digest if bundled > into a LoaderCapabilities named tuple. Aside from it needing to vary by module rather than by loader, I'd also be OK with: can_handle.reload can_handle.unload can_handle.reimport can_handle.subinterpreter_import > For these attributes to be useful to the import system we would have to have > several other module registries akin to sys.modules (or one registry that > manages the extra info). I was thinking it would be strictly warnings based, so you could still *try* these things, you'd just have to deal with the consequences. >> They would also be advisory > > > Ah, here's where you were talking about "advisory" APIs. :) > >> rather than enforced via all possible >> import state manipulation mechanisms.
New functions in importlib.util >> could provide easier alternatives to directly manipulating >> sys.modules: >> >> - importlib.util.reload (replacement for imp.reload that checks the >> spec allows reloading) >> - importlib.util.unload (replacement for "del >> sys.modules[module.__name__]" that checks the spec allows unloading, >> and also unloads all child modules) >> - importlib.util.reimport (replacement for >> test.support.import_fresh_module that checks the spec of any existing >> sys.module entry allows reimporting a parallel copy) > > > At moments like this I keep thinking about PEP 406... :) Yup, that's definitely relevant. It would probably be good to sort out the thread-local context version of PEP 406 for 3.5 :) >> One of these is not like the others... aside from the existing >> extension module specific mechanism defined in PEP 3121, I'm not sure >> we can devise a general *loader* level API to force imports for a >> particular name to fail in a subinterpreter. So this concern probably >> needs to be ignored in favour of a possible future C API level >> solution. > > Again, great write-up. I think you nailed it. Yeah, the deep dive with Stefan into the extension loader implementation greatly clarified my thinking on a lot of things :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Sep 19 10:17:27 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 19 Sep 2013 18:17:27 +1000 Subject: [Import-SIG] PEP 451: Big update. In-Reply-To: References: Message-ID: On 19 September 2013 15:38, Eric Snow wrote: > I'm thinking that it may be useful to have ModuleSpec inherit from str and > set it to the module name. Then the spec could be passed directly to those > loader APIs that take the module name. Thoughts? I think I'd need to see the code you think it would simplify before saying yes (since my default answer is "No, inheriting from str is an unnecessary hack"). Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Thu Sep 19 10:21:22 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 19 Sep 2013 10:21:22 +0200 Subject: [Import-SIG] PEP 451: Big update. References: Message-ID: <20130919102122.0fd1241d@pitrou.net> Le Wed, 18 Sep 2013 23:38:23 -0600, Eric Snow a écrit : > I'm thinking that it may be useful to have ModuleSpec inherit from > str and set it to the module name. Then the spec could be passed > directly to those loader APIs that take the module name. Thoughts? I would generally be -1 on some hacks. Especially, str subclasses can leak to unsuspected places and create weird issues (I remember an issue with BeautifulSoup, IIRC, which returned str subclasses which kept whole HTML trees alive: by passing those str objects around you would create yourself a huge memory leak). Regards Antoine. From solipsis at pitrou.net Thu Sep 19 12:22:09 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 19 Sep 2013 12:22:09 +0200 Subject: [Import-SIG] PEP 451: Big update. References: Message-ID: <20130919122209.09ce1811@pitrou.net> Hi, I have some questions and comments: > origin - a string for the location from which the module is loaded, > e.g. "builtin" for built-in modules and the filename for modules > loaded from source. Filename or filepath? What if the module is stored in e.g. a ZIP file? > submodule_search_locations - list of strings for where to find > submodules, if a package (None otherwise). Why isn't is_package exposed as an attribute too? > cached (property) - a string for where the compiled module will be > stored "where" is a filesystem location? (absolute? relative to the origin?) > has_location (RO-property) - the module's origin refers to a location. filesystem location? What about ZIP files? > spec_from_file_location(name, location, *, loader=None, > submodule_search_locations=None) - factory for file-based module specs What does it mean?
Is it able to make "intelligent" decisions depending on e.g. whether the module is an extension module or a pure Python module? > from_loader(name, loader, *, origin=None, is_package=None) - factory > based on information provided by loaders. That description is rather unhelpful. > importlib.find_spec(name, path=None) will return the spec for a module. Is the module supposed to be already loaded or not? How is the spec "found"? Regards Antoine. From p.f.moore at gmail.com Thu Sep 19 13:28:24 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 19 Sep 2013 12:28:24 +0100 Subject: [Import-SIG] PEP 451: Big update. In-Reply-To: <20130919122209.09ce1811@pitrou.net> References: <20130919122209.09ce1811@pitrou.net> Message-ID: On 19 September 2013 11:22, Antoine Pitrou wrote: >> origin - a string for the location from which the module is loaded, >> e.g. "builtin" for built-in modules and the filename for modules >> loaded from source. > > Filename or filepath? What if the module is stored in e.g. a ZIP file? I haven't been following this thread closely, but this is a good point. There is a general issue that for modules loaded off sys.path, the module "location" needs to be somehow jammed into a string form (the absolute path for files, zip/file/path.zip/location/in/zipfile for zipfiles, but potentially anything at all for custom loaders) and for things loaded off sys.meta_path there's no need for any concept of path at all (that's how builtins, frozen modules et al work). It's worth being clear on both how this origin should be constructed in the general case (for the guidance of people implementing non-standard importers) and what users of the data can assume when using the data (can they split the value on os.sep or '/', for example, or is it in effect an opaque token). 
Some of the blame for all this being vague at the moment is down to me - when we were writing PEP 302, I wasn't brave enough to claim that path entries could be opaque token values, but I didn't want to insist that all importers had to follow a specific structure. So I ignored the issue and we just ended up with normal paths, and zipfiles which treat the zipfile as a pseudo-directory. And no examples of corner cases to keep people honest. My apologies for that... Paul From brett at python.org Thu Sep 19 16:11:52 2013 From: brett at python.org (Brett Cannon) Date: Thu, 19 Sep 2013 10:11:52 -0400 Subject: [Import-SIG] PEP 451: Big update. In-Reply-To: <20130919122209.09ce1811@pitrou.net> References: <20130919122209.09ce1811@pitrou.net> Message-ID: On Thu, Sep 19, 2013 at 6:22 AM, Antoine Pitrou wrote: > > Hi, > > I have some questions and comments: > > > origin - a string for the location from which the module is loaded, > > e.g. "builtin" for built-in modules and the filename for modules > > loaded from source. > > Filename or filepath? What if the module is stored in e.g. a ZIP file? > I think this would be what __file__ would be set to for zipfiles, so for zip files it would be e.g. /some/file.zip/path/to/module.py > > > submodule_search_locations - list of strings for where to find > > submodules, if a package (None otherwise). > > Why isn't is_package exposed as an attribute too? > It's redundant. The test for whether something is a package is literally ``submodule_search_locations is not None``. It just isn't complicated enough to warrant another attribute. Plus being a package isn't as important per-se as a concept as much as having a search path. > > > cached (property) - a string for where the compiled module will be > > stored > > "where" is a filesystem location? > (absolute? relative to the origin?) > It's what http://docs.python.org/3/library/imp.html#imp.cache_from_source would return.
> > > has_location (RO-property) - the module's origin refers to a location. > > filesystem location? What about ZIP files? > It's a flag to basically say that origin contains what __file__ should be. -Brett > > > spec_from_file_location(name, location, *, loader=None, > > submodule_search_locations=None) - factory for file-based module specs > > What does it mean? Is it able to make "intelligent" decisions depending > on e.g. whether the module is an extension module or a pure Python > module? > > > from_loader(name, loader, *, origin=None, is_package=None) - factory > > based on information provided by loaders. > > That description is rather unhelpful. > > > importlib.find_spec(name, path=None) will return the spec for a module. > > Is the module supposed to be already loaded or not? How is the spec > "found"? > > Regards > > Antoine. > > > _______________________________________________ > Import-SIG mailing list > Import-SIG at python.org > https://mail.python.org/mailman/listinfo/import-sig > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Sep 19 16:30:29 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 20 Sep 2013 00:30:29 +1000 Subject: [Import-SIG] PEP 451: Big update. In-Reply-To: References: <20130919122209.09ce1811@pitrou.net> Message-ID: On 20 Sep 2013 00:12, "Brett Cannon" wrote: > > > > > On Thu, Sep 19, 2013 at 6:22 AM, Antoine Pitrou wrote: >> >> >> Hi, >> >> I have some questions and comments: >> >> > origin - a string for the location from which the module is loaded, >> > e.g. "builtin" for built-in modules and the filename for modules >> > loaded from source. >> >> Filename or filepath? What if the module is stored in e.g. a ZIP file? > > > I think this would be what __file__ would be set to for zipfiles, so for zip files it would be e.g. 
/some/file.zip/path/to/module.py > >> >> >> > submodule_search_locations - list of strings for where to find >> > submodules, if a package (None otherwise). >> >> Why isn't is_package exposed as an attribute too? > > > It's redundant. The test for whether something is a package is literally ``submodule_search_locations is not None``. It just isn't complicated enough to warrant another attribute. Plus being a package isn't as important per-se as a concept as much as having a search path. > >> >> >> > cached (property) - a string for where the compiled module will be >> > stored >> >> "where" is a filesystem location? >> (absolute? relative to the origin?) > > > It's what http://docs.python.org/3/library/imp.html#imp.cache_from_source would return. > >> >> >> > has_location (RO-property) - the module's origin refers to a location. >> >> filesystem location? What about ZIP files? > > > It's a flag to basically say that origin contains what __file__ should be. Thus indicating that get_data() on the loader can be used sensibly. Perhaps we could just make setting __file__ conditional on the loader defining get_data, rather than having it be a spec attribute? I also suggest that we adopt the convention of using angle brackets in non-location origins. So names like "" and "". To respond to something Paul said, our completely opaque token is "loader_state", origin is still intended to be a human readable string. Cheers, Nick. > > -Brett > >> >> >> > spec_from_file_location(name, location, *, loader=None, >> > submodule_search_locations=None) - factory for file-based module specs >> >> What does it mean? Is it able to make "intelligent" decisions depending >> on e.g. whether the module is an extension module or a pure Python >> module? >> >> > from_loader(name, loader, *, origin=None, is_package=None) - factory >> > based on information provided by loaders. >> >> That description is rather unhelpful.
>> >> > importlib.find_spec(name, path=None) will return the spec for a module. >> >> Is the module supposed to be already loaded or not? How is the spec >> "found"? >> >> Regards >> >> Antoine. >> >> >> _______________________________________________ >> Import-SIG mailing list >> Import-SIG at python.org >> https://mail.python.org/mailman/listinfo/import-sig > > > > _______________________________________________ > Import-SIG mailing list > Import-SIG at python.org > https://mail.python.org/mailman/listinfo/import-sig > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Thu Sep 19 16:48:54 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 19 Sep 2013 16:48:54 +0200 Subject: [Import-SIG] PEP 451: Big update. References: <20130919122209.09ce1811@pitrou.net> Message-ID: <20130919164854.5eba5b41@pitrou.net> Le Fri, 20 Sep 2013 00:30:29 +1000, Nick Coghlan a écrit : > > I also suggest that we adopt the convention of using angle brackets in > non-location origins. So names like "" and "". +1. They stand out much better. Regards Antoine. From ericsnowcurrently at gmail.com Thu Sep 19 18:42:01 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 19 Sep 2013 10:42:01 -0600 Subject: [Import-SIG] PEP 451: Big update. In-Reply-To: References: Message-ID: On Thu, Sep 19, 2013 at 2:17 AM, Nick Coghlan wrote: > On 19 September 2013 15:38, Eric Snow wrote: > > I'm thinking that it may be useful to have ModuleSpec inherit from str > and > > set it to the module name. Then the spec could be passed directly to > those > > loader APIs that take the module name. Thoughts? > > I think I'd need to see the code you think it would simplify before > saying yes (since my default answer is "No, inheriting from str is an > unnecessary hack"). > On Thu, Sep 19, 2013 at 2:21 AM, Antoine Pitrou wrote: > I would generally be -1 on some hacks.
> Especially, str subclasses can leak to unsuspected places and create > weird issues (I remember an issue with BeautifulSoup, IIRC, which > returned str subclasses which kept whole HTML trees alive: by passing > those str objects around you would create yourself a huge memory leak). Agreed. I've done it in other projects for backward-compatibility reasons, but that doesn't really apply here. That's interesting about memory leaks. I would not have expected that. -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Thu Sep 19 21:12:06 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 19 Sep 2013 13:12:06 -0600 Subject: [Import-SIG] PEP 451: Big update. In-Reply-To: <20130919122209.09ce1811@pitrou.net> References: <20130919122209.09ce1811@pitrou.net> Message-ID: Hi Antoine, Thanks for the feedback. Comments inline. On Thu, Sep 19, 2013 at 4:22 AM, Antoine Pitrou wrote: > > origin - a string for the location from which the module is loaded, > > e.g. "builtin" for built-in modules and the filename for modules > > loaded from source. > > Filename or filepath? What if the module is stored in e.g. a ZIP file? > As Brett mentioned, it would be whatever is currently bound to __file__. Keep in mind that the two things I listed are just examples of the sorts of things that would go into "origin". The point of "origin" is actually explained in more detail further on in the PEP. > > > submodule_search_locations - list of strings for where to find > > submodules, if a package (None otherwise). > > Why isn't is_package exposed as an attribute too? > We had some discussion on this on a previous revision of the PEP. Initially I had is_package as a property of ModuleSpec. However, we came to the agreement that whether or not the spec represents a package is not very important once you have the spec. 
This contrasts with the is_package parameter to ModuleSpec which is useful since it represents a set of things that should be effected on the new spec object. Ultimately Nick put it best when he said that we need to de-emphasize the superficial package/module distinction, not enshrine it as an attribute. The PEP actually addresses the question of is_package in the "Omitted Attributes and Methods" section. > > cached (property) - a string for where the compiled module will be > > stored > > "where" is a filesystem location? > (absolute? relative to the origin?) > As Brett noted (and the module attribute table further on indicates), this is the same as the __cached__ attribute of modules. > > has_location (RO-property) - the module's origin refers to a location. > > filesystem location? What about ZIP files? > Also as Brett indicated, this is a flag that indicates that "origin" should be copied into __file__ on corresponding module objects. However, the summary is pretty unclear. I'll fix that. > > > spec_from_file_location(name, location, *, loader=None, > > submodule_search_locations=None) - factory for file-based module specs > > What does it mean? Is it able to make "intelligent" decisions depending > on e.g. whether the module is an extension module or a pure Python > module? > It does make some intelligent decisions. Otherwise a finder would just call ModuleSpec directly. (All three factory functions are there for the convenience of finders.) I'll add some explanation on what those decisions entail and also clarify the summary. > > > from_loader(name, loader, *, origin=None, is_package=None) - factory > > based on information provided by loaders. > > That description is rather unhelpful. > Likewise I'll add more explanation for this as well as improve the summary. > > importlib.find_spec(name, path=None) will return the spec for a module. > > Is the module supposed to be already loaded or not? How is the spec > "found"?
> This function is the replacement for importlib.find_loader(). Instead of returning a loader it returns a spec. Otherwise it's the same. I'll make the summary more clear. -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Thu Sep 19 21:30:18 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 19 Sep 2013 13:30:18 -0600 Subject: [Import-SIG] PEP 451: Big update. In-Reply-To: References: <20130919122209.09ce1811@pitrou.net> Message-ID: On Thu, Sep 19, 2013 at 5:28 AM, Paul Moore wrote: > On 19 September 2013 11:22, Antoine Pitrou wrote: > >> origin - a string for the location from which the module is loaded, > >> e.g. "builtin" for built-in modules and the filename for modules > >> loaded from source. > > > > Filename or filepath? What if the module is stored in e.g. a ZIP file? > > I haven't been following this thread closely, but this is a good > point. There is a general issue that for modules loaded off sys.path, > the module "location" needs to be somehow jammed into a string form > (the absolute path for files, zip/file/path.zip/location/in/zipfile > for zipfiles, but potentially anything at all for custom loaders) and > for things loaded off sys.meta_path there's no need for any concept of > path at all (that's how builtins, frozen modules et al work). > > It's worth being clear on both how this origin should be constructed > in the general case (for the guidance of people implementing > non-standard importers) and what users of the data can assume when > using the data (can they split the value on os.sep or '/', for > example, or is it in effect an opaque token). > Actually, "origin" is meant to be a pretty unconstrained string. It only has 2 explicit purposes: use in spec.module_repr() and as the value of __file__ when spec.has_location is true. The loader may use "origin" however it likes.
Presumably the finder would populate origin in whatever format the loader needs (if the loader even needs "origin"), but that's between the finder and loader. If the loader needs even more info, the finder can just stick it into the spec's loader_state attribute. > > Some of the blame for all this being vague at the moment is down to me > - when we were writing PEP 302, I wasn't brave enough to claim that > path entries could be opaque token values, but I didn't want to insist > that all importers had to follow a specific structure. So I ignored > the issue and we just ended up with normal paths, and zipfiles which > treat the zipfile as a pseudo-directory. And no examples of corner > cases to keep people honest. My apologies for that... > As Nick pointed out, the "loader_state" attribute of ModuleSpec objects is meant to be the container for any extra data the loader needs. -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Thu Sep 19 21:42:57 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 19 Sep 2013 13:42:57 -0600 Subject: [Import-SIG] PEP 451: Big update. In-Reply-To: References: <20130919122209.09ce1811@pitrou.net> Message-ID: On Thu, Sep 19, 2013 at 8:30 AM, Nick Coghlan wrote: > On 20 Sep 2013 00:12, "Brett Cannon" wrote: > > On Thu, Sep 19, 2013 at 6:22 AM, Antoine Pitrou > wrote: > >> > has_location (RO-property) - the module's origin refers to a location. > >> > >> filesystem location? What about ZIP files? > > > > > > It's a flag to basically say that origin contains what __file__ should > be. > > Thus indicating that get_data() on the loader can be used sensibly. > Perhaps we could just make setting __file__ conditional on the loader > defining get_data, rather than having it be a spec attribute? > I'd still like to keep an explicit "has_location" as a clear, informational declaration. How about we always set it to True if loader.get_data exists? 
I think you proposed this before and it got lost in the shuffle. > I also suggest that we adopt the convention of using angle brackets in > non-location origins. So names like "" and "". > Well, I'm already having module_repr() do that. I've thought of this before, but decided it was better to have the separate "has_location" attribute. Then there is no ambiguity between the origin of a non-locatable module and a locatable one that happens to have bookend angle brackets. I will make sure the spec is explicit about the angle brackets in module_repr(). -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Thu Sep 19 21:52:05 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 19 Sep 2013 13:52:05 -0600 Subject: [Import-SIG] PEP 451: Big update. In-Reply-To: References: <20130919122209.09ce1811@pitrou.net> Message-ID: On Thu, Sep 19, 2013 at 1:42 PM, Eric Snow wrote: > On Thu, Sep 19, 2013 at 8:30 AM, Nick Coghlan wrote: > >> I also suggest that we adopt the convention of using angle brackets in >> non-location origins. So names like "" and "". >> > Well, I'm already having module_repr() do that. > Actually no I wasn't. The current repr for the sys module is "<module 'sys' (built-in)>". Adding the angle brackets would change that. It's not a big deal to me either way. I actually kind of like the idea of using angle brackets (by convention) on a non-locatable origin. It just changes existing reprs and can be ambiguous in the (unlikely) situation I described. I'm leaning toward not doing the angle brackets, but I can be swayed. :) -eric -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ericsnowcurrently at gmail.com Wed Sep 25 07:46:45 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 24 Sep 2013 23:46:45 -0600 Subject: [Import-SIG] latest update of PEP 451 Message-ID: I've updated PEP 451 to address comments and clear a few things up.
Most notably, I added a list of terms at the beginning. The PEP is pretty close to done and feedback has simmered down. Does anyone object to my posting the next update to python-dev? There are two main open questions: 1. How does ModuleSpec interact with existing import-sensitive modules in the standard library? 2. PJE's concerns about reload semantics and lazy loading. Regarding the first, I'm not too concerned with the ability to adapt those modules to ModuleSpec without much effort. However, I will be doing a thorough check before I'll ask for pronouncement. About lazy loading, from what I understand, importlib.reload() broke backward compatibility with regards to PJE's use case when it switched to depending on __loader__. Perhaps it was even before that. I'll have to check. Regardless, PEP 451 does not change the semantics of reload() from what they currently are. The PEP could restore the previous semantics without a lot of work (if module.__spec__ is not set, then call find_spec(), set it, and reload using that). However, if reload() backward compatibility got broken somewhere along the line, that sounds like a bug that should be addressed separately. -eric ===================================================================== PEP: 451 Title: A ModuleSpec Type for the Import System Version: $Revision$ Last-Modified: $Date$ Author: Eric Snow Discussions-To: import-sig at python.org Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 8-Aug-2013 Python-Version: 3.4 Post-History: 8-Aug-2013, 28-Aug-2013, 18-Sep-2013 Resolution: Abstract ======== This PEP proposes to add a new class to importlib.machinery called "ModuleSpec". It will provide all the import-related information used to load a module and will be available without needing to load the module first. Finders will directly provide a module's spec instead of a loader (which they will continue to provide indirectly).
The import machinery will be adjusted to take advantage of module specs, including using them to load modules. Terms and Concepts ================== The changes in this proposal are an opportunity to make several existing terms and concepts more clear, whereas currently they are (unfortunately) ambiguous. New concepts are also introduced in this proposal. Finally, it's worth explaining a few other existing terms with which people may not be so familiar. For the sake of context, here is a brief summary of all three groups of terms and concepts. A more detailed explanation of the import system is found at [import_system_docs]_. finder ------ A "finder" is an object that identifies the loader that the import system should use to load a module. Currently this is accomplished by calling the finder's find_module() method, which returns the loader. Finders are strictly responsible for providing the loader, which they do through their find_module() method. The import system then uses that loader to load the module. loader ------ A "loader" is an object that is used to load a module during import. Currently this is done by calling the loader's load_module() method. A loader may also provide APIs for getting information about the modules it can load, as well as about data from sources associated with such a module. Right now loaders (via load_module()) are responsible for certain boilerplate import-related operations. These are: 1. perform some (module-related) validation; 2. create the module object; 3. set import-related attributes on the module; 4. "register" the module to sys.modules; 5. exec the module; 6. clean up in the event of failure while loading the module. This all takes place during the import system's call to Loader.load_module(). origin ------ This is a new term and concept. The idea of it exists subtly in the import system already, but this proposal makes the concept explicit.
"origin" in an import context means the system (or resource within a system) from which a module originates. For the purposes of this proposal, "origin" is also a string which identifies such a resource or system. "origin" is applicable to all modules. For example, the origin for built-in and frozen modules is the interpreter itself. The import system already identifies this origin as "built-in" and "frozen", respectively. This is demonstrated in the following module repr: "<module 'sys' (built-in)>". In fact, the module repr is already a relatively reliable, though implicit, indicator of a module's origin. Other modules also indicate their origin through other means, as described in the entry for "location". It is up to the loader to decide on how to interpret and use a module's origin, if at all. location -------- This is a new term. However the concept already exists clearly in the import system, as associated with the ``__file__`` and ``__path__`` attributes of modules, as well as the name/term "path" elsewhere. A "location" is a resource or "place", rather than a system at large, from which a module is loaded. It qualifies as an "origin". Examples of locations include filesystem paths and URLs. A location is identified by the name of the resource, but may not necessarily identify the system to which the resource pertains. In such cases the loader would have to identify the system itself. In contrast to other kinds of module origin, a location cannot be inferred by the loader just by the module name. Instead, the loader must be provided with a string to identify the location, usually by the finder that generates the loader. The loader then uses this information to locate the resource from which it will load the module. In theory you could load the module at a given location under various names. The most common examples of locations in the import system are the files from which source and extension modules are loaded.
For these modules the location is identified by the string in the ``__file__`` attribute. Although ``__file__`` isn't particularly accurate for some modules (e.g. zipped), it is currently the only way that the import system indicates that a module has a location. A module that has a location may be called "locatable". cache ----- The import system stores compiled modules in the __pycache__ directory as an optimization. This module cache that we use today was provided by PEP 3147. For this proposal, the relevant API for module caching is the ``__cached__`` attribute of modules and the cache_from_source() function in importlib.util. Loaders are responsible for putting modules into the cache (and loading out of the cache). Currently the cache is only used for compiled source modules. However, this proposal explicitly allows package ------- The concept does not change, nor does the term. However, the distinction between modules and packages is mostly superficial. Packages *are* modules. They simply have a ``__path__`` attribute and import may add attributes bound to submodules. The typical perceived difference is a source of confusion. This proposal explicitly de-emphasizes the distinction between packages and modules where it makes sense to do so. Motivation ========== The import system has evolved over the lifetime of Python. In late 2002 PEP 302 introduced standardized import hooks via finders and loaders and sys.meta_path. The importlib module, introduced with Python 3.1, now exposes a pure Python implementation of the APIs described by PEP 302, as well as of the full import system. It is now much easier to understand and extend the import system. While a benefit to the Python community, this greater accessibility also presents a challenge. As more developers come to understand and customize the import system, any weaknesses in the finder and loader APIs will be more impactful.
So the sooner we can address any such weaknesses in the import system, the better...and there are a couple we can take care of with this proposal. Firstly, any time the import system needs to save information about a module we end up with more attributes on module objects that are generally only meaningful to the import system. It would be nice to have a per-module namespace in which to put future import-related information and to pass around within the import system. Secondly, there's an API void between finders and loaders that causes undue complexity when encountered. The PEP 420 (namespace packages) implementation had to work around this. The complexity surfaced again during recent efforts on a separate proposal. [ref_files_pep]_ The `finder`_ and `loader`_ sections above detail the current responsibilities of both. Notably, loaders are not required to provide any of the functionality of their load_module() through other methods. Thus, though the import-related information about a module is likely available without loading the module, it is not otherwise exposed. Furthermore, the requirements associated with load_module() are common to all loaders and mostly are implemented in exactly the same way. This means every loader has to duplicate the same boilerplate code. importlib.util provides some tools that help with this, but it would be more helpful if the import system simply took charge of these responsibilities. The trouble is that this would limit the degree of customization that load_module() could easily continue to facilitate. More importantly, while a finder *could* provide the information that the loader's load_module() would need, it currently has no consistent way to get it to the loader. This is a gap between finders and loaders which this proposal aims to fill. Finally, when the import system calls a finder's find_module(), the finder makes use of a variety of information about the module that is useful outside the context of the method.
Currently the options are limited for persisting that per-module information past the method call, since it only returns the loader. Popular options for this limitation are to store the information in a module-to-info mapping somewhere on the finder itself, or store it on the loader. Unfortunately, loaders are not required to be module-specific. On top of that, some of the useful information finders could provide is common to all finders, so ideally the import system could take care of those details. This is the same gap as before between finders and loaders. As an example of complexity attributable to this flaw, the implementation of namespace packages in Python 3.3 (see PEP 420) added FileFinder.find_loader() because there was no good way for find_module() to provide the namespace search locations. The answer to this gap is a ModuleSpec object that contains the per-module information and takes care of the boilerplate functionality involved with loading the module. Specification ============= The goal is to address the gap between finders and loaders while changing as little of their semantics as possible. Though some functionality and information is moved to the new ModuleSpec type, their behavior should remain the same. However, for the sake of clarity the finder and loader semantics will be explicitly identified. Here is a high-level summary of the changes described by this PEP. More detail is available in later sections. importlib.machinery.ModuleSpec (new) ------------------------------------ A specification for a module's import-system-related state. See the `ModuleSpec`_ section below for a more detailed description. * ModuleSpec(name, loader, \*, origin=None, loader_state=None, is_package=None) Attributes: * name - a string for the name of the module. * loader - the loader to use for loading. * origin - the name of the place from which the module is loaded, e.g. "builtin" for built-in modules and the filename for modules loaded from source. 
* submodule_search_locations - list of strings for where to find submodules, if a package (None otherwise). * loader_state - a container of extra module-specific data for use during loading. * cached (property) - a string for where the compiled module should be stored. * parent (RO-property) - the name of the package to which the module belongs as a submodule (or None). * has_location (RO-property) - a flag indicating whether or not the module's "origin" attribute refers to a location. Instance Methods: * module_repr() - provide a repr string for the spec'ed module; non-locatable modules will use their origin (e.g. "built-in"). * init_module_attrs(module) - set any of a module's import-related attributes that aren't already set. importlib.util Additions ------------------------ These are ModuleSpec factory functions, meant as a convenience for finders. See the `Factory Functions`_ section below for more detail. * spec_from_file_location(name, location, \*, loader=None, submodule_search_locations=None) - build a spec from file-oriented information and loader APIs. * from_loader(name, loader, \*, origin=None, is_package=None) - build a spec with missing information filled in by using loader APIs. This factory function is useful for some backward-compatibility situations: * spec_from_module(module, loader=None) - build a spec based on the import-related attributes of an existing module. Other API Additions ------------------- * importlib.find_spec(name, path=None) will work exactly the same as importlib.find_loader() (which it replaces), but return a spec instead of a loader. For loaders: * importlib.abc.Loader.exec_module(module) will execute a module in its own namespace. It replaces importlib.abc.Loader.load_module(), taking over its module execution functionality. * importlib.abc.Loader.create_module(spec) (optional) will return the module to use for loading. For modules: * Module objects will have a new attribute: ``__spec__``. 
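As a rough illustration of the additions summarized above, the sketch below looks up a spec, reads one from an imported module, and builds one from file-oriented information. It assumes the API as it eventually shipped in Python 3.4+, where the lookup function is importlib.util.find_spec() rather than the importlib.find_spec() spelling proposed here. The directory "some_dir" and the module name "greeting" are hypothetical, and no file needs to exist because the spec is never loaded.

```python
import importlib.util
import os

# Look up a spec without executing the module body.  (If "json" has
# already been imported, its existing __spec__ is returned instead.)
found = importlib.util.find_spec("json")
print(found.name, found.origin, found.has_location)
print(found.parent, found.submodule_search_locations)

# Imported modules expose the same kind of information via __spec__.
import json
print(json.__spec__.name)

# Build a spec from file-oriented information.  The loader is deduced
# from the ".py" suffix; the path below is purely illustrative.
path = os.path.abspath(os.path.join("some_dir", "greeting.py"))
file_spec = importlib.util.spec_from_file_location("greeting", path)
print(file_spec.loader, file_spec.has_location)
```

Nothing here executes "greeting"; the spec only records where the module would come from and which loader would handle it.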
API Changes ----------- * InspectLoader.is_package() will become optional. Deprecations ------------ * importlib.abc.MetaPathFinder.find_module() * importlib.abc.PathEntryFinder.find_module() * importlib.abc.PathEntryFinder.find_loader() * importlib.abc.Loader.load_module() * importlib.abc.Loader.module_repr() * The parameters and attributes of the various loaders in importlib.machinery * importlib.util.set_package() * importlib.util.set_loader() * importlib.find_loader() Removals -------- These were introduced prior to Python 3.4's release, so they can simply be removed. * importlib.abc.Loader.init_module_attrs() * importlib.util.module_to_load() Other Changes ------------- * The import system implementation in importlib will be changed to make use of ModuleSpec. * importlib.reload() will make use of ModuleSpec. * Import-related module attributes (other than ``__spec__``) will no longer be used directly by the import system. * Import-related attributes should no longer be added to modules directly. * The module type's ``__repr__()`` will be a thin wrapper around a pure Python implementation which will leverage ModuleSpec. * The spec for the ``__main__`` module will reflect the appropriate name and origin. Backward-Compatibility ---------------------- * If a finder does not define find_spec(), a spec is derived from the loader returned by find_module(). * PathEntryFinder.find_loader() still takes priority over find_module(). * Loader.load_module() is used if exec_module() is not defined. What Will not Change? --------------------- * The syntax and semantics of the import statement. * Existing finders and loaders will continue to work normally. * The import-related module attributes will still be initialized with the same information. * Finders will still create loaders (now storing them in specs). * Loader.load_module(), if a module defines it, will have all the same requirements and may still be called directly. 
* Loaders will still be responsible for module data APIs. * importlib.reload() will still overwrite the import-related attributes. Responsibilities ---------------- Here's a quick breakdown of where responsibilities lie after this PEP. finders: * create loader * create spec loaders: * create module (optional) * execute module ModuleSpec: * orchestrate module loading * boilerplate for module loading, including managing sys.modules and setting import-related attributes * create module if loader doesn't * call loader.exec_module(), passing in the module in which to exec * contain all the information the loader needs to exec the module * provide the repr for modules What Will Existing Finders and Loaders Have to Do Differently? ============================================================== Immediately? Nothing. The status quo will be deprecated, but will continue working. However, here are the things that the authors of finders and loaders should change relative to this PEP: * Implement find_spec() on finders. * Implement exec_module() on loaders, if possible. The ModuleSpec factory functions in importlib.util are intended to be helpful for converting existing finders. from_loader() and spec_from_file_location() are both straightforward utilities in this regard. In the case where loaders already expose methods for creating and preparing modules, spec_from_module() may be useful to the corresponding finder. For existing loaders, exec_module() should be a relatively direct conversion from the non-boilerplate portion of load_module(). In some uncommon cases the loader should also implement create_module(). ModuleSpec Users ================ ModuleSpec objects have 3 distinct target audiences: Python itself, import hooks, and normal Python users. Python will use specs in the import machinery, in interpreter startup, and in various standard library modules. Some modules are import-oriented, like pkgutil, and others are not, like pickle and pydoc.
In all cases, the full ModuleSpec API will get used. Import hooks (finders and loaders) will make use of the spec in specific ways. First of all, finders may use the spec factory functions in importlib.util to create spec objects. They may also directly adjust the spec attributes after the spec is created. Secondly, the finder may bind additional information to the spec (in loader_state) for the loader to consume during module creation/execution. Finally, loaders will make use of the attributes on a spec when creating and/or executing a module. Python users will be able to inspect a module's ``__spec__`` to get import-related information about the object. Generally, Python applications and interactive users will not be using the ``ModuleSpec`` factory functions nor any of the instance methods. How Loading Will Work ===================== This is an outline of what happens in ModuleSpec's loading functionality::

    import sys
    from types import ModuleType

    def load(spec):
        if not hasattr(spec.loader, 'exec_module'):
            # Legacy loader: load_module() does the boilerplate itself.
            module = spec.loader.load_module(spec.name)
            spec.init_module_attrs(module)
            return sys.modules[spec.name]

        module = None
        if hasattr(spec.loader, 'create_module'):
            module = spec.loader.create_module(spec)
        if module is None:
            # The loader did not supply a module, so use the default type.
            module = ModuleType(spec.name)
        spec.init_module_attrs(module)

        sys.modules[spec.name] = module
        try:
            spec.loader.exec_module(module)
        except Exception:
            # Do not leave a half-initialized module registered.
            del sys.modules[spec.name]
            raise
        return sys.modules[spec.name]

These steps are exactly what Loader.load_module() is already expected to do. Loaders will thus be simplified since they will only need to implement exec_module(). Note that we must return the module from sys.modules. During loading the module may have replaced itself in sys.modules. Since we don't have a post-import hook API to accommodate the use case, we have to deal with it. However, in the replacement case we do not worry about setting the import-related module attributes on the object. The module writer is on their own if they are doing this.
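For comparison, roughly the same sequence can be driven by hand with the API that later shipped in Python 3.5+. This is only a sketch: it assumes importlib.util.module_from_spec() as a stand-in for the create-module and init_module_attrs() steps in the outline above, and StringLoader, the module name demo_mod, and the "in-memory" origin string are hypothetical.

```python
import importlib.abc
import importlib.machinery
import importlib.util
import sys


class StringLoader(importlib.abc.Loader):
    """Hypothetical loader that executes source text held in memory."""

    def __init__(self, source):
        self.source = source

    def create_module(self, spec):
        # None asks the import machinery for a default module object.
        return None

    def exec_module(self, module):
        # Only populate the namespace; no sys.modules or attribute work.
        exec(self.source, module.__dict__)


spec = importlib.machinery.ModuleSpec(
    "demo_mod", StringLoader("answer = 6 * 7"), origin="in-memory")

# Create the module and set its import-related attributes from the spec.
module = importlib.util.module_from_spec(spec)

# Register, execute, and clean up on failure, as in the outline above.
sys.modules[spec.name] = module
try:
    spec.loader.exec_module(module)
except Exception:
    del sys.modules[spec.name]
    raise
module = sys.modules[spec.name]

print(module.answer, module.__spec__.origin)
```

The loader itself contains none of the boilerplate; everything around exec_module() is the part this PEP moves out of load_module().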
ModuleSpec ========== Attributes ---------- Each of the following names is an attribute on ModuleSpec objects. A value of None indicates "not set". This contrasts with module objects where the attribute simply doesn't exist. Most of the attributes correspond to the import-related attributes of modules. Here is the mapping. The reverse of this mapping is used by ModuleSpec.init_module_attrs().

========================== ==============
On ModuleSpec              On Modules
========================== ==============
name                       __name__
loader                     __loader__
package                    __package__
origin                     __file__*
cached                     __cached__*,**
submodule_search_locations __path__**
loader_state               \-
has_location               \-
========================== ==============

| \* Set on the module only if spec.has_location is true.
| \*\* Set on the module only if the spec attribute is not None.

While package and has_location are read-only properties, the remaining attributes can be replaced after the module spec is created and even after import is complete. This allows for unusual cases where directly modifying the spec is the best option. However, typical use should not involve changing the state of a module's spec. **origin** "origin" is a string for the name of the place from which the module originates. See `origin`_ above. Aside from the informational value, it is also used in module_repr(). In the case of a spec where "has_location" is true, ``__file__`` is set to the value of "origin". For built-in modules "origin" would be set to "built-in". **has_location** As explained in the `location`_ section above, many modules are "locatable", meaning there is a corresponding resource from which the module will be loaded and that resource can be described by a string. In contrast, non-locatable modules can't be loaded in this fashion, e.g. builtin modules and modules dynamically created in code. For these, the name is the only way to access them, so they have an "origin" but not a "location". "has_location" is true if the module is locatable.
In that case the spec's origin is used as the location and ``__file__`` is set to spec.origin. If additional location information is required (e.g. zipimport), that information may be stored in spec.loader_state. "has_location" may be implied from the existence of a load_data() method on the loader. Incidentally, not all locatable modules will be cachable, but most will. **submodule_search_locations** The list of location strings, typically directory paths, in which to search for submodules. If the module is a package this will be set to a list (even an empty one). Otherwise it is None. The name of the corresponding module attribute, ``__path__``, is relatively ambiguous. Instead of mirroring it, we use a more explicit name that makes the purpose clear. **loader_state** A finder may set loader_state to any value to provide additional data for the loader to use during loading. A value of None is the default and indicates that there is no additional data. Otherwise it can be set to any object, such as a dict, list, or types.SimpleNamespace, containing the relevant extra information. For example, zipimporter could use it to pass the zip archive name to the loader directly, rather than needing to derive it from origin or create a custom loader for each find operation. loader_state is meant for use by the finder and corresponding loader. It is not guaranteed to be a stable resource for any other use. Factory Functions ----------------- **spec_from_file_location(name, location, \*, loader=None, submodule_search_locations=None)** Build a spec from file-oriented information and loader APIs. * "origin" will be set to the location. * "has_location" will be set to True. * "cached" will be set to the result of calling cache_from_source(). * "origin" can be deduced from loader.get_filename() (if "location" is not passed in). * "loader" can be deduced from suffix if the location is a filename.
* "submodule_search_locations" can be deduced from loader.is_package() and from os.path.dirname(location) if location is a filename. **from_loader(name, loader, \*, origin=None, is_package=None)** Build a spec with missing information filled in by using loader APIs. * "has_location" can be deduced from loader.get_data. * "origin" can be deduced from loader.get_filename(). * "submodule_search_locations" can be deduced from loader.is_package() and from os.path.dirname(location) if location is a filename. **spec_from_module(module, loader=None)** Build a spec based on the import-related attributes of an existing module. The spec attributes are set to the corresponding import-related module attributes. See the table in `Attributes`_. Omitted Attributes and Methods ------------------------------ The following ModuleSpec methods are not part of the public API since it is easy to use them incorrectly and only the import system really needs them (i.e. they would be an attractive nuisance). * _create() - provide a new module to use for loading. * _exec(module) - execute the spec into a module namespace. * _load() - prepare a module and execute it in a protected way. * _reload(module) - re-execute a module in a protected way. Here are other omissions: There is no "PathModuleSpec" subclass of ModuleSpec that separates out has_location, cached, and submodule_search_locations. While that might make the separation cleaner, module objects don't have that distinction. ModuleSpec will support both cases equally well. While "is_package" would be a simple additional attribute (aliasing self.submodule_search_locations is not None), it perpetuates the artificial (and mostly erroneous) distinction between modules and packages. Conceivably, a ModuleSpec.load() method could optionally take a list of modules with which to interact instead of sys.modules. That capability is left out of this PEP, but may be pursued separately at some other time, including relative to PEP 406 (import engine).
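The omitted "is_package" check discussed above reduces to a single comparison on the spec. A minimal sketch, using the standard-library json package only as a convenient example; the helper name is_package is hypothetical and not part of any proposed API:

```python
import json
import json.decoder


def is_package(spec):
    """Hypothetical helper: a package's spec has search locations set."""
    return spec.submodule_search_locations is not None


print(is_package(json.__spec__))          # json is a package
print(is_package(json.decoder.__spec__))  # json.decoder is a plain module
print(json.decoder.__spec__.parent)       # name of the containing package
```

Because the check is this small, callers who need it can write it inline without the spec growing a dedicated attribute.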
Likewise load() could be leveraged to implement multi-version imports. While interesting, doing so is outside the scope of this proposal. Others: * Add ModuleSpec.submodules (RO-property) - returns possible submodules relative to the spec. * Add ModuleSpec.loaded (RO-property) - the module in sys.modules, if any. * Add ModuleSpec.data - a descriptor that wraps the data API of the spec's loader. * Also see [cleaner_reload_support]_. Backward Compatibility ---------------------- ModuleSpec doesn't have any. This would be a different story if Finder.find_module() were to return a module spec instead of loader. In that case, specs would have to act like the loader that would have been returned instead. Doing so would be relatively simple, but is an unnecessary complication. It was part of earlier versions of this PEP. Subclassing ----------- Subclasses of ModuleSpec are allowed, but should not be necessary. Simply setting loader_state or adding functionality to a custom finder or loader will likely be a better fit and should be tried first. However, as long as a subclass still fulfills the requirements of the import system, objects of that type are completely fine as the return value of Finder.find_spec(). Existing Types ============== Module Objects -------------- Other than adding ``__spec__``, none of the import-related module attributes will be changed or deprecated, though some of them could be; any such deprecation can wait until Python 4. A module's spec will not be kept in sync with the corresponding import-related attributes. Though they may differ, in practice they will typically be the same. One notable exception is the case where a module is run as a script by using the ``-m`` flag. In that case ``module.__spec__.name`` will reflect the actual module name while ``module.__name__`` will be ``__main__``. A module's spec is not guaranteed to be identical between two modules with the same name.
Likewise there is no guarantee that successive calls to importlib.find_spec() will return the same object or even an equivalent object, though at least the latter is likely. Finders ------- Finders are still responsible for creating the loader. That loader will now be stored in the module spec returned by find_spec() rather than returned directly. As is currently the case without the PEP, if a loader would be costly to create, that loader can be designed to defer the cost until later. **MetaPathFinder.find_spec(name, path=None)** **PathEntryFinder.find_spec(name)** Finders will return ModuleSpec objects when find_spec() is called. This new method replaces find_module() and find_loader() (in the PathEntryFinder case). If a finder does not have find_spec(), find_module() and find_loader() are used instead, for backward-compatibility. Adding yet another similar method to finders is a case of practicality. find_module() could be changed to return specs instead of loaders. This is tempting because the import APIs have suffered enough, especially considering PathEntryFinder.find_loader() was just added in Python 3.3. However, the extra complexity and a less-than-explicit method name aren't worth it. Loaders ------- **Loader.exec_module(module)** Loaders will have a new method, exec_module(). Its only job is to "exec" the module and consequently populate the module's namespace. It is not responsible for creating or preparing the module object, nor for any cleanup afterward. It has no return value. exec_module() will be used during both loading and reloading. exec_module() should properly handle the case where it is called more than once. For some kinds of modules this may mean raising ImportError every time after the first time the method is called. This is particularly relevant for reloading, where some kinds of modules do not support in-place reloading. **Loader.create_module(spec)** Loaders may also implement create_module() that will return a new module to exec.
It may return None to indicate that the default module creation code should be used. One use case, though atypical, for create_module() is to provide a module that is a subclass of the builtin module type. Most loaders will not need to implement create_module(). create_module() should properly handle the case where it is called more than once for the same spec/module. This may include returning None or raising ImportError. .. note:: exec_module() and create_module() should not set any import-related module attributes. The fact that load_module() does is a design flaw that this proposal aims to correct. Other changes: PEP 420 introduced the optional module_repr() loader method to limit the amount of special-casing in the module type's ``__repr__()``. Since this method is part of ModuleSpec, it will be deprecated on loaders. However, if it exists on a loader it will be used exclusively. The Loader.init_module_attrs() method, added prior to Python 3.4's release, will be removed in favor of the same method on ModuleSpec. However, InspectLoader.is_package() will not be deprecated even though the same information is found on ModuleSpec. ModuleSpec can use it to populate its own is_package if that information is not otherwise available. Still, it will be made optional. One consequence of ModuleSpec is that loader ``__init__`` methods will no longer need to accommodate per-module state. The path-based loaders in importlib take arguments in their ``__init__()`` and have corresponding attributes. However, the need for those values is eliminated by module specs. In addition to executing a module during loading, loaders will still be directly responsible for providing APIs concerning module-related data. Other Changes ============= * The various finders and loaders provided by importlib will be updated to comply with this proposal. * The spec for the ``__main__`` module will reflect how the interpreter was started.
For instance, with ``-m`` the spec's name will be that of the run module, while ``__main__.__name__`` will still be "__main__". * We add importlib.find_spec() to mirror importlib.find_loader() (which becomes deprecated). * importlib.reload() is changed to use ModuleSpec.load(). * importlib.reload() will now make use of the per-module import lock. Reference Implementation ======================== A reference implementation will be available at http://bugs.python.org/issue18864. Open Issues ============== \* The impact of this change on pkgutil (and setuptools) needs looking into. It has some generic function-based extensions to PEP 302. These may break if importlib starts wrapping loaders without the tools' knowledge. \* Other modules to look at: runpy (and pythonrun.c), pickle, pydoc, inspect. For instance, pickle should be updated in the ``__main__`` case to look at ``module.__spec__.name``. \* Impact on some kinds of lazy loading modules. [lazy_import_concerns]_ References ========== .. [ref_files_pep] http://mail.python.org/pipermail/import-sig/2013-August/000658.html .. [import_system_docs] http://docs.python.org/3/reference/import.html .. [cleaner_reload_support] https://mail.python.org/pipermail/import-sig/2013-September/000735.html .. [lazy_import_concerns] https://mail.python.org/pipermail/python-dev/2013-August/128129.html Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Wed Sep 25 15:05:06 2013 From: brett at python.org (Brett Cannon) Date: Wed, 25 Sep 2013 09:05:06 -0400 Subject: [Import-SIG] latest update of PEP 451 In-Reply-To: References: Message-ID: On Wed, Sep 25, 2013 at 1:46 AM, Eric Snow wrote: > I've updated PEP 451 to address comments and clear a few things up. 
Most > notably, I added a list of terms at the beginning. > > The PEP is pretty close to done and feedback has simmered down. Does > anyone object to my posting the next update to python-dev? > > There are two main open questions: > > 1. How does ModuleSpec interact with existing import-sensitive modules in > the standard library? > 2. PJE's concerns about reload semantics and lazy loading. > > Regarding the first, I'm not too concerned with the ability to adapt those > modules to ModuleSpec without much effort. However, I will be doing a > thorough check before I'll ask for pronouncement. > > About lazy loading, from what I understand, importlib.reload() broke > backward compatibility with regards to PJE's use case when it switched to > depending on __loader__. Perhaps it was even before that. I'll have to > check. Regardless, PEP 451 does not change the semantic of reload() from > what they currently are. > > The PEP could restore the previous semantics without a lot of work (if > module.__spec__ is not set, then call find_spec(), set it, and reload using > that). However, if reload() backward compatibility got broken somewhere > along the lines, that sounds like a bug that should be addressed separately. > > -eric > > > ===================================================================== > > PEP: 451 > Title: A ModuleSpec Type for the Import System > Version: $Revision$ > Last-Modified: $Date$ > Author: Eric Snow > Discussions-To: import-sig at python.org > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 8-Aug-2013 > Python-Version: 3.4 > Post-History: 8-Aug-2013, 28-Aug-2013, 18-Sep-2013 > Resolution: > > > Abstract > ======== > > This PEP proposes to add a new class to importlib.machinery called > "ModuleSpec". It will provide all the import-related information used > to load a module and will be available without needing to load the > module first. 
Finders will directly provide a module's spec instead of > a loader (which they will continue to provide indirectly). The import > machinery will be adjusted to take advantage of module specs, including > using them to load modules. > > > Terms and Concepts > ================== > > The changes in this proposal are an opportunity to make several > existing terms and concepts more clear, whereas currently they are > (unfortunately) ambiguous. New concepts are also introduced in this > proposal. Finally, it's worth explaining a few other existing terms > with which people may not be so familiar. For the sake of context, here > is a brief summary of all three groups of terms and concepts. A more > detailed explanation of the import system is found at > [import_system_docs]_. > > finder > ------ > > A "finder" is an object that identifies the loader that the import > system should use to load a module. Currently this is accomplished by > calling the finder's find_module() method, which returns the loader. > > Finders are strictly responsible for providing the loader, which they do > through their find_module() method. The import system then uses that > loader to load the module. > > loader > ------ > > A "loader" is an object that is used to load a module during import. > Currently this is done by calling the loader's load_module() method. A > loader may also provide APIs for getting information about the modules > it can load, as well as about data from sources associated with such a > module. > > Right now loaders (via load_module()) are responsible for certain > boilerplate import-related operations. These are: > "boilerplate, import-related" > > 1. perform some (module-related) validation; > 2. create the module object; > 3. set import-related attributes on the module; > 4. "register" the module to sys.modules; > 5. exec the module; > 6. clean up in the event of failure while loading the module. 
> > This all takes place during the import system's call to > Loader.load_module(). > > origin > ------ > > This is a new term and concept. The idea of it exists subtly in the > import system already, but this proposal makes the concept explicit. > > "origin" is the import context means the system (or resource within a > "is the" -> "in an" > system) from which a module originates. For the purposes of this > proposal, "origin" is also a string which identifies such a resource or > system. "origin" is applicable to all modules. > > For example, the origin for built-in and frozen modules is the > interpreter itself. The import system already identifies this origin as > "built-in" and "frozen", respectively. This is demonstrated in the > following module repr: "<module 'sys' (built-in)>". > > In fact, the module repr is already a relatively reliable, though > implicit, indicator of a module's origin. Other modules also indicate > their origin through other means, as described in the entry for > "location". > > It is up to the loader to decide on how to interpret and use a module's > origin, if at all. > > location > -------- > > This is a new term. However the concept already exists clearly in the > import system, as associated with the ``__file__`` and ``__path__`` > attributes of modules, as well as the name/term "path" elsewhere. > > A "location" is a resource or "place", rather than a system at large, > from which a module is loaded. It qualifies as an "origin". Examples > of locations include filesystem paths and URLs. A location is > identified by the name of the resource, but may not necessarily identify > the system to which the resource pertains. In such cases the loader > would have to identify the system itself. > > In contrast to other kinds of module origin, a location cannot be > inferred by the loader just by the module name. Instead, the loader > must be provided with a string to identify the location, usually by the > finder that generates the loader.
The loader then uses this information > to locate the resource from which it will load the module. In theory > you could load the module at a given location under various names. > > The most common example of locations in the import system are the > files from which source and extension modules are loaded. For these > modules the location is identified by the string in the ``__file__`` > attribute. Although ``__file__`` isn't particularly accurate for some > modules (e.g. zipped), it is currently the only way that the import > system indicates that a module has a location. > > A module that has a location may be called "locatable". > > cache > ----- > > The import system stores compiled modules in the __pycache__ directory > as an optimization. This module cache that we use today was provided by > PEP 3147. For this proposal, the relevant API for module caching is the > ``__cache__`` attribute of modules and the cache_from_source() function > in importlib.util. Loaders are responsible for putting modules into the > cache (and loading out of the cache). Currently the cache is only used > for compiled source modules. However, this proposal explicitly allows > > package > ------- > > The concept does not change, nor does the term. However, the > distinction between modules and packages is mostly superficial. > Packages *are* modules. They simply have a ``__path__`` attribute and > import may add attributes bound to submodules. The typical perceived > "typically" > difference is a source of confusion. This proposal explicitly > de-emphasizes the distinction between packages and modules where it > makes sense to do so. > > > Motivation > ========== > > The import system has evolved over the lifetime of Python. In late 2002 > PEP 302 introduced standardized import hooks via finders and > loaders and sys.meta_path. 
The importlib module, introduced > with Python 3.1, now exposes a pure Python implementation of the APIs > described by PEP 302, as well as of the full import system. It is now > much easier to understand and extend the import system. While a benefit > to the Python community, this greater accessibility also presents a > challenge. > > As more developers come to understand and customize the import system, > any weaknesses in the finder and loader APIs will be more impactful. So > the sooner we can address any such weaknesses in the import system, the > better...and there are a couple we can take care of with this proposal. > > Firstly, any time the import system needs to save information about a > module we end up with more attributes on module objects that are > generally only meaningful to the import system. It would be nice to > have a per-module namespace in which to put future import-related > information and to pass around within the import system. Secondly, > there's an API void between finders and loaders that causes undue > complexity when encountered. The PEP 420 (namespace packages) > implementation had to work around this. The complexity surfaced again > during recent efforts on a separate proposal. [ref_files_pep]_ > > The `finder`_ and `loader`_ sections above detail the current responsibilities > of both. Notably, loaders are not required to provide any of the > functionality of their load_module() through other methods. Thus, > though the import-related information about a module is likely available > without loading the module, it is not otherwise exposed. > > Furthermore, the requirements associated with load_module() are > common to all loaders and mostly are implemented in exactly the same > way. This means every loader has to duplicate the same boilerplate > code. importlib.util provides some tools that help with this, but > it would be more helpful if the import system simply took charge of > these responsibilities.
The trouble is that this would limit the degree > of customization that load_module() could easily continue to facilitate. > > More importantly, while a finder *could* provide the information that > the loader's load_module() would need, it currently has no consistent > way to get it to the loader. This is a gap between finders and loaders > which this proposal aims to fill. > > Finally, when the import system calls a finder's find_module(), the > finder makes use of a variety of information about the module that is > useful outside the context of the method. Currently the options are > limited for persisting that per-module information past the method call, > since it only returns the loader. Popular workarounds for this limitation > are to store the information in a module-to-info mapping somewhere on > the finder itself, or store it on the loader. > > Unfortunately, loaders are not required to be module-specific. On top > of that, some of the useful information finders could provide is > common to all finders, so ideally the import system could take care of > those details. This is the same gap as before between finders and > loaders. > > As an example of complexity attributable to this flaw, the > implementation of namespace packages in Python 3.3 (see PEP 420) added > FileFinder.find_loader() because there was no good way for > find_module() to provide the namespace search locations. > > The answer to this gap is a ModuleSpec object that contains the > per-module information and takes care of the boilerplate functionality > involved with loading the module. > > > Specification > ============= > > The goal is to address the gap between finders and loaders while > changing as little of their semantics as possible. Though some > functionality and information is moved to the new ModuleSpec type, > their behavior should remain the same. However, for the sake of clarity > the finder and loader semantics will be explicitly identified.
> > Here is a high-level summary of the changes described by this PEP. More > detail is available in later sections. > > importlib.machinery.ModuleSpec (new) > ------------------------------------ > > A specification for a module's import-system-related state. See the > `ModuleSpec`_ section below for a more detailed description. > > * ModuleSpec(name, loader, \*, origin=None, loader_state=None, > is_package=None) > > Attributes: > > * name - a string for the name of the module. > * loader - the loader to use for loading. > * origin - the name of the place from which the module is loaded, > e.g. "builtin" for built-in modules and the filename for modules > loaded from source. > * submodule_search_locations - list of strings for where to find > submodules, if a package (None otherwise). > * loader_state - a container of extra module-specific data for use > during loading. > * cached (property) - a string for where the compiled module should be > stored. > * parent (RO-property) - the name of the package to which the module > belongs as a submodule (or None). > * has_location (RO-property) - a flag indicating whether or not the > module's "origin" attribute refers to a location. > > Instance Methods: > > * module_repr() - provide a repr string for the spec'ed module; > non-locatable modules will use their origin (e.g. "built-in"). > * init_module_attrs(module) - set any of a module's import-related > attributes that aren't already set. > > importlib.util Additions > ------------------------ > > These are ModuleSpec factory functions, meant as a convenience for > finders. See the `Factory Functions`_ section below for more detail. > > * spec_from_file_location(name, location, \*, loader=None, > submodule_search_locations=None) > - build a spec from file-oriented information and loader APIs. > * from_loader(name, loader, \*, origin=None, is_package=None) - build > a spec with missing information filled in by using loader APIs. 
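As a rough sketch of how the two factories above might be used: the draft calls the second one from_loader(), while the importlib.util that later shipped names it spec_from_loader(), which is what is assumed here. importlib.util.module_from_spec() (Python 3.5+) stands in for the module-creation step, and the file name, module names, and `InMemoryLoader` are made up for the demo.

```python
import importlib.abc
import importlib.util
import os
import tempfile

# Spec built from a file location; the loader is deduced from the ".py" suffix.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_mod.py")
    with open(path, "w") as f:
        f.write("VALUE = 'from file'\n")

    file_spec = importlib.util.spec_from_file_location("demo_mod", path)
    file_module = importlib.util.module_from_spec(file_spec)
    file_spec.loader.exec_module(file_module)


class InMemoryLoader(importlib.abc.Loader):
    """Hypothetical loader with no file behind it."""

    def exec_module(self, module):
        module.VALUE = "from loader"


# Spec built from a loader; there is an origin but no location.
loader_spec = importlib.util.spec_from_loader(
    "demo_virtual", InMemoryLoader(), origin="in-memory"
)
virtual_module = importlib.util.module_from_spec(loader_spec)
loader_spec.loader.exec_module(virtual_module)

print(file_spec.has_location, file_module.VALUE)
print(loader_spec.has_location, virtual_module.VALUE)
```

Neither module is registered in sys.modules here; the sketch only shows what information each factory fills in on the spec.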
> > This factory function is useful for some backward-compatibility > situations: > > * spec_from_module(module, loader=None) - build a spec based on the > import-related attributes of an existing module. > > Other API Additions > ------------------- > > * importlib.find_spec(name, path=None) will work exactly the same as > importlib.find_loader() (which it replaces), but return a spec instead > of a loader. > > For loaders: > > * importlib.abc.Loader.exec_module(module) will execute a module in its > own namespace. It replaces importlib.abc.Loader.load_module(), taking > over its module execution functionality. > * importlib.abc.Loader.create_module(spec) (optional) will return the > module to use for loading. > > For modules: > > * Module objects will have a new attribute: ``__spec__``. > > API Changes > ----------- > > * InspectLoader.is_package() will become optional. > > Deprecations > ------------ > > * importlib.abc.MetaPathFinder.find_module() > * importlib.abc.PathEntryFinder.find_module() > * importlib.abc.PathEntryFinder.find_loader() > * importlib.abc.Loader.load_module() > * importlib.abc.Loader.module_repr() > * The parameters and attributes of the various loaders in > importlib.machinery > * importlib.util.set_package() > * importlib.util.set_loader() > * importlib.find_loader() > > Removals > -------- > > These were introduced prior to Python 3.4's release, so they can simply > be removed. > > * importlib.abc.Loader.init_module_attrs() > * importlib.util.module_to_load() > > Other Changes > ------------- > > * The import system implementation in importlib will be changed to make > use of ModuleSpec. > * importlib.reload() will make use of ModuleSpec. > * Import-related module attributes (other than ``__spec__``) will no > longer be used directly by the import system. > Might want to be clear with an example as to what happens to __path__ as that's the one value people do manipulate on occasion in order to see semantic changes. 
Basically ModuleSpec objects are bundles of data for importing a single module that the ModuleSpec describes, but they don't play a role in other modules' imports, thus not influencing submodules. > * Import-related attributes should no longer be added to modules > directly. > * The module type's ``__repr__()`` will be a thin wrapper around a pure > Python implementation which will leverage ModuleSpec. > * The spec for the ``__main__`` module will reflect the appropriate > name and origin. > > Backward-Compatibility > ---------------------- > > * If a finder does not define find_spec(), a spec is derived from > the loader returned by find_module(). > * PathEntryFinder.find_loader() still takes priority over > find_module(). > * Loader.load_module() is used if exec_module() is not defined. > > What Will not Change? > --------------------- > > * The syntax and semantics of the import statement. > * Existing finders and loaders will continue to work normally. > * The import-related module attributes will still be initialized with > the same information. > * Finders will still create loaders (now storing them in specs). > * Loader.load_module(), if a module defines it, will have all the > same requirements and may still be called directly. > * Loaders will still be responsible for module data APIs. > * importlib.reload() will still overwrite the import-related attributes. > > Responsibilities > ---------------- > > Here's a quick breakdown of where responsibilities lie after this PEP. 
> > finders: > > * create loader > * create spec > > loaders: > > * create module (optional) > * execute module > > ModuleSpec: > > * orchestrate module loading > * boilerplate for module loading, including managing sys.modules and > setting import-related attributes > * create module if loader doesn't > * call loader.exec_module(), passing in the module in which to exec > * contain all the information the loader needs to exec the module > * provide the repr for modules > > > What Will Existing Finders and Loaders Have to Do Differently? > ============================================================== > > Immediately? Nothing. The status quo will be deprecated, but will > continue working. However, here are the things that the authors of > finders and loaders should change relative to this PEP: > > * Implement find_spec() on finders. > * Implement exec_module() on loaders, if possible. > > The ModuleSpec factory functions in importlib.util are intended to be > helpful for converting existing finders. from_loader() and > from_file_location() are both straight-forward utilities in this > regard. In the case where loaders already expose methods for creating > and preparing modules, ModuleSpec.from_module() may be useful to > the corresponding finder. > > For existing loaders, exec_module() should be a relatively direct > conversion from the non-boilerplate portion of load_module(). In some > uncommon cases the loader should also implement create_module(). > > > ModuleSpec Users > ================ > > ModuleSpec objects have 3 distinct target audiences: Python itself, > import hooks, and normal Python users. > > Python will use specs in the import machinery, in interpreter startup, > and in various standard library modules. Some modules are > import-oriented, like pkgutil, and others are not, like pickle and > pydoc. In all cases, the full ModuleSpec API will get used. > > Import hooks (finders and loaders) will make use of the spec in specific > ways. 
First of all, finders may use the spec factory functions in
> importlib.util to create spec objects. They may also directly adjust
> the spec attributes after the spec is created. Secondly, the finder may
> bind additional information to the spec (in finder_extras) for the
> loader to consume during module creation/execution. Finally, loaders
> will make use of the attributes on a spec when creating and/or executing
> a module.
>
> Python users will be able to inspect a module's ``__spec__`` to get
> import-related information about the object. Generally, Python
> applications and interactive users will not be using the ``ModuleSpec``
> factory functions nor any of the instance methods.
>
>
> How Loading Will Work
> =====================
>
> This is an outline of what happens in ModuleSpec's loading
> functionality::
>
>   def load(spec):
>       if not hasattr(spec.loader, 'exec_module'):
>           module = spec.loader.load_module(spec.name)
>           spec.init_module_attrs(module)
>           return sys.modules[spec.name]
>
>       module = None
>       if hasattr(spec.loader, 'create_module'):
>           module = spec.loader.create_module(spec)
>       if module is None:
>           module = ModuleType(spec.name)
>       spec.init_module_attrs(module)
>
>       sys.modules[spec.name] = module
>       try:
>           spec.loader.exec_module(module)
>       except Exception:
>           del sys.modules[spec.name]
>           raise
>       return sys.modules[spec.name]
>

try:
    spec.loader.exec_module(module)
except BaseException:
    try:
        del sys.modules[spec.name]
    except KeyError:
        pass

>
> These steps are exactly what Loader.load_module() is already
> expected to do. Loaders will thus be simplified since they will only
> need to implement exec_module().
>
> Note that we must return the module from sys.modules. During loading
> the module may have replaced itself in sys.modules. Since we don't have
> a post-import hook API to accommodate the use case, we have to deal with
> it. However, in the replacement case we do not worry about setting the
> import-related module attributes on the object.
The module writer is on > their own if they are doing this. > > > ModuleSpec > ========== > > Attributes > ---------- > > Each of the following names is an attribute on ModuleSpec objects. A > value of None indicates "not set". This contrasts with module > objects where the attribute simply doesn't exist. Most of the > attributes correspond to the import-related attributes of modules. Here > is the mapping. The reverse of this mapping is used by > ModuleSpec.init_module_attrs(). > > ========================== ============== > On ModuleSpec On Modules > ========================== ============== > name __name__ > loader __loader__ > package __package__ > origin __file__* > cached __cached__*,** > submodule_search_locations __path__** > loader_state \- > has_location \- > ========================== ============== > > | \* Set on the module only if spec.has_location is true. > | \*\* Set on the module only if the spec attribute is not None. > > While package and has_location are read-only properties, the remaining > attributes can be replaced after the module spec is created and even > after import is complete. This allows for unusual cases where directly > modifying the spec is the best option. However, typical use should not > involve changing the state of a module's spec. > > **origin** > > "origin" is a string for the name of the place from which the module > originates. See `origin`_ above. Aside from the informational value, > it is also used in module_repr(). In the case of a spec where > "has_location" is true, ``__file__`` is set to the value of "origin". > For built-in modules "origin" would be set to "built-in". > > **has_location** > > As explained in the `location`_ section above, many modules are > "locatable", meaning there is a corresponding resource from which the > module will be loaded and that resource can be described by a string. > In contrast, non-locatable modules can't be loaded in this fashion, e.g. 
> builtin modules and modules dynamically created in code. For these, the > name is the only way to access them, so they have an "origin" but not a > "location". > > "has_location" is true if the module is locatable. In that case the > spec's origin is used as the location and ``__file__`` is set to > spec.origin. If additional location information is required (e.g. > zipimport), that information may be stored in spec.loader_state. > > "has_location" may be implied from the existence of a get_data() method > on the loader. > > Incidentally, not all locatable modules will be cacheable, but most will. > > **submodule_search_locations** > > The list of location strings, typically directory paths, in which to > search for submodules. If the module is a package this will be set to > a list (even an empty one). Otherwise it is None. > > The name of the corresponding module attribute, ``__path__``, is > relatively ambiguous. Instead of mirroring it, we use a more explicit > name that makes the purpose clear. > > **loader_state** > > A finder may set loader_state to any value to provide additional > data for the loader to use during loading. A value of None is the > default and indicates that there is no additional data. Otherwise it > can be set to any object, such as a dict, list, or > types.SimpleNamespace, containing the relevant extra information. > > For example, zipimporter could use it to pass the zip archive name > to the loader directly, rather than needing to derive it from origin > or create a custom loader for each find operation. > > loader_state is meant for use by the finder and corresponding loader. > It is not guaranteed to be a stable resource for any other use. > > Factory Functions > ----------------- > > **spec_from_file_location(name, location, \*, loader=None, > submodule_search_locations=None)** > > Build a spec from file-oriented information and loader APIs. > > * "origin" will be set to the location. > * "has_location" will be set to True.
> * "cached" will be set to the result of calling cache_from_source(). > > * "origin" can be deduced from loader.get_filename() (if "location" is > not passed in). > * "loader" can be deduced from suffix if the location is a filename. > * "submodule_search_locations" can be deduced from loader.is_package() > and from os.path.dirname(location) if location is a filename. > > **from_loader(name, loader, \*, origin=None, is_package=None)** > > Build a spec with missing information filled in by using loader APIs. > > * "has_location" can be deduced from loader.get_data. > * "origin" can be deduced from loader.get_filename(). > * "submodule_search_locations" can be deduced from loader.is_package() > and from os.path.dirname(location) if location is a filename. > > **spec_from_module(module, loader=None)** > > Build a spec based on the import-related attributes of an existing > module. The spec attributes are set to the corresponding > import-related module attributes. See the table in `Attributes`_. > > Omitted Attributes and Methods > ------------------------------ > > The following ModuleSpec methods are not part of the public API since > it is easy to use them incorrectly and only the import system really > needs them (i.e. they would be an attractive nuisance). > > * _create() - provide a new module to use for loading. > * _exec(module) - execute the spec into a module namespace. > * _load() - prepare a module and execute it in a protected way. > * _reload(module) - re-execute a module in a protected way. > Do these really need to be documented as not part of the API? They have leading underscores and so as per PEP 8 they are implicitly not part of the public API. They then just feel like noise and something that should not be explained as part of the specification. > > Here are other omissions: > > There is no "PathModuleSpec" subclass of ModuleSpec that separates out > has_location, cached, and submodule_search_locations.
While that might > make the separation cleaner, module objects don't have that distinction. > ModuleSpec will support both cases equally well. > > While "is_package" would be a simple additional attribute (aliasing > self.submodule_search_locations is not None), it perpetuates the > artificial (and mostly erroneous) distinction between modules and > packages. > > Conceivably, a ModuleSpec.load() method could optionally take a list of > modules with which to interact instead of sys.modules. That > capability is left out of this PEP, but may be pursued separately at > some other time, including relative to PEP 406 (import engine). > > Likewise load() could be leveraged to implement multi-version > imports. While interesting, doing so is outside the scope of this > proposal. > > Others: > > * Add ModuleSpec.submodules (RO-property) - returns possible submodules > relative to the spec. > * Add ModuleSpec.loaded (RO-property) - the module in sys.module, if > any. > * Add ModuleSpec.data - a descriptor that wraps the data API of the > spec's loader. > * Also see [cleaner_reload_support]_. > > > Backward Compatibility > ---------------------- > > ModuleSpec doesn't have any. This would be a different story if > Finder.find_module() were to return a module spec instead of loader. > In that case, specs would have to act like the loader that would have > been returned instead. Doing so would be relatively simple, but is an > unnecessary complication. It was part of earlier versions of this PEP. > > Subclassing > ----------- > > Subclasses of ModuleSpec are allowed, but should not be necessary. > Simply setting loader_state or adding functionality to a custom > finder or loader will likely be a better fit and should be tried first. > However, as long as a subclass still fulfills the requirements of the > import system, objects of that type are completely fine as the return > value of Finder.find_spec(). 
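The advice above (set loader_state rather than subclass ModuleSpec) might look like the following sketch. `ConfigFinder`, `ConfigLoader`, and the `settings` payload are hypothetical. The find_spec() signature follows importlib.abc.MetaPathFinder as it later shipped, with an extra `target` argument that this draft does not yet mention.

```python
import importlib
import importlib.abc
import importlib.machinery
import sys


class ConfigLoader(importlib.abc.Loader):
    """Hypothetical loader that fills a module from spec.loader_state."""

    def exec_module(self, module):
        # The finder left the per-module data on the spec.
        for key, value in module.__spec__.loader_state.items():
            setattr(module, key, value)


class ConfigFinder(importlib.abc.MetaPathFinder):
    """Hypothetical meta path finder serving modules from a dict."""

    def __init__(self, settings):
        self.settings = settings      # maps module name -> attribute dict
        self.loader = ConfigLoader()  # one shared, module-agnostic loader

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self.settings:
            return None  # let the other finders try
        return importlib.machinery.ModuleSpec(
            fullname,
            self.loader,
            origin="config-dict",
            loader_state=self.settings[fullname],
        )


finder = ConfigFinder({"demo_settings": {"DEBUG": True, "PORT": 8080}})
sys.meta_path.insert(0, finder)
try:
    demo_settings = importlib.import_module("demo_settings")
finally:
    sys.meta_path.remove(finder)

print(demo_settings.PORT, demo_settings.__spec__.origin)
```

Because the per-module data travels on the spec, a single loader instance can serve every module the finder knows about.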
> > > Existing Types > ============== > > Module Objects > -------------- > > Other than adding ``__spec__``, none of the import-related module > attributes will be changed or deprecated, though some of them could be; > any such deprecation can wait until Python 4. > > A module's spec will not be kept in sync with the corresponding import- > related attributes. Though they may differ, in practice they will > typically be the same. > > One notable exception is that case where a module is run as a script by > using the ``-m`` flag. In that case ``module.__spec__.name`` will > reflect the actual module name while ``module.__name__`` will be > ``__main__``. > > A module's spec is not guaranteed to be identical between two modules > with the same name. Likewise there is no guarantee that successive > calls to importlib.find_spec() will return the same object or even an > equivalent object, though at least the latter is likely. > > Finders > ------- > > Finders are still responsible for creating the loader. That loader will > now be stored in the module spec returned by find_spec() rather > than returned directly. As is currently the case without the PEP, if a > loader would be costly to create, that loader can be designed to defer > the cost until later. > > **MetaPathFinder.find_spec(name, path=None)** > > **PathEntryFinder.find_spec(name)** > > Finders will return ModuleSpec objects when find_spec() is > called. This new method replaces find_module() and > find_loader() (in the PathEntryFinder case). If a loader does > not have find_spec(), find_module() and find_loader() are > used instead, for backward-compatibility. > > Adding yet another similar method to loaders is a case of practicality. > find_module() could be changed to return specs instead of loaders. > This is tempting because the import APIs have suffered enough, > especially considering PathEntryFinder.find_loader() was just > added in Python 3.3. 
However, the extra complexity and a less-than- > explicit method name aren't worth it. > > Loaders > ------- > > **Loader.exec_module(module)** > > Loaders will have a new method, exec_module(). Its only job > is to "exec" the module and consequently populate the module's > namespace. It is not responsible for creating or preparing the module > object, nor for any cleanup afterward. It has no return value. > exec_module() will be used during both loading and reloading. > > exec_module() should properly handle the case where it is called more > than once. For some kinds of modules this may mean raising ImportError > every time after the first time the method is called. This is > particularly relevant for reloading, where some kinds of modules do not > support in-place reloading. > > **Loader.create_module(spec)** > > Loaders may also implement create_module() that will return a > new module to exec. It may return None to indicate that the default > module creation code should be used. One use case, though atypical, for > create_module() is to provide a module that is a subclass of the builtin > module type. Most loaders will not need to implement create_module(), > > create_module() should properly handle the case where it is called more > than once for the same spec/module. This may include returning None or > raising ImportError. > > .. note:: > > exec_module() and create_module() should not set any import-related > module attributes. The fact that load_module() does is a design flaw > that this proposal aims to correct. > > Other changes: > > PEP 420 introduced the optional module_repr() loader method to limit > the amount of special-casing in the module type's ``__repr__()``. Since > this method is part of ModuleSpec, it will be deprecated on loaders. > However, if it exists on a loader it will be used exclusively. > > Loader.init_module_attr() method, added prior to Python 3.4's > release , will be removed in favor of the same method on ModuleSpec. 
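The split between Loader.create_module() and Loader.exec_module() described above can be sketched as follows. `CountingModule` and `CountingLoader` are hypothetical, and importlib.util.module_from_spec() (Python 3.5+) is assumed as a stand-in for the import system's own module-creation step.

```python
import importlib.abc
import importlib.machinery
import importlib.util
import types


class CountingModule(types.ModuleType):
    """Hypothetical module subclass carrying extra state."""

    exec_count = 0


class CountingLoader(importlib.abc.Loader):
    def create_module(self, spec):
        # Return a custom module object; returning None would request
        # the default module type instead.
        return CountingModule(spec.name)

    def exec_module(self, module):
        # Only populate the namespace. Import-related attributes such as
        # __spec__ and __loader__ are set by the import machinery.
        module.exec_count += 1
        module.GREETING = "hello"


spec = importlib.machinery.ModuleSpec("demo_counting", CountingLoader())
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)  # initial load
spec.loader.exec_module(module)  # in-place re-execution, as in a reload

print(type(module).__name__, module.exec_count, module.__spec__ is spec)
```

This loader tolerates repeated execution. A loader that cannot be re-executed in place would raise ImportError on the second call, as the text above allows.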
> > However, InspectLoader.is_package() will not be deprecated even > though the same information is found on ModuleSpec. ModuleSpec > can use it to populate its own is_package if that information is > not otherwise available. Still, it will be made optional. > > One consequence of ModuleSpec is that loader ``__init__`` methods will > no longer need to accommodate per-module state. The path-based loaders > in importlib take arguments in their ``__init__()`` and have > corresponding attributes. However, the need for those values is > eliminated by module specs. > > In addition to executing a module during loading, loaders will still be > directly responsible for providing APIs concerning module-related data. > > > Other Changes > ============= > > * The various finders and loaders provided by importlib will be > updated to comply with this proposal. > * The spec for the ``__main__`` module will reflect how the interpreter > was started. For instance, with ``-m`` the spec's name will be that > of the run module, while ``__main__.__name__`` will still be > "__main__". > * We add importlib.find_spec() to mirror > importlib.find_loader() (which becomes deprecated). > * importlib.reload() is changed to use ModuleSpec.load(). > * importlib.reload() will now make use of the per-module import > lock. > > > Reference Implementation > ======================== > > A reference implementation will be available at > http://bugs.python.org/issue18864. > > > Open Issues > ============== > > \* The impact of this change on pkgutil (and setuptools) needs looking > into. It has some generic function-based extensions to PEP 302. These > may break if importlib starts wrapping loaders without the tools' > knowledge. > > \* Other modules to look at: runpy (and pythonrun.c), pickle, pydoc, > inspect. > > For instance, pickle should be updated in the ``__main__`` case to look > at ``module.__spec__.name``. > > \* Impact on some kinds of lazy loading modules. 
[lazy_import_concerns]_ > > > References > ========== > > .. [ref_files_pep] > http://mail.python.org/pipermail/import-sig/2013-August/000658.html > > .. [import_system_docs] http://docs.python.org/3/reference/import.html > > .. [cleaner_reload_support] > https://mail.python.org/pipermail/import-sig/2013-September/000735.html > > .. [lazy_import_concerns] > https://mail.python.org/pipermail/python-dev/2013-August/128129.html > I should mention that this PEP will actually improve the situation for lazy loading compared to how it is in Python 3.3 when using __getattribute__. Because import now tries to backfill attributes like __package__ and __loader__, any module that is lazy based on attribute access automatically gets loaded by import itself. But with this PEP we can change import's semantics to not do that with spec-loaded modules and thus loader.exec_module() can insert a lazy module into sys.modules and know that it's attributes won't be touched unless you do a ``from ... import`` on it. -Brett > > > Copyright > ========= > > This document has been placed in the public domain. > > .. > Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: > > > _______________________________________________ > Import-SIG mailing list > Import-SIG at python.org > https://mail.python.org/mailman/listinfo/import-sig > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Thu Sep 26 21:02:22 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 26 Sep 2013 13:02:22 -0600 Subject: [Import-SIG] latest update of PEP 451 In-Reply-To: References: Message-ID: On Sep 25, 2013 7:05 AM, "Brett Cannon" wrote: > On Wed, Sep 25, 2013 at 1:46 AM, Eric Snow wrote: >> * Import-related module attributes (other than ``__spec__``) will no >> longer be used directly by the import system. 
>
>
> Might want to be clear with an example as to what happens to __path__ as that's the one value people do manipulate on occasion in order to see semantic changes. Basically ModuleSpec objects are bundles of data for importing a single module that the ModuleSpec describes, but they don't play a role in other modules' imports, thus not influencing submodules.

Good point.

>> How Loading Will Work
>> =====================
>>
>> This is an outline of what happens in ModuleSpec's loading
>> functionality::
>>
>>     def load(spec):
>>         if not hasattr(spec.loader, 'exec_module'):
>>             module = spec.loader.load_module(spec.name)
>>             spec.init_module_attrs(module)
>>             return sys.modules[spec.name]
>>
>>         module = None
>>         if hasattr(spec.loader, 'create_module'):
>>             module = spec.loader.create_module(spec)
>>         if module is None:
>>             module = ModuleType(spec.name)
>>         spec.init_module_attrs(module)
>>
>>         sys.modules[spec.name] = module
>>         try:
>>             spec.loader.exec_module(module)
>>         except Exception:
>>             del sys.modules[spec.name]
>>             raise
>>         return sys.modules[spec.name]
>
>
> try:
>     spec.loader.exec_module(module)
> except BaseException:
>     try:
>         del sys.modules[spec.name]
>     except KeyError:
>         pass

Fair enough.

>> The following ModuleSpec methods are not part of the public API since
>> it is easy to use them incorrectly and only the import system really
>> needs them (i.e. they would be an attractive nuisance).
>>
>> * _create() - provide a new module to use for loading.
>> * _exec(module) - execute the spec into a module namespace.
>> * _load() - prepare a module and execute it in a protected way.
>> * _reload(module) - re-execute a module in a protected way.
>
>
> Do these really need to be documented as not part of the API? They have leading underscores and so as per PEP 8 they are implicitly not part of the public API. They then just feel like noise and something that should not be explained as part of the specification.

I'm fine with removing them from the PEP.

>> .. 
[lazy_import_concerns] https://mail.python.org/pipermail/python-dev/2013-August/128129.html
>
>
> I should mention that this PEP will actually improve the situation for lazy loading compared to how it is in Python 3.3 when using __getattribute__. Because import now tries to backfill attributes like __package__ and __loader__, any module that is lazy based on attribute access automatically gets loaded by import itself. But with this PEP we can change import's semantics to not do that with spec-loaded modules and thus loader.exec_module() can insert a lazy module into sys.modules and know that its attributes won't be touched unless you do a ``from ... import`` on it.

Yeah, I'm mostly focused on addressing concerns. Would a lazy load example be worth adding to the PEP?

-eric

From brett at python.org Thu Sep 26 21:15:16 2013
From: brett at python.org (Brett Cannon)
Date: Thu, 26 Sep 2013 15:15:16 -0400
Subject: [Import-SIG] latest update of PEP 451
In-Reply-To:
References:
Message-ID:

On Thu, Sep 26, 2013 at 3:02 PM, Eric Snow wrote:

> On Sep 25, 2013 7:05 AM, "Brett Cannon" wrote:
> > On Wed, Sep 25, 2013 at 1:46 AM, Eric Snow
> wrote:
> [SNIP]
> >> .. [lazy_import_concerns]
> https://mail.python.org/pipermail/python-dev/2013-August/128129.html
> >
> >
> > I should mention that this PEP will actually improve the situation for
> lazy loading compared to how it is in Python 3.3 when using
> __getattribute__. Because import now tries to backfill attributes like
> __package__ and __loader__, any module that is lazy based on attribute
> access automatically gets loaded by import itself. But with this PEP we can
> change import's semantics to not do that with spec-loaded modules and thus
> loader.exec_module() can insert a lazy module into sys.modules and know
> that its attributes won't be touched unless you do a ``from ... import``
> on it.
>
> Yeah, I'm mostly focused on addressing concerns. Would a lazy load
> example be worth adding to the PEP?
>

If you don't think it will detract from the rest of the PEP, it wouldn't hurt. If you want I can write up some rough code to demonstrate how it would work.

From ericsnowcurrently at gmail.com Fri Sep 27 02:37:33 2013
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Thu, 26 Sep 2013 18:37:33 -0600
Subject: [Import-SIG] latest update of PEP 451
In-Reply-To:
References:
Message-ID:

On Thu, Sep 26, 2013 at 1:15 PM, Brett Cannon wrote:
> On Thu, Sep 26, 2013 at 3:02 PM, Eric Snow
> wrote:
>> On Sep 25, 2013 7:05 AM, "Brett Cannon" wrote:
>> > On Wed, Sep 25, 2013 at 1:46 AM, Eric Snow
>> > wrote:
>
>
> [SNIP]
>>
>> >> .. [lazy_import_concerns]
>> >> https://mail.python.org/pipermail/python-dev/2013-August/128129.html
>> >
>> >
>> > I should mention that this PEP will actually improve the situation for
>> > lazy loading compared to how it is in Python 3.3 when using
>> > __getattribute__. Because import now tries to backfill attributes like
>> > __package__ and __loader__, any module that is lazy based on attribute
>> > access automatically gets loaded by import itself. But with this PEP we can
>> > change import's semantics to not do that with spec-loaded modules and thus
>> > loader.exec_module() can insert a lazy module into sys.modules and know that
>> > its attributes won't be touched unless you do a ``from ... import`` on it.
>>
>> Yeah, I'm mostly focused on addressing concerns. Would a lazy load
>> example be worth adding to the PEP?
>
>
> If you don't think it will detract from the rest of the PEP, it wouldn't
> hurt. If you want I can write up some rough code to demonstrate how it would
> work.

Meh. I'll add it if needed, but will hold off otherwise. I don't think it will add that much to the PEP.
It's not that different from what you can already do with the loader.

-eric
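
For illustration only, the lazy-loading idea discussed in this thread could look roughly like the sketch below. It is not code from the PEP or from anyone in the thread. It assumes the helpers ``importlib.util.spec_from_loader()`` and ``importlib.util.module_from_spec()`` as found in later Python releases (3.5+), which may differ from the draft ModuleSpec API quoted above. ``LazyModule``, ``LazySourceLoader``, ``make_lazy_module()`` and the ``_pending_source`` key are hypothetical names made up for the example.

```python
import importlib.abc
import importlib.util
import types


class LazyModule(types.ModuleType):
    """Hypothetical module type whose body runs on first attribute access."""

    def __getattribute__(self, name):
        # Turn back into an ordinary module first, so that the attribute
        # lookups below no longer pass through this method.
        self.__class__ = types.ModuleType
        source = self.__dict__.pop("_pending_source")
        exec(source, self.__dict__)
        return getattr(self, name)


class LazySourceLoader(importlib.abc.Loader):
    """Hypothetical loader that defers execution of a source string."""

    def __init__(self, source):
        self.source = source

    def exec_module(self, module):
        # __spec__, __loader__ and friends were already set on the plain
        # module before this call, so filling them in triggered nothing.
        module.__dict__["_pending_source"] = self.source
        # From here on, any attribute access runs the deferred body.
        module.__class__ = LazyModule


def make_lazy_module(name, source):
    """Build a module from a spec without executing its body (assumed helper)."""
    loader = LazySourceLoader(source)
    spec = importlib.util.spec_from_loader(name, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module
```

Calling ``make_lazy_module("lazy_demo", "answer = 6 * 7")`` returns a module whose body has not run yet; the first attribute access, such as ``module.answer``, executes it. The sketch drives the loader by hand and does not register anything in sys.modules, so it says nothing about which attributes a real import would touch.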