From brett at python.org Thu Aug 1 15:18:33 2013
From: brett at python.org (Brett Cannon)
Date: Thu, 1 Aug 2013 09:18:33 -0400
Subject: [Import-SIG] PEP proposal: Per-Module Import Path
In-Reply-To:
References:
Message-ID:

[SNIP to stop Mailman from holding up the email for moderation because of size]

>>> A Module Attribute to Expose Contributing Ref Files
>>> ---------------------------------------------
>>>
>>> Knowing the origin of a module is important when tracking down problems,
>>> particularly import-related ones. Currently, that entails looking at
>>> `.__file__` and `.__path__` (or `sys.path`).
>>>
>>> With this PEP there can be a chain of ref files in between the currently
>>> available path and a module's __file__. Having access to that list of ref
>>> files is important in order to determine why one file was selected over
>>> another as the origin for the module. When an unexpected file gets used
>>> for one of your imports, you'll care about this!
>>>
>>> In order to facilitate that, modules will have a new attribute:
>>> `__indirect__`. It will be a tuple comprised of the chain of ref files, in
>>> order, used to locate the module's __file__. An empty tuple or with one
>>> item will be the most common case. An empty tuple indicates that no ref
>>> files were used to locate the module.
>>>
>>
>> This complicates things even further. How are you going to pass this info
>> along a call chain through find_loader()? Are we going to have to add
>> find_loader3() to support this (nasty side-effect of using tuples instead
>> of types.SimpleNamespace for the return value)? Some magic second value or
>> type from find_loader() which flags the values in the iterable are from a
>> .ref file and not any other possible place? This requires an API change and
>> there isn't any mention of how that would look or work.
>>
>
> This is the big open question in my mind. I suppose having find_loader()
> return a SimpleNamespace would help.
> Then the indirect path we aggregate
> in find_loader() could be passed as a new argument to loaders (when
> instantiated in either FileFinder.find_loader() or in
> PathFinder.find_module()).
>
> Here are the options I see, some more realistic than others:
>
> 1. Build __indirect__ after the fact (in init_module_attrs()?).
> 2. Change FileFinder.find_loader() to return a types.SimpleNamespace
> instance.
>

You can't do that; it would change the method signature in a way that would break code automatically unpacking the tuple. I was lamenting the fact that no one thought to use types.SimpleNamespace in the first place, not suggesting it now be used.

> 3. Change FileFinder.find_loader() to return a namedtuple subclass with an
> extra "loader" attribute.
>

Only if it also subclassed dict or types.SimpleNamespace so that it could be documented that in Python 4 the tuple usage will be removed but the new, alternative access approach would continue to work. Plus something in importlib.util to help construct this monstrosity of an object so people future-proof their code.

> 4. Piggy-back the indirect path on the loader returned by
> FileFinder.find_loader() in an "_indirect" attribute (or in the loader spot
> in the case of namespace packages).
>

Doesn't that tie this very tightly to FileFinder and not allow alternative finders to participate?

> 5. Something along the lines of Nick's IndirectReference.
>

I would avoid having to do any change that requires an isinstance check. New code can use getattr() (like with a namedtuple hybrid) w/o issue since that's a common way of dealing with API expansion, but having changing types in the return is something even Guido has said he doesn't care for.

> 6. Wrap the loader in a proxy that also sets __indirect__ when
> load_module() is called.
>

Ew.

> 7.
> Totally refactor the import system so that ModuleSpec objects are
> passed to metapath finders rather than (name, path) and simply store the
> indirect path on the spec (which is used directly to load the module rather
> than the loader).
>
>

Yeah, that ain't going to happen for backwards-compatibility reasons unless you're ready to make this new API work in a fully compatible way with the current API.

> 4 feels too much like a hack, particularly when we have other options. 7
> would need a PEP of its own (forthcoming).
>
> I see 2 as the best one. Is it really too late to change the return type
> of FileFinder.find_loader()? If we simply can't bear the backward
> compatibility risk (no matter how small),
>

We unfortunately can't. It would require a new method which as a stub would call the old API to return the proper object (which is fine if you can come up with a reasonable name).

> I'd advocate for one of 1, 3, 5, or 6.
>

I would try 1 and 3.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ncoghlan at gmail.com Thu Aug 1 16:00:24 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 2 Aug 2013 00:00:24 +1000
Subject: [Import-SIG] PEP proposal: Per-Module Import Path
In-Reply-To:
References:
Message-ID:

On 1 August 2013 23:18, Brett Cannon wrote:
>> I see 2 as the best one. Is it really too late to change the return type
>> of FileFinder.find_loader()? If we simply can't bear the backward
>> compatibility risk (no matter how small),
>
> We unfortunately can't. It would require a new method which as a stub would
> call the old API to return the proper object (which is fine if you can come
> up with a reasonable name).

Just musing on this one for a bit.

1. We still have the silliness where we call "find_module" on metapath
importers to ask them for a loader.
2. We have defined an inflexible signature for find_loader on path
entry finders (oops)
3.
There's other interesting metadata finders could expose *without*
loading the module

So, how does this sound: add a new API called "find_module_info" for
both metapath importers and path entry finders (falling back to the
legacy APIs). This would return a simple namespace potentially
providing the following pieces of information, using the same rules as
the corresponding loader does for setting the module attributes
(http://docs.python.org/3/reference/import.html#loaders):

__loader__
__name__
__package__
__path__
__file__
__cached__
__indirect__

(We could also lose the double underscores for the namespace
attributes, but I quite like the symmetry of keeping them)

Thoughts?

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ericsnowcurrently at gmail.com Thu Aug 1 16:36:20 2013
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Thu, 1 Aug 2013 08:36:20 -0600
Subject: [Import-SIG] PEP proposal: Per-Module Import Path
In-Reply-To:
References:
Message-ID:

On Aug 1, 2013 8:00 AM, "Nick Coghlan" wrote:
>
> On 1 August 2013 23:18, Brett Cannon wrote:
> >> I see 2 as the best one. Is it really too late to change the return type
> >> of FileFinder.find_loader()? If we simply can't bear the backward
> >> compatibility risk (no matter how small),
> >
> > We unfortunately can't. It would require a new method which as a stub would
> > call the old API to return the proper object (which is fine if you can come
> > up with a reasonable name).
>
> Just musing on this one for a bit.
>
> 1. We still have the silliness where we call "find_module" on metapath
> importers to ask them for a loader.
> 2. We have defined an inflexible signature for find_loader on path
> entry finders (oops)
> 3. There's other interesting metadata finders could expose *without*
> loading the module
>
> So, how does this sound: add a new API called "find_module_info" for
> both metapath importers and path entry finders (falling back to the
> legacy APIs).
> This would return a simple namespace potentially
> providing the following pieces of information, using the same rules as
> the corresponding loader does for setting the module attributes
> (http://docs.python.org/3/reference/import.html#loaders):
>
> __loader__
> __name__
> __package__
> __path__
> __file__
> __cached__
> __indirect__
>
> (We could also lose the double underscores for the namespace
> attributes, but I quite like the symmetry of keeping them)
>
> Thoughts?

This is basically what I've been thinking of as a new ModuleSpec type, though with some methods as well.

-eric
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From brett at python.org Thu Aug 1 16:35:57 2013
From: brett at python.org (Brett Cannon)
Date: Thu, 1 Aug 2013 10:35:57 -0400
Subject: [Import-SIG] PEP proposal: Per-Module Import Path
In-Reply-To:
References:
Message-ID:

On Thu, Aug 1, 2013 at 10:00 AM, Nick Coghlan wrote:

> On 1 August 2013 23:18, Brett Cannon wrote:
> >> I see 2 as the best one. Is it really too late to change the return type
> >> of FileFinder.find_loader()? If we simply can't bear the backward
> >> compatibility risk (no matter how small),
> >
> > We unfortunately can't. It would require a new method which as a stub would
> > call the old API to return the proper object (which is fine if you can come
> > up with a reasonable name).
>
> Just musing on this one for a bit.
>
> 1. We still have the silliness where we call "find_module" on metapath
> importers to ask them for a loader.
> 2. We have defined an inflexible signature for find_loader on path
> entry finders (oops)
> 3. There's other interesting metadata finders could expose *without*
> loading the module
>
> So, how does this sound: add a new API called "find_module_info" for
> both metapath importers and path entry finders (falling back to the
> legacy APIs).
> This would return a simple namespace potentially
> providing the following pieces of information, using the same rules as
> the corresponding loader does for setting the module attributes
> (http://docs.python.org/3/reference/import.html#loaders):
>
> __loader__
> __name__
> __package__
> __path__
> __file__
> __cached__
> __indirect__
>
> (We could also lose the double underscores for the namespace
> attributes, but I quite like the symmetry of keeping them)
>
> Thoughts?

If you're going to do that, why stop at types.SimpleNamespace and not move all the way to a module object? Then you can simply start moving to APIs which take the module object to be operated on and the various methods in the loader, etc. and just fill in details as necessary; that's what I would do if I got to redesign the loader API today since it would simplify load_module() and almost everything would just become a static method which set the attribute on the module (e.g. ExecutionLoader.get_filename('some.module') would become ExecutionLoader.filename(module) or even ExecutionLoader.__file__(module) which gets really meta as you can then have a decorator which checks for a non-None value for that attribute on the module and then returns it as a short-circuit instead of calling the method). Only drawback I see is it not being easy to tell if a module has been initialized or not, but I don't view that as a critical issue. IOW introduce new_module()/fresh_module().

Even if types.SimpleNamespace is kept I do like the idea. Loaders could shift to working only off of the object and have their __init__ method standardized to take a single argument so what import is told about and what loaders work with is the same. Basically it becomes a caching mechanism of what finders can infer so that loaders can save themselves the hassle without complicated init call signatures.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ericsnowcurrently at gmail.com Thu Aug 1 16:44:28 2013
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Thu, 1 Aug 2013 08:44:28 -0600
Subject: [Import-SIG] PEP proposal: Per-Module Import Path
In-Reply-To:
References:
Message-ID:

On Aug 1, 2013 8:36 AM, "Brett Cannon" wrote:
> If you're going to do that, why stop at types.SimpleNamespace and not move all the way to a module object? Then you can simply start moving to APIs which take the module object to be operated on and the various methods in the loader, etc. and just fill in details as necessary; that's what I would do if I got to redesign the loader API today since it would simplify load_module() and almost everything would just become a static method which set the attribute on the module (e.g. ExecutionLoader.get_filename('some.module') would become ExecutionLoader.filename(module) or even ExecutionLoader.__file__(module) which gets really meta as you can then have a decorator which checks for a non-None value for that attribute on the module and then returns it as a short-circuit instead of calling the method). Only drawback I see is it not being easy to tell if a module has been initialized or not, but I don't view that as a critical issue. IOW introduce new_module()/fresh_module().
>
> Even if types.SimpleNamespace is kept I do like the idea. Loaders could shift to working only off of the object and have their __init__ method standardized to take a single argument so what import is told about and what loaders work with is the same. Basically it becomes a caching mechanism of what finders can infer so that loaders can save themselves the hassle without complicated init call signatures.

This is pretty much exactly what I've been thinking about since PyCon. The only difference is that I have a distinct ModuleSpec class and modules would get a new __spec__ attribute.

-eric
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ncoghlan at gmail.com Fri Aug 2 02:56:41 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 2 Aug 2013 10:56:41 +1000
Subject: [Import-SIG] PEP proposal: Per-Module Import Path
In-Reply-To:
References:
Message-ID:

On 2 Aug 2013 00:44, "Eric Snow" wrote:
>
>
> On Aug 1, 2013 8:36 AM, "Brett Cannon" wrote:
> > If you're going to do that, why stop at types.SimpleNamespace and not move all the way to a module object? Then you can simply start moving to APIs which take the module object to be operated on and the various methods in the loader, etc. and just fill in details as necessary; that's what I would do if I got to redesign the loader API today since it would simplify load_module() and almost everything would just become a static method which set the attribute on the module (e.g. ExecutionLoader.get_filename('some.module') would become ExecutionLoader.filename(module) or even ExecutionLoader.__file__(module) which gets really meta as you can then have a decorator which checks for a non-None value for that attribute on the module and then returns it as a short-circuit instead of calling the method). Only drawback I see is it not being easy to tell if a module has been initialized or not, but I don't view that as a critical issue. IOW introduce new_module()/fresh_module().
> >
> > Even if types.SimpleNamespace is kept I do like the idea. Loaders could shift to working only off of the object and have their __init__ method standardized to take a single argument so what import is told about and what loaders work with is the same. Basically it becomes a caching mechanism of what finders can infer so that loaders can save themselves the hassle without complicated init call signatures.
>
> This is pretty much exactly what I've been thinking about since PyCon. The only difference is that I have a distinct ModuleSpec class and modules would get a new __spec__ attribute.

And we can quit adding ever more magic attributes directly to the module namespace.
I like it.

With that model, things might look vaguely like:

1. Finders would optionally offer "get_module_spec" (although a better name would be nice!)
2. Specs would have a load() method for the import system to call that optionally accepted an existing module object (this would then cover reload).
3. The responsibility for checking the sys.modules cache would move to the import system.
4. We'd create a "SpecLoader" to offer backwards compatibility in the old __loader__ attribute.

Slight(!) tangent from the original problem, but a worthwhile refactoring issue to tackle, I think :)

Cheers,
Nick.

>
> -eric
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ericsnowcurrently at gmail.com Fri Aug 2 05:34:38 2013
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Thu, 1 Aug 2013 21:34:38 -0600
Subject: [Import-SIG] PEP proposal: Per-Module Import Path
In-Reply-To:
References:
Message-ID:

On Thu, Aug 1, 2013 at 6:56 PM, Nick Coghlan wrote:

>
> On 2 Aug 2013 00:44, "Eric Snow" wrote:
> > This is pretty much exactly what I've been thinking about since PyCon.
> > The only difference is that I have a distinct ModuleSpec class and modules
> > would get a new __spec__ attribute.
>
> And we can quit adding ever more magic attributes directly to the module
> namespace. I like it.
>

Yeah, that was part of what led me to the idea. This could be taken to some pretty great lengths (I've given it a lot of thought), but I'm trying hard to not do too much at once. I wasn't even planning on pursuing ModuleSpec until 3.5, much less any of my more drastic ideas.

> With that model, things might look vaguely like:
>
> 1. Finders would optionally offer "get_module_spec" (although a better
> name would be nice!)
>

How about "find_module"? <.5 wink> Actually, I'm pretty sure this can be done in a backward-compatible way (in not too much time I've roughed out an implementation that should work).
I would rather not introduce more API to the import system, but if that's preferable to hijacking (or improving) find_module() then I can live with that. However, given the crowd that takes advantage of the import system APIs, I wouldn't consider the change disruptive as long as it's backward compatible.

This would also allow us to deprecate PathEntryFinder.get_loader() which we wouldn't have needed if we'd had something like ModuleSpec.

> 2. Specs would have a load() method for the import system to call that
> optionally accepted an existing module object (this would then cover
> reload).
>

That's been my plan from the get-go. Good call on the reload case.

> 3. The responsibility for checking the sys.modules cache would move to the
> import system.
>

To me it makes sense to go even further. ModuleSpec could easily take over a bunch of the responsibilities of loaders, particularly related to the management of the module objects.

Also, Loader.init_module_attrs() and importlib.util.module_to_load() could be pulled before the 3.4 release (since they are new in 3.4). It would stink if we found we no longer needed them after they get locked in by the release. Note, however, that they can co-exist with ModuleSpec just fine so it's not as big a deal.

> 4. We'd create a "SpecLoader" to offer backwards compatibility in the old
> __loader__ attribute.
>

Interesting. I had anticipated loaders still sticking around, still exposed by module.__loader__ and filling most of their current role, especially with regard to the optional PEP 302 APIs. I suppose we could deprecate the __loader__ attribute, and maybe even __package__, in favor of __spec__, but I don't think there's any rush to do so before Python 4000.

> Slight(!)
> tangent from the original problem, but a worthwhile refactoring
> issue to tackle, I think :)
>

Yeah, even if it proves too big a change for 3.4 and we take some other approach for indirections, I think there's a lot to gain from separating the module specification from the module and from the loader. I've attached a patch that does the bare minimum of what I think we'd want from ModuleSpec. I'll probably flesh out more of my ideas for it later.

Of course, I don't want anything here to get in the way of the .ref PEP which I think has more concrete value. So if this tangent threatens any chance at getting indirection files for 3.4, I'd rather defer any effort on these extras until 3.5 in favor of a simpler (if less desirable) approach.

-eric
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: modulespec.diff
Type: application/octet-stream
Size: 9849 bytes
Desc: not available
URL:

From ncoghlan at gmail.com Fri Aug 2 11:32:45 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 2 Aug 2013 19:32:45 +1000
Subject: [Import-SIG] PEP proposal: Per-Module Import Path
In-Reply-To:
References:
Message-ID:

On 2 Aug 2013 13:34, "Eric Snow" wrote:
>
> On Thu, Aug 1, 2013 at 6:56 PM, Nick Coghlan wrote:
>>
>>
>> On 2 Aug 2013 00:44, "Eric Snow" wrote:
>> > This is pretty much exactly what I've been thinking about since PyCon. The only difference is that I have a distinct ModuleSpec class and modules would get a new __spec__ attribute.
>>
>> And we can quit adding ever more magic attributes directly to the module namespace. I like it.
>
> Yeah, that was part of what led me to the idea. This could be taken to some pretty great lengths (I've given it a lot of thought), but I'm trying hard to not do too much at once. I wasn't even planning on pursuing ModuleSpec until 3.5, much less any of my more drastic ideas.
>>
>> With that model, things might look vaguely like:
>>
>> 1. Finders would optionally offer "get_module_spec" (although a better name would be nice!)
>
> How about "find_module"? <.5 wink> Actually, I'm pretty sure this can be done in a backward-compatible way (in not too much time I've roughed out an implementation that should work). I would rather not introduce more API to the import system, but if that's preferable to hijacking (or improving) find_module() then I can live with that. However, given the crowd that takes advantage of the import system APIs, I wouldn't consider the change disruptive as long as it's backward compatible.
>
> This would also allow us to deprecate PathEntryFinder.get_loader() which we wouldn't have needed if we'd had something like ModuleSpec.

If you can make find_module handle this in a backwards compatible way, cool :)

>>
>> 2. Specs would have a load() method for the import system to call that optionally accepted an existing module object (this would then cover reload).
>
> That's been my plan from the get-go. Good call on the reload case.
>>
>> 3. The responsibility for checking the sys.modules cache would move to the import system.
>
> To me it makes sense to go even further. ModuleSpec could easily take over a bunch of the responsibilities of loaders, particularly related to the management of the module objects.
>
> Also, Loader.init_module_attrs() and importlib.util.module_to_load() could be pulled before the 3.4 release (since they are new in 3.4). It would stink if we found we no longer needed them after they get locked in by the release. Note, however, that they can co-exist with ModuleSpec just fine so it's not as big a deal.
>>
>> 4. We'd create a "SpecLoader" to offer backwards compatibility in the old __loader__ attribute.
>
> Interesting. I had anticipated loaders still sticking around, still exposed by module.__loader__ and filling most of their current role, especially with regard to the optional PEP 302 APIs.
> I suppose we could deprecate the __loader__ attribute, and maybe even __package__, in favor of __spec__, but I don't think there's any rush to do so before Python 4000.

I was thinking of finders returning customised types for module specs, but I guess you could get the same effect defining a new "exec_module" API on loaders.

>>
>> Slight(!) tangent from the original problem, but a worthwhile refactoring issue to tackle, I think :)
>
> Yeah, even if it proves too big a change for 3.4 and we take some other approach for indirections, I think there's a lot to gain from separating the module specification from the module and from the loader. I've attached a patch that does the bare minimum of what I think we'd want from ModuleSpec. I'll probably flesh out more of my ideas for it later.
>
> Of course, I don't want anything here to get in the way of the .ref PEP which I think has more concrete value. So if this tangent threatens any chance at getting indirection files for 3.4, I'd rather defer any effort on these extras until 3.5 in favor of a simpler (if less desirable) approach.

I suspect ref files will be an easier sell with an elegant way to handle the indirection tracking. I'm not aware of anyone that actually *likes* the current amount of work loaders have to do, it's just that we only figured that out with the benefit of hindsight :)

Cheers,
Nick.

>
> -eric
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ncoghlan at gmail.com Sun Aug 4 07:07:47 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 4 Aug 2013 15:07:47 +1000
Subject: [Import-SIG] PEP proposal: Per-Module Import Path
In-Reply-To:
References:
Message-ID:

On 2 August 2013 13:34, Eric Snow wrote:
>
> On Thu, Aug 1, 2013 at 6:56 PM, Nick Coghlan wrote:
>>
>>
>> On 2 Aug 2013 00:44, "Eric Snow" wrote:
>> > This is pretty much exactly what I've been thinking about since PyCon.
>> > The only difference is that I have a distinct ModuleSpec class and modules
>> > would get a new __spec__ attribute.
>>
>> And we can quit adding ever more magic attributes directly to the module
>> namespace. I like it.
>
> Yeah, that was part of what led me to the idea. This could be taken to
> some pretty great lengths (I've given it a lot of thought), but I'm trying
> hard to not do too much at once. I wasn't even planning on pursuing
> ModuleSpec until 3.5, much less any of my more drastic ideas.
>>
>> With that model, things might look vaguely like:
>>
>> 1. Finders would optionally offer "get_module_spec" (although a better
>> name would be nice!)
>
> How about "find_module"? <.5 wink> Actually, I'm pretty sure this can be
> done in a backward-compatible way (in not too much time I've roughed out an
> implementation that should work). I would rather not introduce more API to
> the import system, but if that's preferable to hijacking (or improving)
> find_module() then I can live with that. However, given the crowd
> that takes advantage of the import system APIs, I wouldn't consider the
> change disruptive as long as it's backward compatible.
>
> This would also allow us to deprecate PathEntryFinder.get_loader() which we
> wouldn't have needed if we'd had something like ModuleSpec.

I finally had a chance to look at your draft implementation. That's a
neat attempt at backwards compatibility, but I'm not sure it will work
properly - you already had to block out several interesting methods
for compatibility reasons, and there's a potential for conflict even
with the methods you did keep (since custom loaders may have
additional methods beyond those in the specs).

YAFM is annoying (Yet Another Method, I'll let you fill in the rest),
but I think it's better than trying to be too clever and accidentally
breaking things.

How about "find_import" as a new method name? And ImportSpec as the
class name, rather than ModuleSpec?

>> 4.
>> We'd create a "SpecLoader" to offer backwards compatibility in the old
>> __loader__ attribute.
>
> Interesting. I had anticipated loaders still sticking around, still exposed
> by module.__loader__ and filling most of their current role, especially with
> regard to the optional PEP 302 APIs. I suppose we could deprecate the
> __loader__ attribute, and maybe even __package__, in favor of __spec__, but
> I don't think there's any rush to do so before Python 4000.

Yeah, I think having the spec as something people *don't* customise is
a good idea.

>> Slight(!) tangent from the original problem, but a worthwhile refactoring
>> issue to tackle, I think :)
>
> Yeah, even if it proves too big a change for 3.4 and we take some other
> approach for indirections, I think there's a lot to gain from separating the
> module specification from the module and from the loader. I've attached a
> patch that does the bare minimum of what I think we'd want from ModuleSpec.
> I'll probably flesh out more of my ideas for it later.
>
> Of course, I don't want anything here to get in the way of the .ref PEP
> which I think has more concrete value. So if this tangent threatens any
> chance at getting indirection files for 3.4, I'd rather defer any effort on
> these extras until 3.5 in favor of a simpler (if less desirable) approach.

I just realised there's another added bonus to this approach:
__spec__.__name__ will let us record the *real* name of modules
executed via -m, even with __name__ set to "__main__". So it could
also greatly simplify some aspects of PEP 395 :)

Cheers,
Nick.
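
The -m point above can be sketched roughly as follows. This is only an illustration written against the API that later shipped in Python 3.4 (PEP 451), not the draft patch discussed in this thread; in that API the attribute is spelled `__spec__.name` rather than `__spec__.__name__`. The module name `demo_mod` and the temporary directory are hypothetical, and `runpy.run_module(..., run_name="__main__")` is used here as a stand-in for the `-m` switch.

```python
import importlib.util
import os
import runpy
import sys
import tempfile

# Ask the import system for a spec without importing json.tool itself
# (the parent package "json" does get imported along the way).
spec = importlib.util.find_spec("json.tool")
print(spec.name)    # the importable name: json.tool
print(spec.parent)  # what __package__ would be: json
print(spec.origin)  # where the module was found (roughly __file__)

# Hypothetical throwaway module, used only to imitate "python -m demo_mod".
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "demo_mod.py"), "w") as f:
        f.write("result = (__name__, __spec__.name)\n")
    sys.path.insert(0, tmp)
    try:
        # run_name="__main__" mirrors what the -m switch does to __name__.
        namespace = runpy.run_module("demo_mod", run_name="__main__")
    finally:
        sys.path.remove(tmp)

# __name__ was overwritten, but the spec still carries the real name.
print(namespace["result"])  # ('__main__', 'demo_mod')
```

The design point is that the spec is a separate object from the module namespace, so overwriting `__name__` for execution does not lose the name the module was found under.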
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ericsnowcurrently at gmail.com Thu Aug 8 03:08:01 2013
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 7 Aug 2013 19:08:01 -0600
Subject: [Import-SIG] PEP proposal: Per-Module Import Path
In-Reply-To:
References:
Message-ID:

On Sat, Aug 3, 2013 at 11:07 PM, Nick Coghlan wrote:

> On 2 August 2013 13:34, Eric Snow wrote:
> I finally had a chance to look at your draft implementation. That's a
> neat attempt at backwards compatibility, but I'm not sure it will work
> properly - you already had to block out several interesting methods
> for compatibility reasons, and there's a potential for conflict even
> with the methods you did keep (since custom loaders may have
> additional methods beyond those in the specs).
>

Yeah, that was a pretty rough stab at it. I've since done a little more, including implementing __getattr__() and getting a little clever for is_package. And I'm still not sure it will work. isinstance checks will fail (duck-typing FTW) and id() gives a different value for the spec and for the loader. I suppose that's the rub with proxies. So I'm not sure it will work, but it *could* be close enough. We'll see.

> YAFM is annoying (Yet Another Method, I'll let you fill in the rest),
> but I think it's better than trying to be too clever and accidentally
> breaking things.
>

That's my concern too.

>
> How about "find_import" as a new method name? And ImportSpec as the
> class name, rather than ModuleSpec?
>

To me "ImportSpec" says "spec for the import system".

> >> 4. We'd create a "SpecLoader" to offer backwards compatibility in the old
> >> __loader__ attribute.
> >
> > Interesting. I had anticipated loaders still sticking around, still exposed
> > by module.__loader__ and filling most of their current role, especially with
> > regard to the optional PEP 302 APIs.
> > I suppose we could deprecate the
> > __loader__ attribute, and maybe even __package__, in favor of __spec__, but
> > I don't think there's any rush to do so before Python 4000.
>
> Yeah, I think having the spec as something people *don't* customise is
> a good idea.
>

I tried it both ways and it's a *lot* simpler if the spec is not designed for modification. I expect the case for modifying a spec would be pretty uncommon.

>
> >> Slight(!) tangent from the original problem, but a worthwhile refactoring
> >> issue to tackle, I think :)
> >
> > Yeah, even if it proves too big a change for 3.4 and we take some other
> > approach for indirections, I think there's a lot to gain from separating the
> > module specification from the module and from the loader. I've attached a
> > patch that does the bare minimum of what I think we'd want from ModuleSpec.
> > I'll probably flesh out more of my ideas for it later.
> >
> > Of course, I don't want anything here to get in the way of the .ref PEP
> > which I think has more concrete value. So if this tangent threatens any
> > chance at getting indirection files for 3.4, I'd rather defer any effort on
> > these extras until 3.5 in favor of a simpler (if less desirable) approach.
>
> I just realised there's another added bonus to this approach:
> __spec__.__name__ will let us record the *real* name of modules
> executed via -m, even with __name__ set to "__main__". So it could
> also greatly simplify some aspects of PEP 395 :)
>

That's a good one. I'll give it a try.

The patch I've got is pretty hefty. Should I keep it low key and just post it here, or would it be worth logging a ticket and posting it there for review? Once I'm comfortable with the patch I'll try sticking my .ref patch on top and see how it looks. I'll probably whip up a PEP for ModuleSpec at that point if things are looking good. I'm just worried about getting this done in time for 3.4.
On top of this I'm really close on OrderedDict, ordered class definition namespace, .__definition_order__, and locals('__kworder__'), so I'm still kind of nervous about taking on two non-trivial changes to the import system with so little time before beta 1. However, at this point I still think it's doable. :) -eric p.s. I hadn't realized this list was "closed". Should we change that, or take this (both ModuleSpec and .ref) to python-ideas (or off-line)? -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Thu Aug 8 14:18:16 2013 From: brett at python.org (Brett Cannon) Date: Thu, 8 Aug 2013 08:18:16 -0400 Subject: [Import-SIG] PEP proposal: Per-Module Import Path In-Reply-To: References: Message-ID: On Wed, Aug 7, 2013 at 9:08 PM, Eric Snow wrote: > > > > On Sat, Aug 3, 2013 at 11:07 PM, Nick Coghlan wrote: > >> On 2 August 2013 13:34, Eric Snow wrote: >> I finally had a chance to look at your draft implementation. That's a >> neat attempt at backwards compatibility, but I'm not sure it will work >> properly - you already had to block out several interesting methods >> for compatibility reasons, and there's a potential for conflict even >> with the methods you did keep (since custom loaders may have >> additional methods beyond those in the specs). >> > > Yeah, that was a pretty rough stab at it. I've since done a little more, > including implementing __getattr__() and getting a little clever for > is_package. And I'm still not sure it will work. isinstance checks will > fail (duck-typing FTW) and id() gives a different value for the spec and > for the loader. I suppose that's the rub with proxies. So I'm not sure it > will work, but it *could* be close enough. We'll see. > > >> YAFM is annoying (Yet Another Method, I'll let you fill in the rest), >> but I think it's better than trying to be too clever and accidentally >> breaking things. >> > > That's my concern too. 
> > >> >> How about "find_import" as a new method name? And ImportSpec as the >> class name, rather than ModuleSpec? >> > > To me "ImportSpec" says "spec for the import system". > > >> >> 4. We'd create a "SpecLoader" to offer backwards compatibility in the >> old >> >> __loader__ attribute. >> > >> > Interesting. I had anticipated loaders still sticking around, still >> exposed >> > by module.__loader__ and filling most of their current role, especially >> with >> > regard to the optional PEP 302 APIs. I suppose we could deprecate the >> > __loader__ attribute, and maybe even __package__, in favor of __spec__, >> but >> > I don't think there's any rush to do so before Python 4000. >> >> Yeah, I think having the spec as something people *don't* customise is >> a good idea. >> > > I tried it both ways and it's a *lot* simpler if the spec is not designed > for modification. I expect the case for modifying a spec would be pretty > uncommon. > > >> >> >> Slight(!) tangent from the original problem, but a worthwhile >> refactoring >> >> issue to tackle, I think :) >> > >> > Yeah, even if it proves too big a change for 3.4 and we take some other >> > approach for indirections, I think there's a lot to gain from >> separating the >> > module specification from the module and from the loader. I've >> attached a >> > patch that does the bare minimum of what I think we'd want from >> ModuleSpec. >> > I'll probably flesh out more of my ideas for it later. >> > >> > Of course, I don't want anything here to get in the way of the .ref PEP >> > which I think has more concrete value. So if this tangent threatens any >> > chance at getting indirection files for 3.4, I'd rather defer any >> effort on >> > these extras until 3.5 in favor of a simpler (if less desirable) >> approach. >> >> I just realised there's another added bonus to this approach: >> __spec__.__name__ will let us record the *real* name of modules >> executed via -m, even with __name__ set to "__main__". 
So it could >> also greatly simplify some aspects of PEP 395 :) >> > > That's a good one. I'll give it a try. > > The patch I've got is pretty hefty. Should I keep it low key and just > post it here, or would it be worth logging a ticket and posting it there > for review? > Once it's all written up in a PEP you can post an issue for the code. > Once I'm comfortable with the patch I'll try sticking my .ref patch on > top and see how it looks. I'll probably whip up a PEP for ModuleSpec at > that point if things are looking good. > > I'm just worried about getting this done in time for 3.4. On top of this > I'm really close on OrderedDict, ordered class definition namespace, > .__definition_order__, and locals('__kworder__'), so I'm still kind > of nervous about taking on two non-trivial changes to the import system > with so little time before beta 1. However, at this point I still think > it's doable. :) > I personally view all of this as bonus stuff that is in no way required to make Python function or make some new class of solution available, so I wouldn't stress about getting in for 3.4. > > -eric > > p.s. I hadn't realized this list was "closed". Should we change that, or > take this (both ModuleSpec and .ref) to python-ideas (or off-line)? > Eric or Barry are the admins so they can change the wording. I say just leave it here for now until people are happy with the proposal and then it can be kicked up to python-dev (python-ideas isn't needed in this case since we have this mailing list specifically for import discussions). -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ericsnowcurrently at gmail.com Fri Aug 9 08:34:34 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 9 Aug 2013 00:34:34 -0600 Subject: [Import-SIG] Rough PEP: A ModuleSpec Type for the Import System Message-ID: This is an outgrowth of discussions on the .ref PEP, but it's also something I've been thinking about for over a year and started toying with at the last PyCon. I have a patch that passes all but a couple unit tests and should pass those too once I get a minute to take another pass at it. I'll probably end up adding a bunch more unit tests before I'm done as well. However, the functionality is mostly there. BTW, I gotta say, Brett, I have a renewed appreciation for the long and hard effort you put into importlib. There are just so many odd corner cases that I never would have looked for if not for that library. And those unit tests do a great job of covering all of that. Thanks! -eric ------------------------------------------------------------------------------- PEP: 4XX Title: A ModuleSpec Type for the Import System Version: $Revision$ Last-Modified: $Date$ Author: Eric Snow BDFL-Delegate: ??? Discussions-To: import-sig at python.org Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 8-Aug-2013 Python-Version: 3.4 Post-History: 8-Aug-2013 Resolution: Abstract ======== This PEP proposes to add a new class to ``importlib.machinery`` called ``ModuleSpec``. It will contain all the import-related information about a module without needing to load the module first. Finders will now return a module's spec rather than a loader. The import system will use the spec to load the module. Motivation ========== The import system has evolved over the lifetime of Python. In late 2002 PEP 302 introduced standardized import hooks via ``finders`` and ``loaders`` and ``sys.meta_path``.
The ``importlib`` module, introduced with Python 3.1, now exposes a pure Python implementation of the APIs described by PEP 302, as well as of the full import system. It is now much easier to understand and extend the import system. While a benefit to the Python community, this greater accessibility also presents a challenge. As more developers come to understand and customize the import system, any weaknesses in the finder and loader APIs will be more impactful. So the sooner we can address any such weaknesses in the import system, the better...and there are a couple we can take care of with this proposal. Firstly, any time the import system needs to save information about a module we end up with more attributes on module objects that are generally only meaningful to the import system and occasionally to some people. It would be nice to have a per-module namespace in which to put future import-related information. Secondly, there's an API void between finders and loaders that causes undue complexity when encountered. Finders are strictly responsible for providing the loader which the import system will use to load the module. The loader is then responsible for doing some checks, creating the module object, setting import-related attributes, "installing" the module to ``sys.modules``, and loading the module, along with some cleanup. This all takes place during the import system's call to ``Loader.load_module()``. Loaders also provide some APIs for accessing data associated with a module. Loaders are not required to provide any of the functionality of ``load_module()`` through other methods. Thus, though the import-related information about a module is likely available without loading the module, it is not otherwise exposed. Furthermore, the requirements associated with ``load_module()`` are common to all loaders and mostly are implemented in exactly the same way. This means every loader has to duplicate the same boilerplate code.
``importlib.util`` provides some tools that help with this, but it would be more helpful if the import system simply took charge of these responsibilities. The trouble is that this would limit the degree of customization that ``load_module()`` facilitates. This is a gap between finders and loaders which this proposal aims to fill. Finally, when the import system calls a finder's ``find_module()``, the finder makes use of a variety of information about the module that is useful outside the context of the method. Currently the options are limited for persisting that per-module information past the method call, since it only returns the loader. Either store it in a module-to-info mapping somewhere like on the finder itself, or store it on the loader. Unfortunately, loaders are not required to be module-specific. On top of that, some of the useful information finders could provide is common to all finders, so ideally the import system could take care of that. This is the same gap as before between finders and loaders. As an example of complexity attributable to this flaw, the implementation of namespace packages in Python 3.3 (see PEP 420) added ``FileFinder.find_loader()`` because there was no good way for ``find_module()`` to provide the namespace path. The answer to this gap is a ``ModuleSpec`` object that contains the per-module information and takes care of the boilerplate functionality of loading the module. (The idea grew feet during discussions related to another PEP.[1]) Specification ============= ModuleSpec ---------- A new class which defines the import-related values to use when loading the module. It closely corresponds to the import-related attributes of module objects. ``ModuleSpec`` objects may also be used by finders and loaders and other import-related APIs to hold extra import-related information about the module. This greatly reduces the need to add any new import-related attributes to module objects. 
Attributes: * ``name`` - the module's name (compare to ``__name__``). * ``loader`` - the loader to use during loading and for module data (compare to ``__loader__``). * ``package`` - the name of the module's parent (compare to ``__package__``). * ``is_package`` - whether or not the module is a package. * ``origin`` - the location from which the module originates. * ``filename`` - like origin, but limited to a path-based location (compare to ``__file__``). * ``cached`` - the location where the compiled module should be stored (compare to ``__cached__``). * ``path`` - the list of path entries in which to search for submodules, or ``None`` (compare to ``__path__``). It should be in sync with ``is_package``. Those are also the parameters to ``ModuleSpec.__init__()``, in that order. The last three are optional. When passed, the values are taken as-is. The ``from_loader()`` method offers calculated values. Methods: * ``from_loader(cls, ...)`` - returns a new ``ModuleSpec`` derived from the arguments. The parameters are the same as with ``__init__``, except ``package`` is excluded and only ``name`` and ``loader`` are required. * ``module_repr()`` - returns a repr for the module. * ``init_module_attrs(module)`` - sets the module's import-related attributes. * ``load(module=None, *, is_reload=False)`` - calls the loader's ``exec_module()``, falling back to ``load_module()`` if necessary. This method performs the former responsibilities of loaders for managing modules before actually loading and for cleaning up. The reload case is facilitated by the ``module`` and ``is_reload`` parameters. Values Derived by from_loader() ------------------------------- As implied above, ``from_loader()`` makes a best effort at calculating any of the values that are not passed in. It duplicates the behavior that was formerly provided by several ``importlib.util`` functions as well as the ``init_module_attrs()`` method of several of ``importlib``'s loaders.
Just to be clear, here is a more detailed description of those calculations: ``is_package`` is derived from ``path``, if passed. Otherwise the loader's ``is_package()`` is tried. Finally, it defaults to False. ``filename`` is pulled from the loader's ``get_filename()``, if possible. ``path`` is set to an empty list if ``is_package`` is true, and the directory from ``filename`` is appended to it, if available. ``cached`` is derived from ``filename`` if it's available. ``origin`` is set to ``filename``. ``package`` is set to ``name`` if the module is a package and to ``name.rpartition('.')[0]`` otherwise. Consequently, a top-level module will have ``package`` set to the empty string. Backward Compatibility ---------------------- Since finder ``find_module()`` methods would now return a module spec instead of loader, specs must act like the loader that would have been returned instead. This is relatively simple to solve since the loader is available as an attribute of the spec. However, ``ModuleSpec.is_package`` (an attribute) conflicts with ``InspectLoader.is_package()`` (a method). Working around this requires a more complicated solution but is not a large obstacle. Unfortunately, the ability to proxy does not extend to ``id()`` comparisons and ``isinstance()`` tests. In the case of the return value of ``find_module()``, we accept that break in backward compatibility. Subclassing ----------- .. XXX Allowed but discouraged? Module Objects -------------- Module objects will now have a ``__spec__`` attribute to which the module's spec will be bound. None of the other import-related module attributes will be changed or deprecated, though some of them could be. Any such deprecation can wait until Python 4. ``ModuleSpec`` objects will not be kept in sync with the corresponding module object's import-related attributes. They may differ, though in practice they will be the same. 
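As a rough illustration of the "Module Objects" section above, the following sketch binds a stand-in spec to a module and shows that the two are not kept in sync. ``types.SimpleNamespace`` is only a placeholder for the proposed ``ModuleSpec`` class, and the module name ``pkg.tool`` is made up for the example.

```python
import types

# Placeholder for the proposed ModuleSpec (an assumption for illustration;
# the draft puts the real class in importlib.machinery).
spec = types.SimpleNamespace(name="pkg.tool", package="pkg", filename=None)

module = types.ModuleType(spec.name)
module.__spec__ = spec              # the new attribute proposed by the draft
module.__package__ = spec.package

# Simulate running the module with ``-m``: __name__ is replaced, but the
# spec is not updated to match, so it still records the real name.
module.__name__ = "__main__"

print(module.__name__)              # __main__
print(module.__spec__.name)         # pkg.tool
```

This is the same situation the draft describes for ``__main__``: the module attribute and the spec may differ, and the spec keeps the original import-time value.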
Finders ------- Finders will now return ``ModuleSpec`` objects when ``find_module()`` is called rather than loaders. For backward compatibility, ``ModuleSpec`` objects proxy the attributes of their ``loader`` attribute. Adding another similar method to avoid backward-compatibility issues is undesirable if avoidable. The import APIs have suffered enough. The approach taken by this PEP should be sufficient. The change to ``find_module()`` applies to both ``MetaPathFinder`` and ``PathEntryFinder``. ``PathEntryFinder.find_loader()`` will be deprecated and, for backward compatibility, implicitly special-cased if the method exists on a finder. Loaders ------- Loaders will have a new method, ``exec_module(module)``. Its only job is to "exec" the module and consequently populate the module's namespace. It is not responsible for creating or preparing the module object, nor for any cleanup afterward. It has no return value. The ``load_module()`` of loaders will still work and be an active part of the loader API. It is still useful for cases where the default module creation/preparation/cleanup is not appropriate for the loader. A loader must have ``exec_module()`` or ``load_module()`` defined. If both exist on the loader, ``exec_module()`` is used and ``load_module()`` is ignored. PEP 420 introduced the optional ``module_repr()`` loader method to limit the amount of special-casing in the module type's ``__repr__()``. Since this method is part of ``ModuleSpec``, it will be deprecated on loaders. However, if it exists on a loader it will be used exclusively. The loader ``init_module_attrs()`` method, added for Python 3.4, will be eliminated in favor of the same method on ``ModuleSpec``. However, ``InspectLoader.is_package()`` will not be deprecated even though the same information is found on ``ModuleSpec``. ``ModuleSpec`` can use it to populate its own ``is_package`` if that information is not otherwise available. Still, it will be made optional.
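The split between ``exec_module()`` and ``load_module()`` described above might look roughly like the following sketch. The ``load()`` function is only a stand-in for the dispatch the draft assigns to ``ModuleSpec.load()``; ``ExecLoader``, ``LegacyLoader`` and the module names are hypothetical, and details such as the import lock and reload handling are left out.

```python
import sys
import types


class ExecLoader:
    """Hypothetical loader implementing only the proposed exec_module()."""

    def exec_module(self, module):
        # Only populate the namespace: no module creation, no sys.modules.
        exec("answer = 42", module.__dict__)


class LegacyLoader:
    """Hypothetical loader implementing only PEP 302's load_module()."""

    def load_module(self, name):
        # The loader itself carries the creation/cleanup boilerplate.
        module = types.ModuleType(name)
        module.__loader__ = self
        sys.modules[name] = module
        try:
            exec("answer = 42", module.__dict__)
        except BaseException:
            del sys.modules[name]
            raise
        return module


def load(name, loader):
    """Stand-in for the draft's ModuleSpec.load() dispatch (an assumption)."""
    if hasattr(loader, "exec_module"):
        # The import system, not the loader, handles the boilerplate.
        module = types.ModuleType(name)
        module.__loader__ = loader
        sys.modules[name] = module
        try:
            loader.exec_module(module)
        except BaseException:
            sys.modules.pop(name, None)
            raise
        return sys.modules[name]
    # Fall back to the old API when exec_module() is missing.
    return loader.load_module(name)


new_style = load("demo_new", ExecLoader())
old_style = load("demo_old", LegacyLoader())
print(new_style.answer, old_style.answer)   # 42 42
```

The point of the comparison is that the try/except bookkeeping in ``LegacyLoader.load_module()`` is exactly what moves out of each loader and into one shared place.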
In addition to executing a module during loading, loaders will still be directly responsible for providing APIs concerning module-related data. Other Changes ------------- * The various finders and loaders provided by ``importlib`` will be updated to comply with this proposal. * The spec for the ``__main__`` module will reflect how the interpreter was started. For instance, with ``-m`` the spec's name will be that of the run module, while ``__main__.__name__`` will still be "__main__". * We add ``importlib.find_module()`` to mirror ``importlib.find_loader()`` (which becomes deprecated). * Deprecations in ``importlib.util``: ``set_package()``, ``set_loader()``, and ``module_for_loader()``. ``module_to_load()`` (introduced in 3.4) can be removed. * ``importlib.reload()`` is changed to use ``ModuleSpec.load()``. * ``ModuleSpec.load()`` and ``importlib.reload()`` will now make use of the per-module import lock, whereas ``Loader.load_module()`` did not. Reference Implementation ------------------------ A reference implementation is available at . References ========== [1] http://mail.python.org/pipermail/import-sig/2013-August/000658.html Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Fri Aug 9 08:38:18 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 9 Aug 2013 00:38:18 -0600 Subject: [Import-SIG] PEP proposal: Per-Module Import Path In-Reply-To: References: Message-ID: On Thu, Aug 8, 2013 at 6:18 AM, Brett Cannon wrote: > On Wed, Aug 7, 2013 at 9:08 PM, Eric Snow wrote: >> >> The patch I've got is pretty hefty. Should I keep it low key and just >> post it here, or would it be worth logging a ticket and posting it there >> for review? 
>> > > Once it's all written up in a PEP you can post an issue for the code. > PEP sent to list. I want to pass a couple lingering unit tests before I post the patch. > > >> Once I'm comfortable with the patch I'll try sticking my .ref patch on >> top and see how it looks. I'll probably whip up a PEP for ModuleSpec at >> that point if things are looking good. >> >> I'm just worried about getting this done in time for 3.4. On top of this >> I'm really close on OrderedDict, ordered class definition namespace, >> .__definition_order__, and locals('__kworder__'), so I'm still kind >> of nervous about taking on two non-trivial changes to the import system >> with so little time before beta 1. However, at this point I still think >> it's doable. :) >> > > I personally view all of this as bonus stuff that is in no way required to > make Python function or make some new class of solution available, so I > wouldn't stress about getting in for 3.4. > > >> >> -eric >> >> p.s. I hadn't realized this list was "closed". Should we change that, or >> take this (both ModuleSpec and .ref) to python-ideas (or off-line)? >> > > Eric or Barry are the admins so they can change the wording. I say just > leave it here for now until people are happy with the proposal and then it > can be kicked up to python-dev (python-ideas isn't needed in this case > since we have this mailing list specifically for import discussions). > Fine with me. :) -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Fri Aug 9 10:28:03 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 9 Aug 2013 10:28:03 +0200 Subject: [Import-SIG] Rough PEP: A ModuleSpec Type for the Import System References: Message-ID: <20130809102803.5615941d@pitrou.net> Hi, Le Fri, 9 Aug 2013 00:34:34 -0600, Eric Snow a écrit : > Abstract > ======== > > This PEP proposes to add a new class to ``importlib.machinery`` called > ``ModuleSpec``.
It will contain all the import-related information > about a module without needing to load the module first. Finders will > now return a module's spec rather than a loader. The import system > will use the spec to load the module. Looks good on the principle. > Attributes: > > * ``name`` - the module's name (compare to ``__name__``). > * ``loader`` - the loader to use during loading and for module data > (compare to ``__loader__``). Should it be the loader or just a factory to build it? I'm wondering if in some cases creating a loader is costly. > * ``package`` - the name of the module's parent (compare to > ``__package__``). Is it None if there is no parent? > * ``is_package`` - whether or not the module is a package. > * ``origin`` - the location from which the module originates. > * ``filename`` - like origin, but limited to a path-based location > (compare to ``__file__``). Can you explain the difference between origin and filename (or, better, give an example)? > * ``load(module=None, *, is_reload=False)`` - calls the loader's > ``exec_module()``, falling back to ``load_module()`` if necessary. > This method performs the former responsibilities of loaders for > managing modules before actually loading and for cleaning up. The > reload case is facilitated by the ``module`` and ``is_reload`` > parameters. So how about separate load() and reload() methods? > However, ``ModuleSpec.is_package`` (an attribute) conflicts with > ``InspectLoader.is_package()`` (a method). Working around this > requires a more complicated solution but is not a large obstacle. Or how about keeping the method API? > Module Objects > -------------- > > Module objects will now have a ``__spec__`` attribute to which the > module's spec will be bound. Nice! > Loaders will have a new method, ``exec_module(module)``. Its only job > is to "exec" the module and consequently populate the module's > namespace. 
It is not responsible for creating or preparing the module > object, nor for any cleanup afterward. It has no return value. Does it work with extension modules as well? Generally, extension modules are populated when created (i.e. the two steps aren't separate at the C API level, IIRC). Regards Antoine. From brett at python.org Fri Aug 9 16:43:10 2013 From: brett at python.org (Brett Cannon) Date: Fri, 9 Aug 2013 10:43:10 -0400 Subject: [Import-SIG] Rough PEP: A ModuleSpec Type for the Import System In-Reply-To: <20130809102803.5615941d@pitrou.net> References: <20130809102803.5615941d@pitrou.net> Message-ID: On Fri, Aug 9, 2013 at 4:28 AM, Antoine Pitrou wrote: > > Hi, > > Le Fri, 9 Aug 2013 00:34:34 -0600, > Eric Snow a écrit : > > Abstract > > ======== > > > > This PEP proposes to add a new class to ``importlib.machinery`` called > > ``ModuleSpec``. It will contain all the import-related information > > about a module without needing to load the module first. Finders will > > now return a module's spec rather than a loader. The import system > > will use the spec to load the module. > > Looks good on the principle. > > > Attributes: > > > > * ``name`` - the module's name (compare to ``__name__``). > > * ``loader`` - the loader to use during loading and for module data > > (compare to ``__loader__``). > > Should it be the loader or just a factory to build it? > I'm wondering if in some cases creating a loader is costly. > Theoretically it could be costly, but up to this point I have not seen a single loader that cost a lot to create. Every loader I have ever written just stores details that the finder had to calculate for its work and potentially stores something, e.g. an open zipfile that the finder used to see if a module was there. > > > * ``package`` - the name of the module's parent (compare to > > ``__package__``). > > Is it None if there is no parent? > Top-level modules have the value of '' for __package__. None is used to represent an unknown value.
-Brett > > > * ``is_package`` - whether or not the module is a package. > > * ``origin`` - the location from which the module originates. > > * ``filename`` - like origin, but limited to a path-based location > > (compare to ``__file__``). > > Can you explain the difference between origin and filename (or, better, > give an example)? > > > * ``load(module=None, *, is_reload=False)`` - calls the loader's > > ``exec_module()``, falling back to ``load_module()`` if necessary. > > This method performs the former responsibilities of loaders for > > managing modules before actually loading and for cleaning up. The > > reload case is facilitated by the ``module`` and ``is_reload`` > > parameters. > > So how about separate load() and reload() methods? > > > However, ``ModuleSpec.is_package`` (an attribute) conflicts with > > ``InspectLoader.is_package()`` (a method). Working around this > > requires a more complicated solution but is not a large obstacle. > > Or how about keeping the method API? > > > Module Objects > > -------------- > > > > Module objects will now have a ``__spec__`` attribute to which the > > module's spec will be bound. > > Nice! > > > Loaders will have a new method, ``exec_module(module)``. Its only job > > is to "exec" the module and consequently populate the module's > > namespace. It is not responsible for creating or preparing the module > > object, nor for any cleanup afterward. It has no return value. > > Does it work with extension modules as well? Generally, extension > modules are populated when created (i.e. the two steps aren't separate > at the C API level, IIRC). > > Regards > > Antoine. > > > _______________________________________________ > Import-SIG mailing list > Import-SIG at python.org > http://mail.python.org/mailman/listinfo/import-sig > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ericsnowcurrently at gmail.com Fri Aug 9 18:45:22 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 9 Aug 2013 10:45:22 -0600 Subject: [Import-SIG] Rough PEP: A ModuleSpec Type for the Import System In-Reply-To: <20130809102803.5615941d@pitrou.net> References: <20130809102803.5615941d@pitrou.net> Message-ID: On Fri, Aug 9, 2013 at 2:28 AM, Antoine Pitrou wrote: > Le Fri, 9 Aug 2013 00:34:34 -0600, > Eric Snow a écrit : > > Attributes: > > > > * ``name`` - the module's name (compare to ``__name__``). > > * ``loader`` - the loader to use during loading and for module data > > (compare to ``__loader__``). > > Should it be the loader or just a factory to build it? > I'm wondering if in some cases creating a loader is costly. > The finder is currently responsible for creating the loader and this PEP does not propose changing that. So any such loader already has to deal with this. I suppose some loader could be expensive to create, but none of the existing loaders in the stdlib are that costly. If some future loader runs into this problem they can pretty easily write the loader in such a way that it defers the costly operations. I'll make a note in the PEP about this. > > * ``package`` - the name of the module's parent (compare to > > ``__package__``). > > Is it None if there is no parent? > As Brett noted, it is ''. This is the same as the __package__ attribute of modules. The goal is to keep the same behavior, as much as possible, for all the features that are moved into ModuleSpec. I'll make this objective more clear in the PEP. > > > * ``is_package`` - whether or not the module is a package. > > * ``origin`` - the location from which the module originates. > > * ``filename`` - like origin, but limited to a path-based location > > (compare to ``__file__``). > > Can you explain the difference between origin and filename (or, better, > give an example)? > Yeah, that wasn't too clear, was it?
filename maps directly to the module's __file__ attribute, which is not set for all modules. For instance, built-in modules do not set it nor do namespace packages. In those cases it is still nice to be able to indicate where the module came from. For built-in modules origin will be set to 'built-in' and for namespace packages 'namespace'. For any module with a filename, origin is set to the filename. Having both origin and filename is meant to provide for different usage. filename is used to populate a module's __file__ attribute. If set, it indicates a path-based module (along with cached and path). In contrast, origin has a broader meaning and is used by the module_repr() method. I suppose there could be a flag to indicate the module is path-based, but I went with a separate spec attribute. Likewise, I toyed with the idea of a path-based subclass, perhaps PathModuleSpec, but wanted to stick with a one-size-fits-all spec class since it is meant to be used almost exclusively for state rather than functionality. In some ways it's like types.SimpleNamespace, but with a couple of import-related methods and some dedicated state. I'll make sure the PEP reflects this. > > * ``load(module=None, *, is_reload=False)`` - calls the loader's > > ``exec_module()``, falling back to ``load_module()`` if necessary. > > This method performs the former responsibilities of loaders for > > managing modules before actually loading and for cleaning up. The > > reload case is facilitated by the ``module`` and ``is_reload`` > > parameters. > > So how about separate load() and reload() methods? > I thought about that too, but found it simpler to keep them together. Also, reload is a pretty specialized activity and I plan on leaving some of the boilerplate of it to importlib.reload(). However, I'm not convinced either way actually. I'll think about that some more and update the PEP regardless. Do you have a case to make for making them separate? 
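To make the origin/filename distinction above a bit more concrete, here is a minimal sketch of a repr helper that prefers ``filename`` and falls back to ``origin``. The function below is hypothetical and only mirrors the description in this message; it is not the code from the patch, and the file path is a made-up example.

```python
def module_repr(name, origin=None, filename=None):
    """Hypothetical repr helper following the origin/filename description."""
    if filename is not None:
        # Path-based module: filename is what would populate __file__.
        return "<module {!r} from {!r}>".format(name, filename)
    if origin is not None:
        # No __file__, but we can still say where the module came from.
        return "<module {!r} ({})>".format(name, origin)
    return "<module {!r}>".format(name)


print(module_repr("sys", origin="built-in"))
# <module 'sys' (built-in)>
print(module_repr("spam", origin="/src/spam.py", filename="/src/spam.py"))
# <module 'spam' from '/src/spam.py'>
print(module_repr("nspkg", origin="namespace"))
# <module 'nspkg' (namespace)>
```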
> > > However, ``ModuleSpec.is_package`` (an attribute) conflicts with > > ``InspectLoader.is_package()`` (a method). Working around this > > requires a more complicated solution but is not a large obstacle. > > Or how about keeping the method API? > Because it is a static piece of data. At the point that we can remove the backward compatibility support, we would be stuck with a method when it should be just a normal attribute. > > > Module Objects > > -------------- > > > > Module objects will now have a ``__spec__`` attribute to which the > > module's spec will be bound. > > Nice! > Ironic that this PEP adds yet another import-related attribute to modules. :) Hopefully it's the last one. > > > Loaders will have a new method, ``exec_module(module)``. Its only job > > is to "exec" the module and consequently populate the module's > > namespace. It is not responsible for creating or preparing the module > > object, nor for any cleanup afterward. It has no return value. > > Does it work with extension modules as well? Generally, extension > modules are populated when created (i.e. the two steps aren't separate > at the C API level, IIRC). > Yeah, it works great. We simply don't implement exec_module() on ExtensionFileLoader and things just stay the same. There is room to add an exec_module() and update the C-API for extension modules to support it, but I'm leaving that out of the PEP. However, I will mention that in the PEP because your question is quite relevant and not well answered there. -eric > Regards > > Antoine. > > > _______________________________________________ > Import-SIG mailing list > Import-SIG at python.org > http://mail.python.org/mailman/listinfo/import-sig > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ericsnowcurrently at gmail.com Fri Aug 9 20:03:32 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 9 Aug 2013 12:03:32 -0600 Subject: [Import-SIG] Rough PEP: A ModuleSpec Type for the Import System In-Reply-To: References: Message-ID: On Fri, Aug 9, 2013 at 8:40 AM, Brett Cannon wrote: > On Fri, Aug 9, 2013 at 2:34 AM, Eric Snow wrote: > >> Finally, when the import system calls a finder's ``find_module()``, the >> > finder makes use of a variety of information about the module that is >> useful outside the context of the method. Currently the options are >> limited for persisting that per-module information past the method call, >> since it only returns the loader. Either store it in a module-to-info >> mapping somewhere like on the finder itself, or store it on the loader. >> > > The two previous sentences are hard to read; I think you were after > something like, > "Popular options for this limitation are to store the information is in a > module-to-info > mapping somewhere on the finder itself, or store it on the loader. > Sounds good. > > >> (The idea grew feet during discussions related to another PEP.[1]) >> > > "(This PEP grew out of discussions related to another PEP [1])" > Yeah, this was one of the last things I added to the PEP and my brain was starting to get a little fuzzy. :) > * ``is_package`` - whether or not the module is a package. >> > > I think is_package() is redundant in the face of 'name'/'package' or > 'path' as you can introspect the same information. I honestly have always > found it a weakness of InspectLoader.is_package() that it didn't return the > value for __path__. > I see what you mean, but I also think it's nice to be able to explicitly see if a spec is for a package without having to know about underlying rules. However, I'll just make it a property instead of something set on the spec (and remove it from __init__). > > >> * ``origin`` - the location from which the module originates. 
>> > > Don't quite follow what this is meant to represent? Like the path to the > zipfile if loaded that way, otherwise it's the file path? > Yeah, Antoine had the same question. I'll make sure the PEP is clearer. Basically filename maps to the module's __file__ and origin is used for the module's repr if filename isn't set. > > >> * ``filename`` - like origin, but limited to a path-based location >> (compare to ``__file__``). >> * ``cached`` - the location where the compiled module should be stored >> (compare to ``__cached__``). >> * ``path`` - the list of path entries in which to search for submodules >> or ``None``. (compare to ``__path__``). It should be in sync with >> ``is_package``. >> > > Why is 'path' the only attribute with a default value? Should probably say > everything has a default value of None if not set/known. > Good point. > > >> >> Those are also the parameters to ``ModuleSpec.__init__()``, in that >> order. >> > > I would consider arguing all arguments should be keyword-only past 'name' > since there is no way most people will remember that order correctly. > Makes sense, though I'll make everything but name and loader keyword-only. > * ``from_loader(cls, ...)`` - returns a new ``ModuleSpec`` derived from the >> arguments. The parameters are the same as with ``__init__``, except >> ``package`` is excluded and only ``name`` and ``loader`` are required. >> > > Why the switch in requirements compared to __init__()? > Because package is always calculated and only name and loader are necessary to calculate the remaining attributes. Perhaps from_loader() is the wrong name (I'm open to alternatives). Perhaps __init__() should take over some of the calculating. My intention is to provide one API for what-you-pass-in-is-what-you-get (__init__) and another for calculating attributes. Of course, one could simply modify the spec after creating it, but I like idea of explicitly opting in to calculated values. I'll add this point to the PEP. 
I'll probably also drop package as a parameter of __init__ and make the attribute a property. I've also toyed with the idea of making all the attributes properties (aka read-only) since changing a module's spec later on could lead to headaches, but I'm not convinced that is an easy problem to cause. It's better to not get in the way of those who have needs I haven't anticipated (consenting adults, etc.). What do you think?

>
>
>> * ``module_repr()`` - returns a repr for the module.
>> * ``init_module_attrs(module)`` - sets the module's import-related
>> attributes.
>>
>
> Specify what those attributes are and how they are set.
>

Will do.

>
>
>> * ``load(module=None, *, is_reload=False)`` - calls the loader's
>> ``exec_module()``, falling back to ``load_module()`` if necessary.
>> This method performs the former responsibilities of loaders for
>> managing modules before actually loading and for cleaning up. The
>> reload case is facilitated by the ``module`` and ``is_reload``
>> parameters.
>>
>
> If a module is provided and there is already a matching key in
> sys.modules, what happens?
> What if is_reload is True but there is no module provided or in
> sys.modules; KeyError, ValueError, ImportError? Do you follow having None
> in sys.modules and raise ImportError, or do you overwrite (same question if
> a module is explicitly provided)?
>

That's a good point. I thought I had addressed this in the PEP, but apparently not. For Loader.load_module(), as you know, the existence of the key in sys.modules indicates a reload should happen. The is_reload parameter is meant to provide an explicit indicator. The module you pass in is simply the one to use. If a module is not passed in and is_reload is true, the module in sys.modules will be used. If that module is None or not there, ImportError would be raised. If a module is passed in and is_reload is false, I was planning on just ignoring that module.
However raising ValueError in that case would be more useful, indicating that the method was called incorrectly. Having just the module parameter and letting it indicate a reload is doable, but that would mean losing the option of having load() look up the module (and it's less explicit). Another option is to have a separate reload() method. Antoine mentioned it and I'd considered it early on. I'm considering it again since it makes the API less complicated. Do you have a preference between the current proposal (load() does it all) and a separate reload() method? ``is_package`` is derived from ``path``, if passed. Otherwise the >> loader's ``is_package()`` is tried. Finally, it defaults to False. >> > > It can also be calculated based on whether ``name`` == ``package``: ``True > if path is not None else name == package``. > Good point, though at this point I don't think package will be something you set. Always need to watch out for [] for path as that is valid and signals the > module is a package. > Yeah, I've got that covered in from_loader(). This is where defining exactly what details need to be passed in and which > ones are optional are going to be critical in determining what represents > ambiguity/unknown details vs. what is flat-out known to be true/false. > Agreed. I'll be sure to spell it out. > ``cached`` is derived from ``filename`` if it's available. >> > > Derived how? > cache_from_source() > methods would now return a module spec >> instead of loader, specs must act like the loader that would have been >> returned instead. This is relatively simple to solve since the loader >> is available as an attribute of the spec. >> > > Are you going to define a __getattr__ to delegate to the loader? Or are > you going to specifically define equivalent methods, e.g. get_filename() is > obviously solvable by getting the attribute from the spec (as long as > filename is a required value)? > __getattr__(). I don't want to guess what methods a loader might have. 
And if someone wants to call get_filename() on what they think is the loader, I think it's better to just call the loader's get_filename(). I'd left this stuff out as an implementation detail. Do you think it should be in the PEP? I could simply elaborate on "specs must act like the loader".

>
>
>>
>> However, ``ModuleSpec.is_package`` (an attribute) conflicts with
>> ``InspectLoader.is_package()`` (a method). Working around this requires
>> a more complicated solution but is not a large obstacle.
>>
>> Unfortunately, the ability to proxy does not extend to ``id()``
>> comparisons and ``isinstance()`` tests. In the case of the return value
>> of ``find_module()``, we accept that break in backward compatibility.
>>
>
> Mention that ModuleSpec can be added to the proper ABCs in importlib.abc
> to help alleviate this issue.
>

Good point.

>
>
>>
>> Subclassing
>> -----------
>>
>> .. XXX Allowed but discouraged?
>>
>
> Why should it matter if they are subclassed?
>

My goal was for ModuleSpec to be the container for module definition state with some common attributes as a baseline and a minimal number of methods for the import system to use. Loaders would be where you would do extra stuff or customize functionality, which is basically what happens now.

It seemed correct before but now it's feeling like a very artificial and unnecessary objective.

>> Finders
>> -------
>>
>> Finders will now return ModuleSpec objects when ``find_module()`` is
>> called rather than loaders. For backward compatibility, ``ModuleSpec``
>> objects proxy the attributes of their ``loader`` attribute.
>>
>> Adding another similar method to avoid backward-compatibility issues
>> is undesirable if avoidable. The import APIs have suffered enough.
>>
>
> in lieu of the fact that find_loader() was just introduced in Python 3.3.
>

Are you suggesting additional wording or making a comment?

>
>> Loaders
>> -------
>>
>> Loaders will have a new method, ``exec_module(module)``.
Its only job
>> is to "exec" the module and consequently populate the module's
>> namespace. It is not responsible for creating or preparing the module
>> object, nor for any cleanup afterward. It has no return value.
>>
>> The ``load_module()`` of loaders will still work and be an active part
>> of the loader API. It is still useful for cases where the default
>> module creation/preparation/cleanup is not appropriate for the loader.
>>
>
> But will it still be required? Obviously importlib.abc.Loader can grow a
> default load_module() defined around exec_module(), but it should be clear
> if we expect the method to always be manually defined or if it will
> eventually go away.
>

load_module() will no longer be required. However, it still serves a real purpose: the loader may still need to control more of the loading process. By implementing load_module() but not exec_module(), a loader gets that. I'll make sure that's clear.

>
>
>>
>> A loader must have ``exec_module()`` or ``load_module()`` defined. If
>> both exist on the loader, ``exec_module()`` is used and
>> ``load_module()`` is ignored.
>>
>
> Ignored by whom? Should specify that the import system is the one doing
> the ignoring.
>

Got it.

> * Deprecations in ``importlib.util``: ``set_package()``,
>>
> ``set_loader()``, and ``module_for_loader()``. ``module_to_load()``
>> (introduced in 3.4) can be removed.
>>
>
> "(introduced prior to Python 3.4's release)"; remember, PEPs are timeless
> and will outlive 3.4 so specifying it never went public is important.
>

Good catch. You should be a PEP editor.

-eric
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From ericsnowcurrently at gmail.com Fri Aug 9 20:15:32 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 9 Aug 2013 12:15:32 -0600 Subject: [Import-SIG] Rough PEP: A ModuleSpec Type for the Import System In-Reply-To: References: Message-ID: Would it be worth deprecating the current signature and attributes of FileLoader, NamespaceLoader, etc. FileLoader.get_filename() uses self.path, but otherwise the only use for the attributes is already covered by the info in the spec. Also, should we have timelines for the deprecations in the PEP. I'm inclined to not worry about it, but it *would* be nice to remove at least some of the backward compatibility hackery that this PEP will introduce. -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Fri Aug 9 20:23:39 2013 From: brett at python.org (Brett Cannon) Date: Fri, 9 Aug 2013 14:23:39 -0400 Subject: [Import-SIG] Rough PEP: A ModuleSpec Type for the Import System In-Reply-To: References: Message-ID: On Fri, Aug 9, 2013 at 2:15 PM, Eric Snow wrote: > Would it be worth deprecating the current signature and attributes of > FileLoader, NamespaceLoader, etc. FileLoader.get_filename() uses > self.path, but otherwise the only use for the attributes is already covered > by the info in the spec. > Probably, or at least provide a Spec-only signature of the __init__(). > > Also, should we have timelines for the deprecations in the PEP. I'm > inclined to not worry about it, but it *would* be nice to remove at least > some of the backward compatibility hackery that this PEP will introduce. > Since the backwards-compatibility hacks don't sound like they will be ridiculously complex or getting in the way I say just put in proper PendingDeprecationWarnings and assume they will be there until Python 4 (no later than 8 years away! =). -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From solipsis at pitrou.net Fri Aug 9 21:22:49 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 9 Aug 2013 21:22:49 +0200 Subject: [Import-SIG] Rough PEP: A ModuleSpec Type for the Import System In-Reply-To: References: <20130809102803.5615941d@pitrou.net> Message-ID: <20130809212249.04db6a5c@fsol> On Fri, 9 Aug 2013 10:45:22 -0600 Eric Snow wrote: > > So how about separate load() and reload() methods? > > > > I thought about that too, but found it simpler to keep them together. > Also, reload is a pretty specialized activity and I plan on leaving some > of the boilerplate of it to importlib.reload(). However, I'm not convinced > either way actually. I'll think about that some more and update the PEP > regardless. Do you have a case to make for making them separate? Well, is there another way to use load() than: - load(): load a new module - load(existing_module, is_reload=True): reload an existing module I mean, does it make sense to call e.g. - load(some_existing_module, is_reload=False) - load(is_reload=True) ? Regards Antoine. From ericsnowcurrently at gmail.com Sat Aug 10 00:28:06 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 9 Aug 2013 16:28:06 -0600 Subject: [Import-SIG] Rough PEP: A ModuleSpec Type for the Import System In-Reply-To: References: Message-ID: On Fri, Aug 9, 2013 at 12:20 PM, Brett Cannon wrote: > On Fri, Aug 9, 2013 at 2:03 PM, Eric Snow wrote: > > Having just the module parameter and letting it indicate a reload is >> doable, but that would mean losing the option of having load() look up the >> module (and it's less explicit). Another option is to have a separate >> reload() method. Antoine mentioned it and I'd considered it early on. I'm >> considering it again since it makes the API less complicated. Do you have >> a preference between the current proposal (load() does it all) and a >> separate reload() method? >> > > Nope, no preference. > Okay. 
I'll probably try out a separate reload() and see how things look.

>
>
>>
>> ``is_package`` is derived from ``path``, if passed. Otherwise the
>>>> loader's ``is_package()`` is tried. Finally, it defaults to False.
>>>>
>>>
>>> It can also be calculated based on whether ``name`` == ``package``:
>>> ``True if path is not None else name == package``.
>>>
>>
>> Good point, though at this point I don't think package will be something
>> you set.
>>
>
> So you would set 'name' and 'path' to decide if something is a package and
> use that to calculate 'package'?
>

That and the loader's is_package(), if available.

> cache_from_source()
>>
>
> I figured, but I know too much about this stuff. =) I would spell it out
> in the PEP.
>

Done.

> __getattr__(). I don't want to guess what methods a loader might have.
>> And if someone wants to call get_filename() on what they think is the
>> loader, I think it's better to just call the loader's get_filename(). I'd
>> left this stuff out as an implementation detail. Do you think it should be
>> in the PEP? I could simply elaborate on "specs must act like the loader".
>>
>
> I would elaborate that it's going to be __getattr__() since it influences
> the level of backwards-compatibility.
>

Done.

> My goal was for ModuleSpec to be the container for module definition state
>> with some common attributes as a baseline and a minimal number of methods
>> for the import system to use. Loaders would be where you would do extra
>> stuff or customize functionality, which is basically what happens now.
>>
>> It seemed correct before but now it's feeling like a very artificial and
>> unnecessary objective.
>>
>
> I totally get where you are coming from and if we were working in a
> language that pushed for read-only attributes I would agree, but we aren't
> so I wouldn't. =) It just becomes more hassle than it's worth to enforce.
>

Agreed.

> in lieu of the fact that find_loader() was just introduced in Python 3.3.
>>> >> >> Are you suggesting additional wording or making a comment? >> > > Both? =) > Okay. I clarified that. I'll probably be posting an updated PEP shortly. -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Sat Aug 10 00:36:49 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 9 Aug 2013 16:36:49 -0600 Subject: [Import-SIG] Rough PEP: A ModuleSpec Type for the Import System In-Reply-To: References: Message-ID: On Fri, Aug 9, 2013 at 12:23 PM, Brett Cannon wrote: > > > > On Fri, Aug 9, 2013 at 2:15 PM, Eric Snow wrote: > >> Would it be worth deprecating the current signature and attributes of >> FileLoader, NamespaceLoader, etc. FileLoader.get_filename() uses >> self.path, but otherwise the only use for the attributes is already covered >> by the info in the spec. >> > > Probably, or at least provide a Spec-only signature of the __init__(). > > >> >> Also, should we have timelines for the deprecations in the PEP. I'm >> inclined to not worry about it, but it *would* be nice to remove at least >> some of the backward compatibility hackery that this PEP will introduce. >> > > Since the backwards-compatibility hacks don't sound like they will be > ridiculously complex or getting in the way I say just put in proper > PendingDeprecationWarnings and assume they will be there until Python 4 (no > later than 8 years away! =). > Sounds good. -eric -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ericsnowcurrently at gmail.com Sat Aug 10 00:44:55 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 9 Aug 2013 16:44:55 -0600 Subject: [Import-SIG] Rough PEP: A ModuleSpec Type for the Import System In-Reply-To: <20130809212249.04db6a5c@fsol> References: <20130809102803.5615941d@pitrou.net> <20130809212249.04db6a5c@fsol> Message-ID: On Fri, Aug 9, 2013 at 1:22 PM, Antoine Pitrou wrote: > Well, is there another way to use load() than: > - load(): load a new module > - load(existing_module, is_reload=True): reload an existing module > > I mean, does it make sense to call e.g. > - load(some_existing_module, is_reload=False) > This would be a ValueError. The module argument is meant just for reload. I'm not sure it makes sense otherwise. Perhaps so you could prepare your own new module prior to calling load()? I'd like to leave that off the table for this PEP. > - load(is_reload=True) > This was always okay in my mind, but I realized it did not make it to the PEP until Brett had some similar questions. :) The updated PEP covers this. Like I told Brett, I'm going to see how a separate reload() looks and go from there. -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Sat Aug 10 01:19:21 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 9 Aug 2013 17:19:21 -0600 Subject: [Import-SIG] 40k limit on this list Message-ID: Apparently I blew past the size limit for posting to this list. FYI, I posted an updated PEP for ModuleSpec and it should be showing up at some point. :) -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Sat Aug 10 06:58:01 2013 From: eric at trueblade.com (Eric V. Smith) Date: Sat, 10 Aug 2013 06:58:01 +0200 Subject: [Import-SIG] 40k limit on this list In-Reply-To: References: Message-ID: I'm traveling and without access to a real computer. 
I'll release your message in the next 48 hours, if no one beats me to it.

--
Eric.

On Aug 10, 2013, at 1:19 AM, Eric Snow wrote:

> Apparently I blew past the size limit for posting to this list. FYI, I posted an updated PEP for ModuleSpec and it should be showing up at some point. :)
>
> -eric
> _______________________________________________
> Import-SIG mailing list
> Import-SIG at python.org
> http://mail.python.org/mailman/listinfo/import-sig

From ncoghlan at gmail.com Sat Aug 10 12:50:25 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 10 Aug 2013 20:50:25 +1000
Subject: [Import-SIG] Rough PEP: A ModuleSpec Type for the Import System
In-Reply-To:
References: <20130809102803.5615941d@pitrou.net> <20130809212249.04db6a5c@fsol>
Message-ID:

On 10 August 2013 08:44, Eric Snow wrote:
> On Fri, Aug 9, 2013 at 1:22 PM, Antoine Pitrou wrote:
>>
>> Well, is there another way to use load() than:
>> - load(): load a new module
>> - load(existing_module, is_reload=True): reload an existing module
>>
>> I mean, does it make sense to call e.g.
>> - load(some_existing_module, is_reload=False)
>
>
> This would be a ValueError. The module argument is meant just for reload.
> I'm not sure it makes sense otherwise. Perhaps so you could prepare your
> own new module prior to calling load()? I'd like to leave that off the
> table for this PEP.

The advantage of offering that API over telling people to call spec.loader.exec_module(m) directly is that it gives us more control over the loading process (by updating ModuleSpec.load), avoiding the current problem we have where providing new load time behaviour is difficult because we don't control the loader implementations.

>>
>> - load(is_reload=True)
>
>
> This was always okay in my mind, but I realized it did not make it to the
> PEP until Brett had some similar questions. :) The updated PEP covers this.
> Like I told Brett, I'm going to see how a separate reload() looks and go
> from there.
A separate reload that works something like this sounds good to me:

    def reload(self, module=None):
        if module is None:
            module = sys.modules[self.name]
        self.load(module)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Sat Aug 10 13:02:44 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 10 Aug 2013 21:02:44 +1000
Subject: [Import-SIG] Rough PEP: A ModuleSpec Type for the Import System
In-Reply-To:
References:
Message-ID:

This generally looks good to me. Something I'm wondering:

Q1. Can we experiment with this as a custom metapath importer?
A1. Not really, because we want to use it to avoid some of the other importlib additions made in 3.4. However, a backport to 3.3 as a custom metapath hook may still be interesting.

Q2. Given this idea as a foundation, could we experiment with ref file support as a custom importer?
A2. Quite possibly, which may make that a good thing to defer to 3.5 (for stdlib inclusion, anyway).

I'll wait until the updated version gets through before commenting further :)

Cheers,
Nick.

From ericsnowcurrently at gmail.com Sat Aug 10 19:57:18 2013
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Sat, 10 Aug 2013 11:57:18 -0600
Subject: [Import-SIG] 40k limit on this list
In-Reply-To:
References:
Message-ID:

Thanks, Eric.

-eric

On Fri, Aug 9, 2013 at 10:58 PM, Eric V. Smith wrote:

> I'm traveling and without access to a real computer. I'll release your
> message in the next 48 hours, if no one beats me to it.
>
> --
> Eric.
>
> On Aug 10, 2013, at 1:19 AM, Eric Snow
> wrote:
>
> > Apparently I blew past the size limit for posting to this list. FYI, I
> posted an updated PEP for ModuleSpec and it should be showing up at some
> point.
:)
> >
> > -eric
> > _______________________________________________
> > Import-SIG mailing list
> > Import-SIG at python.org
> > http://mail.python.org/mailman/listinfo/import-sig
>

From brett at python.org Fri Aug 9 16:40:10 2013
From: brett at python.org (Brett Cannon)
Date: Fri, 9 Aug 2013 10:40:10 -0400
Subject: [Import-SIG] Rough PEP: A ModuleSpec Type for the Import System
In-Reply-To:
References:
Message-ID:

I like the idea and I think it can be more-or-less safe. Just need more specification/clarification on things.

On Fri, Aug 9, 2013 at 2:34 AM, Eric Snow wrote:

> This is an outgrowth of discussions on the .ref PEP, but it's also
> something I've been thinking about for over a year and started toying with
> at the last PyCon. I have a patch that passes all but a couple unit tests
> and should pass though when I get a minute to take another pass at it.
> I'll probably end up adding a bunch more unit tests before I'm done as
> well. However, the functionality is mostly there.
>
> BTW, I gotta say, Brett, I have a renewed appreciation for the long and
> hard effort you put into importlib. There are just so many odd corner
> cases that I never would have looked for if not for that library. And
> those unit tests do a great job of covering all of that. Thanks!
>

Welcome! And yes, importlib didn't take multiple years out of laziness, but just how much work had to go in to cover corner cases along with pauses from frustration with the semantics. :P

>
> -eric
>
>
> -------------------------------------------------------------------------------
>
> PEP: 4XX
> Title: A ModuleSpec Type for the Import System
> Version: $Revision$
> Last-Modified: $Date$
> Author: Eric Snow
> BDFL-Delegate: ???
> Discussions-To: import-sig at python.org
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 8-Aug-2013
> Python-Version: 3.4
> Post-History: 8-Aug-2013
> Resolution:
>
>
> Abstract
> ========
>
> This PEP proposes to add a new class to ``importlib.machinery`` called
> ``ModuleSpec``. It will contain all the import-related information
> about a module without needing to load the module first. Finders will
> now return a module's spec rather than a loader. The import system will
> use the spec to load the module.
>
>
> Motivation
> ==========
>
> The import system has evolved over the lifetime of Python. In late 2002
> PEP 302 introduced standardized import hooks via ``finders`` and
> ``loaders`` and ``sys.meta_path``. The ``importlib`` module, introduced
> with Python 3.1, now exposes a pure Python implementation of the APIs
> described by PEP 302, as well as of the full import system. It is now
> much easier to understand and extend the import system. While a benefit
> to the Python community, this greater accessibility also presents a
> challenge.
>
> As more developers come to understand and customize the import system,
> any weaknesses in the finder and loader APIs will be more impactful. So
> the sooner we can address any such weaknesses in the import system, the
> better...and there are a couple we can take care of with this proposal.
>
> Firstly, any time the import system needs to save information about a
> module we end up with more attributes on module objects that are
> generally only meaningful to the import system and occasionally to some
> people. It would be nice to have a per-module namespace to put future
> import-related information. Secondly, there's an API void between
> finders and loaders that causes undue complexity when encountered.
>
> Finders are strictly responsible for providing the loader which the
> import system will use to load the module.
The loader is then
> responsible for doing some checks, creating the module object, setting
> import-related attributes, "installing" the module to ``sys.modules``,
> and loading the module, along with some cleanup. This all takes place
> during the import system's call to ``Loader.load_module()``. Loaders
> also provide some APIs for accessing data associated with a module.
>
> Loaders are not required to provide any of the functionality of
> ``load_module()`` through other methods. Thus, though the
> import-related information about a module is likely available without
> loading the module, it is not otherwise exposed.
>
> Furthermore, the requirements associated with ``load_module()`` are
> common to all loaders and mostly are implemented in exactly the same
> way. This means every loader has to duplicate the same boilerplate
> code. ``importlib.util`` provides some tools that help with this, but
> it would be more helpful if the import system simply took charge of
> these responsibilities. The trouble is that this would limit the degree
> of customization that ``load_module()`` facilitates. This is a gap
> between finders and loaders which this proposal aims to fill.
>
> Finally, when the import system calls a finder's ``find_module()``, the
> finder makes use of a variety of information about the module that is
> useful outside the context of the method. Currently the options are
> limited for persisting that per-module information past the method call,
> since it only returns the loader. Either store it in a module-to-info
> mapping somewhere like on the finder itself, or store it on the loader.
>

The two previous sentences are hard to read; I think you were after something like, "Popular options for this limitation are to store the information in a module-to-info mapping somewhere on the finder itself, or store it on the loader."

> Unfortunately, loaders are not required to be module-specific.
On top > of that, some of the useful information finders could provide is > common to all finders, so ideally the import system could take care of > that. This is the same gap as before between finders and loaders. > > As an example of complexity attributable to this flaw, the > implementation of namespace packages in Python 3.3 (see PEP 420) added > ``FileFinder.find_loader()`` because there was no good way for > ``find_module()`` to provide the namespace path. > > The answer to this gap is a ``ModuleSpec`` object that contains the > per-module information and takes care of the boilerplate functionality > of loading the module. > > (The idea grew feet during discussions related to another PEP.[1]) > "(This PEP grew out of discussions related to another PEP [1])" > > > Specification > ============= > > ModuleSpec > ---------- > > A new class which defines the import-related values to use when loading > the module. It closely corresponds to the import-related attributes of > module objects. ``ModuleSpec`` objects may also be used by finders and > loaders and other import-related APIs to hold extra import-related > information about the module. This greatly reduces the need to add any > new import-related attributes to module objects. > > Attributes: > > * ``name`` - the module's name (compare to ``__name__``). > * ``loader`` - the loader to use during loading and for module data > (compare to ``__loader__``). > * ``package`` - the name of the module's parent (compare to > ``__package__``). > * ``is_package`` - whether or not the module is a package. > I think is_package() is redundant in the face of 'name'/'package' or 'path' as you can introspect the same information. I honestly have always found it a weakness of InspectLoader.is_package() that it didn't return the value for __path__. > * ``origin`` - the location from which the module originates. > Don't quite follow what this is meant to represent? 
Like the path to the zipfile if loaded that way, otherwise it's the file path? > * ``filename`` - like origin, but limited to a path-based location > (compare to ``__file__``). > * ``cached`` - the location where the compiled module should be stored > (compare to ``__cached__``). > * ``path`` - the list of path entries in which to search for submodules > or ``None``. (compare to ``__path__``). It should be in sync with > ``is_package``. > Why is 'path' the only attribute with a default value? Should probably say everything has a default value of None if not set/known. > > Those are also the parameters to ``ModuleSpec.__init__()``, in that > order. > I would consider arguing all arguments should be keyword-only past 'name' since there is no way most people will remember that order correctly. > The last three are optional. > (filename, cached, and path). And that definitely makes is_package redundant if that's true. > When passed the values are taken > as-is. The ``from_loader()`` method offers calculated values. > "(see below)." > > Methods: > > * ``from_loader(cls, ...)`` - returns a new ``ModuleSpec`` derived from the > arguments. The parameters are the same as with ``__init__``, except > ``package`` is excluded and only ``name`` and ``loader`` are required. > Why the switch in requirements compared to __init__()? > * ``module_repr()`` - returns a repr for the module. > * ``init_module_attrs(module)`` - sets the module's import-related > attributes. > Specify what those attributes are and how they are set. > * ``load(module=None, *, is_reload=False)`` - calls the loader's > ``exec_module()``, falling back to ``load_module()`` if necessary. > This method performs the former responsibilities of loaders for > managing modules before actually loading and for cleaning up. The > reload case is facilitated by the ``module`` and ``is_reload`` > parameters. > If a module is provided and there is already a matching key in sys.modules, what happens? 
What if is_reload is True but there is no module provided or in sys.modules; KeyError, ValueError, ImportError? Do you follow having None in sys.modules and raise ImportError, or do you overwrite (same question if a module is explicitly provided)? > > Values Derived by from_loader() > ------------------------------- > > As implied above, ``from_loader()`` makes a best effort at calculating > any of the values that are not passed in. It duplicates the behavior > that was formerly provided the several ``importlib.util`` functions as > well as the ``init_module_attrs()`` method of several of ``importlib``'s > loaders. Just to be clear, here is a more detailed description of those > calculations: > > ``is_package`` is derived from ``path``, if passed. Otherwise the > loader's ``is_package()`` is tried. Finally, it defaults to False. > It can also be calculated based on whether ``name`` == ``package``: ``True if path is not None else name == package``. Always need to watch out for [] for path as that is valid and signals the module is a package. This is where defining exactly what details need to be passed in and which ones are optional are going to be critical in determining what represents ambiguity/unknown details vs. what is flat-out known to be true/false. > > ``filename`` is pulled from the loader's ``get_filename()``, if > possible. > > ``path`` is set to an empty list if ``is_package`` is true, and the > directory from ``filename`` is appended to it, if available. > > ``cached`` is derived from ``filename`` if it's available. > Derived how? > > ``origin`` is set to ``filename``. > > ``package`` is set to ``name`` if the module is a package and > "... is a package, else to ..." > to ``name.rpartition('.')[0]`` otherwise. Consequently, a > top-level module will have ``package`` set to the empty string. 
> > Backward Compatibility > ---------------------- > > Since finder ``find_module()`` > ``Finder.find_module()`` > methods would now return a module spec > instead of loader, specs must act like the loader that would have been > returned instead. This is relatively simple to solve since the loader > is available as an attribute of the spec. > Are you going to define a __getattr__ to delegate to the loader? Or are you going to specifically define equivalent methods, e.g. get_filename() is obviously solvable by getting the attribute from the spec (as long as filename is a required value)? > > However, ``ModuleSpec.is_package`` (an attribute) conflicts with > ``InspectLoader.is_package()`` (a method). Working around this requires > a more complicated solution but is not a large obstacle. > > Unfortunately, the ability to proxy does not extend to ``id()`` > comparisons and ``isinstance()`` tests. In the case of the return value > of ``find_module()``, we accept that break in backward compatibility. > Mention that ModuleSpec can be added to the proper ABCs in importlib.abc to help alleviate this issue. > > Subclassing > ----------- > > .. XXX Allowed but discouraged? > Why should it matter if they are subclassed? > > Module Objects > -------------- > > Module objects will now have a ``__spec__`` attribute to which the > module's spec will be bound. None of the other import-related module > attributes will be changed or deprecated, though some of them could be. > Any such deprecation can wait until Python 4. > "... could be; any such ..." > > ``ModuleSpec`` objects will not be kept in sync with the corresponding > module object's import-related attributes. They may differ, though in > practice they will be the same. > "Though they may differ, in practice they will typically be the same." > > Finders > ------- > > Finders will now return ModuleSpec objects when ``find_module()`` is > called rather than loaders. 
For backward compatibility, ``ModuleSpec`` > objects proxy the attributes of their ``loader`` attribute. > > Adding another similar method to avoid backward-compatibility issues > is undesirable if avoidable. The import APIs have suffered enough. > in light of the fact that find_loader() was just introduced in Python 3.3. > The approach taken by this PEP should be sufficient. > > The change to ``find_module()`` applies to both ``MetaPathFinder`` and > ``PathEntryFinder``. ``PathEntryFinder.find_loader()`` will be > deprecated and, for backward compatibility, implicitly special-cased if > the method exists on a finder. > > Loaders > ------- > > Loaders will have a new method, ``exec_module(module)``. Its only job > is to "exec" the module and consequently populate the module's > namespace. It is not responsible for creating or preparing the module > object, nor for any cleanup afterward. It has no return value. > > The ``load_module()`` of loaders will still work and be an active part > of the loader API. It is still useful for cases where the default > module creation/preparation/cleanup is not appropriate for the loader. > But will it still be required? Obviously importlib.abc.Loader can grow a default load_module() defined around exec_module(), but it should be clear if we expect the method to always be manually defined or if it will eventually go away. > > A loader must have ``exec_module()`` or ``load_module()`` defined. If > both exist on the loader, ``exec_module()`` is used and > ``load_module()`` is ignored. > Ignored by whom? Should specify that the import system is the one doing the ignoring. > > PEP 420 introduced the optional ``module_repr()`` loader method to limit > the amount of special-casing in the module type's ``__repr__()``. Since > this method is part of ``ModuleSpec``, it will be deprecated on loaders. > However, if it exists on a loader it will be used exclusively.
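To make the ``exec_module()`` / ``load_module()`` rule quoted above concrete, here is a minimal sketch of how the import system might choose between the two. The helper name ``load_via`` is hypothetical and not part of the proposal; the real ``ModuleSpec.load()`` would also set the import-related attributes and take the per-module import lock, which this sketch leaves out.

```python
import sys
import types


def load_via(loader, name):
    """Hypothetical helper: prefer exec_module(), else fall back to load_module()."""
    if hasattr(loader, 'exec_module'):
        # The import system, not the loader, creates and installs the module.
        module = types.ModuleType(name)
        sys.modules[name] = module
        try:
            loader.exec_module(module)  # only populates the namespace
        except BaseException:
            # Cleanup is also the import system's job in this scheme.
            sys.modules.pop(name, None)
            raise
        return sys.modules[name]
    # Legacy path: the loader does creation, installation and cleanup itself.
    return loader.load_module(name)
```

A loader that defines both methods never has its ``load_module()`` called by this helper, which matches the "ignored by the import system" reading of the quoted paragraph.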
> > The loader ``init_module_attr()`` method, added for Python 3.4 will be > eliminated in favor of the same method on ``ModuleSpec``. > "method, added prior to Python 3.4's release, will be removed ..." > > However, ``InspectLoader.is_package()`` will not be deprecated even > though the same information is found on ``ModuleSpec``. ``ModuleSpec`` > can use it to populate its own ``is_package`` if that information is > not otherwise available. Still, it will be made optional. > > In addition to executing a module during loading, loaders will still be > directly responsible for providing APIs concerning module-related data. > > Other Changes > ------------- > > * The various finders and loaders provided by ``importlib`` will be > updated to comply with this proposal. > > * The spec for the ``__main__`` module will reflect how the interpreter > was started. For instance, with ``-m`` the spec's name will be that of > the run module, while ``__main__.__name__`` will still be "__main__". > > * We add ``importlib.find_module()`` to mirror > ``importlib.find_loader()`` (which becomes deprecated). > > * Deprecations in ``importlib.util``: ``set_package()``, > ``set_loader()``, and ``module_for_loader()``. ``module_to_load()`` > (introduced in 3.4) can be removed. > "(introduced prior to Python 3.4's release)"; remember, PEPs are timeless and will outlive 3.4 so specifying it never went public is important. > > * ``importlib.reload()`` is changed to use ``ModuleSpec.load()``. > > * ``ModuleSpec.load()`` and ``importlib.reload()`` will now make use of > the per-module import lock, whereas ``Loader.load_module()`` did not. > > Reference Implementation > ------------------------ > > A reference implementation is available at . > > > References > ========== > > [1] http://mail.python.org/pipermail/import-sig/2013-August/000658.html > > > Copyright > ========= > > This document has been placed in the public domain. > > .. 
> Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: > >
From brett at python.org Fri Aug 9 20:20:45 2013 From: brett at python.org (Brett Cannon) Date: Fri, 9 Aug 2013 14:20:45 -0400 Subject: [Import-SIG] Rough PEP: A ModuleSpec Type for the Import System In-Reply-To: References: Message-ID: On Fri, Aug 9, 2013 at 2:03 PM, Eric Snow wrote: > On Fri, Aug 9, 2013 at 8:40 AM, Brett Cannon wrote: > >> On Fri, Aug 9, 2013 at 2:34 AM, Eric Snow wrote: >> >>> Finally, when the import system calls a finder's ``find_module()``, the >>> >> finder makes use of a variety of information about the module that is >>> useful outside the context of the method. Currently the options are >>> limited for persisting that per-module information past the method call, >>> since it only returns the loader. Either store it in a module-to-info >>> mapping somewhere like on the finder itself, or store it on the loader. >>> >> >> The two previous sentences are hard to read; I think you were after >> something like, >> "Popular options for this limitation are to store the information in a >> module-to-info >> mapping somewhere on the finder itself, or store it on the loader." >> > > Sounds good. > > >> >> >>> (The idea grew feet during discussions related to another PEP.[1]) >>> >> >> "(This PEP grew out of discussions related to another PEP [1])" >> > > Yeah, this was one of the last things I added to the PEP and my brain was > starting to get a little fuzzy. :) > > >> * ``is_package`` - whether or not the module is a package. >>> >> >> I think is_package() is redundant in the face of 'name'/'package' or >> 'path' as you can introspect the same information.
I honestly have always >> found it a weakness of InspectLoader.is_package() that it didn't return the >> value for __path__. >> > > I see what you mean, but I also think it's nice to be able to explicitly > see if a spec is for a package without having to know about underlying > rules. However, I'll just make it a property instead of something set on > the spec (and remove it from __init__). > > >> >> >>> * ``origin`` - the location from which the module originates. >>> >> >> Don't quite follow what this is meant to represent? Like the path to the >> zipfile if loaded that way, otherwise it's the file path? >> > > Yeah, Antoine had the same question. I'll make sure the PEP is clearer. > Basically filename maps to the module's __file__ and origin is used for > the module's repr if filename isn't set. > > >> >> >>> * ``filename`` - like origin, but limited to a path-based location >>> (compare to ``__file__``). >>> * ``cached`` - the location where the compiled module should be stored >>> (compare to ``__cached__``). >>> * ``path`` - the list of path entries in which to search for submodules >>> or ``None``. (compare to ``__path__``). It should be in sync with >>> ``is_package``. >>> >> >> Why is 'path' the only attribute with a default value? Should probably >> say everything has a default value of None if not set/known. >> > > Good point. > > >> >> >>> >>> Those are also the parameters to ``ModuleSpec.__init__()``, in that >>> order. >>> >> >> I would consider arguing all arguments should be keyword-only past 'name' >> since there is no way most people will remember that order correctly. >> > > Makes sense, though I'll make everything but name and loader keyword-only. > > >> * ``from_loader(cls, ...)`` - returns a new ``ModuleSpec`` derived from >>> the >>> arguments. The parameters are the same as with ``__init__``, except >>> ``package`` is excluded and only ``name`` and ``loader`` are required. >>> >> >> Why the switch in requirements compared to __init__()? 
>> > > Because package is always calculated and only name and loader are > necessary to calculate the remaining attributes. Perhaps from_loader() is > the wrong name (I'm open to alternatives). Perhaps __init__() should take > over some of the calculating. My intention is to provide one API for > what-you-pass-in-is-what-you-get (__init__) and another for calculating > attributes. Of course, one could simply modify the spec after creating it, > but I like the idea of explicitly opting in to calculated values. I'll add > this point to the PEP. I'll probably also drop package as a parameter > of __init__ and make the attribute a property. > > I've also toyed with the idea of making all the attributes properties (aka > read-only) since changing a module's spec later on could lead to headaches, > but I'm not convinced that is an easy problem to cause. It's better to not > get in the way of those who have needs I haven't anticipated (consenting > adults, etc.). What do you think? > I agree with your thinking that you shouldn't necessarily block usage just because it might be a bad idea; consenting adults and all is right. > > >> >> >>> * ``module_repr()`` - returns a repr for the module. >>> * ``init_module_attrs(module)`` - sets the module's import-related >>> attributes. >>> >> >> Specify what those attributes are and how they are set. >> > > Will do. > > >> >> >>> * ``load(module=None, *, is_reload=False)`` - calls the loader's >>> ``exec_module()``, falling back to ``load_module()`` if necessary. >>> This method performs the former responsibilities of loaders for >>> managing modules before actually loading and for cleaning up. The >>> reload case is facilitated by the ``module`` and ``is_reload`` >>> parameters. >>> >> >> If a module is provided and there is already a matching key in >> sys.modules, what happens? >> > What if is_reload is True but there is no module provided or in >> sys.modules; KeyError, ValueError, ImportError?
Do you follow having None >> in sys.modules and raise ImportError, or do you overwrite (same question if >> a module is explicitly provided)? >> > > That's a good point. I thought I had addressed this in the PEP, but > apparently not. For Loader.load_module(), as you know, the existence of > the key in sys.modules indicates a reload should happen. The is_reload > parameter is meant to provide an explicit indicator. The module you pass > in is simply the one to use. If a module is not passed in and is_reload is > true, the module in sys.modules will be used. If that module is None or > not there, ImportError would be raised. If a module is passed in and > is_reload is false, I was planning on just ignoring that module. However > raising ValueError in that case would be more useful, indicating that the > method was called incorrectly. > > Having just the module parameter and letting it indicate a reload is > doable, but that would mean losing the option of having load() look up the > module (and it's less explicit). Another option is to have a separate > reload() method. Antoine mentioned it and I'd considered it early on. I'm > considering it again since it makes the API less complicated. Do you have > a preference between the current proposal (load() does it all) and a > separate reload() method? > Nope, no preference. > > ``is_package`` is derived from ``path``, if passed. Otherwise the >>> loader's ``is_package()`` is tried. Finally, it defaults to False. >>> >> >> It can also be calculated based on whether ``name`` == ``package``: >> ``True if path is not None else name == package``. >> > > Good point, though at this point I don't think package will be something > you set. > So you would set 'name' and 'path' to decide if something is a package and use that to calculate 'package'? > > Always need to watch out for [] for path as that is valid and signals the >> module is a package. >> > > Yeah, I've got that covered in from_loader(). 
> > This is where defining exactly what details need to be passed in and which >> ones are optional are going to be critical in determining what represents >> ambiguity/unknown details vs. what is flat-out known to be true/false. >> > > Agreed. I'll be sure to spell it out. > > >> ``cached`` is derived from ``filename`` if it's available. >>> >> >> Derived how? >> > > cache_from_source() > I figured, but I know too much about this stuff. =) I would spell it out in the PEP. > > >> methods would now return a module spec >>> instead of loader, specs must act like the loader that would have been >>> returned instead. This is relatively simple to solve since the loader >>> is available as an attribute of the spec. >>> >> >> Are you going to define a __getattr__ to delegate to the loader? Or are >> you going to specifically define equivalent methods, e.g. get_filename() is >> obviously solvable by getting the attribute from the spec (as long as >> filename is a required value)? >> > > __getattr__(). I don't want to guess what methods a loader might have. > And if someone wants to call get_filename() on what they think is the > loader, I think it's better to just call the loader's get_filename(). I'd > left this stuff out as an implementation detail. Do you think it should be > in the PEP? I could simply elaborate on "specs must act like the loader". > I would elaborate that it's going to be __getattr__() since it influences the level of backwards-compatibility. > > >> >> >>> >>> However, ``ModuleSpec.is_package`` (an attribute) conflicts with >>> ``InspectLoader.is_package()`` (a method). Working around this requires >>> a more complicated solution but is not a large obstacle. >>> >>> Unfortunately, the ability to proxy does not extend to ``id()`` >>> comparisons and ``isinstance()`` tests. In the case of the return value >>> of ``find_module()``, we accept that break in backward compatibility. 
>>> >> >> Mention that ModuleSpec can be added to the proper ABCs in importlib.abc >> to help alleviate this issue. >> > > Good point. > > >> >> >>> >>> Subclassing >>> ----------- >>> >>> .. XXX Allowed but discouraged? >>> >> >> Why should it matter if they are subclassed? >> > > My goal was for ModuleSpec to be the container for module definition state > with some common attributes as a baseline and a minimal number of methods > for the import system to use. Loaders would be where you would do extra > stuff or customize functionality, which is basically what happens now. > > It seemed correct before but now it's feeling like a very artificial and > unnecessary objective. > I totally get where you are coming from and if we were working in a language that pushed for read-only attributes I would agree, but we aren't so I wouldn't. =) It just becomes more hassle than it's worth to enforce. > > Finders >>> ------- >>> >>> Finders will now return ModuleSpec objects when ``find_module()`` is >>> called rather than loaders. For backward compatibility, ``ModuleSpec`` >>> objects proxy the attributes of their ``loader`` attribute. >>> >>> Adding another similar method to avoid backward-compatibility issues >>> is undesirable if avoidable. The import APIs have suffered enough. >>> >> >> in light of the fact that find_loader() was just introduced in Python 3.3. >> > > Are you suggesting additional wording or making a comment? > Both? =) > > >> >>> Loaders >>> ------- >>> >>> Loaders will have a new method, ``exec_module(module)``. Its only job >>> is to "exec" the module and consequently populate the module's >>> namespace. It is not responsible for creating or preparing the module >>> object, nor for any cleanup afterward. It has no return value. >>> >>> The ``load_module()`` of loaders will still work and be an active part >>> of the loader API. It is still useful for cases where the default >>> module creation/preparation/cleanup is not appropriate for the loader.
>>> >> >> But will it still be required? Obviously importlib.abc.Loader can grow a >> default load_module() defined around exec_module(), but it should be clear >> if we expect the method to always be manually defined or if it will >> eventually go away. >> > > load_module() will no longer be required. However, it still serves a real > purpose: the loader may still need to control more of the loading process. > By implementing load_module() but not exec_module(), a loader gets that. > I'll make sure that's clear. > > >> >> >>> >>> A loader must have ``exec_module()`` or ``load_module()`` defined. If >>> both exist on the loader, ``exec_module()`` is used and >>> ``load_module()`` is ignored. >>> >> >> Ignored by whom? Should specify that the import system is the one doing >> the ignoring. >> > > Got it. > > >> * Deprecations in ``importlib.util``: ``set_package()``, >>> >> ``set_loader()``, and ``module_for_loader()``. ``module_to_load()`` >>> (introduced in 3.4) can be removed. >>> >> >> "(introduced prior to Python 3.4's release)"; remember, PEPs are timeless >> and will outlive 3.4 so specifying it never went public is important. >> > > Good catch. You should be a PEP editor. > Ha! Being a PEP editor means I know how to use hg, run a make command, and can count.
From ericsnowcurrently at gmail.com Sat Aug 10 00:58:09 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 9 Aug 2013 16:58:09 -0600 Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System" Message-ID: Here's an updated version of the PEP for ModuleSpec which addresses the feedback I've gotten. Thanks for the help. The big open question, to me, is whether or not to have a separate reload() method. I'll be looking into that when I get a chance. There's also the question of a path-based subclass, but I'm currently not convinced it's worth it.
-eric

-----------------------------------

PEP: 4XX
Title: A ModuleSpec Type for the Import System
Version: $Revision$
Last-Modified: $Date$
Author: Eric Snow
BDFL-Delegate: ???
Discussions-To: import-sig at python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 8-Aug-2013
Python-Version: 3.4
Post-History: 8-Aug-2013
Resolution:


Abstract
========

This PEP proposes to add a new class to ``importlib.machinery`` called ``ModuleSpec``. It will contain all the import-related information about a module without needing to load the module first. Finders will now return a module's spec rather than a loader. The import system will use the spec to load the module.


Motivation
==========

The import system has evolved over the lifetime of Python. In late 2002 PEP 302 introduced standardized import hooks via ``finders`` and ``loaders`` and ``sys.meta_path``. The ``importlib`` module, introduced with Python 3.1, now exposes a pure Python implementation of the APIs described by PEP 302, as well as of the full import system. It is now much easier to understand and extend the import system. While a benefit to the Python community, this greater accessibility also presents a challenge.

As more developers come to understand and customize the import system, any weaknesses in the finder and loader APIs will be more impactful. So the sooner we can address any such weaknesses in the import system, the better... and there are a couple we can take care of with this proposal.

Firstly, any time the import system needs to save information about a module we end up with more attributes on module objects that are generally only meaningful to the import system and occasionally to some people. It would be nice to have a per-module namespace in which to put future import-related information.

Secondly, there's an API void between finders and loaders that causes undue complexity when encountered.
Finders are strictly responsible for providing the loader which the import system will use to load the module. The loader is then responsible for doing some checks, creating the module object, setting import-related attributes, "installing" the module to ``sys.modules``, and loading the module, along with some cleanup. This all takes place during the import system's call to ``Loader.load_module()``. Loaders also provide some APIs for accessing data associated with a module.

Loaders are not required to provide any of the functionality of ``load_module()`` through other methods. Thus, though the import-related information about a module is likely available without loading the module, it is not otherwise exposed.

Furthermore, the requirements associated with ``load_module()`` are common to all loaders and mostly are implemented in exactly the same way. This means every loader has to duplicate the same boilerplate code. ``importlib.util`` provides some tools that help with this, but it would be more helpful if the import system simply took charge of these responsibilities. The trouble is that this would limit the degree of customization that ``load_module()`` facilitates. This is a gap between finders and loaders which this proposal aims to fill.

Finally, when the import system calls a finder's ``find_module()``, the finder makes use of a variety of information about the module that is useful outside the context of the method. Currently the options are limited for persisting that per-module information past the method call, since it only returns the loader. Popular options for this limitation are to store the information in a module-to-info mapping somewhere on the finder itself, or store it on the loader. Unfortunately, loaders are not required to be module-specific. On top of that, some of the useful information finders could provide is common to all finders, so ideally the import system could take care of that.
This is the same gap as before between finders and loaders.

As an example of complexity attributable to this flaw, the implementation of namespace packages in Python 3.3 (see PEP 420) added ``FileFinder.find_loader()`` because there was no good way for ``find_module()`` to provide the namespace path.

The answer to this gap is a ``ModuleSpec`` object that contains the per-module information and takes care of the boilerplate functionality of loading the module.

(The idea gained momentum during discussions related to another PEP.[1])


Specification
=============

The goal is to address the gap between finders and loaders while changing as little of their semantics as possible. Though some functionality and information is moved to the new ``ModuleSpec`` type, their semantics should remain the same. However, for the sake of clarity, those semantics will be explicitly identified.

A High-Level View
-----------------

...

ModuleSpec
----------

A new class which defines the import-related values to use when loading the module. It closely corresponds to the import-related attributes of module objects. ``ModuleSpec`` objects may also be used by finders and loaders and other import-related APIs to hold extra import-related state about the module. This greatly reduces the need to add any new import-related attributes to module objects, and loader ``__init__`` methods won't need to accommodate such per-module state.

Creating a ModuleSpec:

``ModuleSpec(name, loader, *, origin=None, filename=None, cached=None, path=None)``

The parameters have the same meaning as the attributes described below. However, not all ``ModuleSpec`` attributes are also parameters. The passed values are set as-is. For calculated values use the ``from_loader()`` method.

ModuleSpec Attributes
---------------------

Each of the following names is an attribute on ``ModuleSpec`` objects. A value of ``None`` indicates "not set". This contrasts with module objects where the attribute simply doesn't exist.
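As a rough illustration of the constructor signature above, here is a minimal stand-in class. The name ``SketchSpec`` is hypothetical; the two read-only properties follow the ``package`` and ``is_package`` descriptions given in this section, and everything else about the real ``ModuleSpec`` (methods, loader proxying) is deliberately left out.

```python
class SketchSpec:
    """Hypothetical stand-in for the proposed ModuleSpec; not the real class."""

    def __init__(self, name, loader, *, origin=None, filename=None,
                 cached=None, path=None):
        # Values are stored as-is; None means "not set".
        self.name = name
        self.loader = loader
        self.origin = origin
        self.filename = filename
        self.cached = cached
        self.path = path

    @property
    def is_package(self):
        # A set path marks a package, even when the list is empty.
        return self.path is not None

    @property
    def package(self):
        # Packages are their own package; other modules use their parent.
        if self.is_package:
            return self.name
        return self.name.rpartition('.')[0]
```

Because everything after ``loader`` is keyword-only, a call such as ``SketchSpec('spam', None, '/some/path')`` fails with ``TypeError`` instead of silently binding the wrong attribute.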
While ``package`` and ``is_package`` are read-only properties, the remaining attributes can be replaced after the module spec is created and after import is complete. This allows for unusual cases where modifying the spec is the best option. However, typical use should not involve changing the state of a module's spec.

Most of the attributes correspond to the import-related attributes of modules. Here is the mapping, followed by a description of the attributes. The reverse of this mapping is used by ``init_module_attrs()``.

============= ===========
On ModuleSpec On Modules
============= ===========
name          __name__
loader        __loader__
package       __package__
is_package    -
origin        -
filename      __file__
cached        __cached__
path          __path__
============= ===========

``name``

The module's fully resolved and absolute name. It must be set.

``loader``

The loader to use during loading and for module data. These specific functionalities do not change for loaders. Finders are still responsible for creating the loader and this attribute is where it is stored. The loader must be set.

``package``

The name of the module's parent. This is a dynamic attribute with a value derived from ``name`` and ``is_package``. For packages it is the value of ``name``. Otherwise it is equivalent to ``name.rpartition('.')[0]``. Consequently, a top-level module will have the empty string for ``package``.

``is_package``

Whether or not the module is a package. This dynamic attribute is True if ``path`` is set (even if empty), else it is false.

``origin``

A string for the location from which the module originates. If ``filename`` is set, ``origin`` should be set to the same value unless some other value is more appropriate. ``origin`` is used in ``module_repr()`` if it does not match the value of ``filename``.

Using ``filename`` for this meaning would be inaccurate, since not all modules have path-based locations. For instance, built-in modules do not have ``__file__`` set.
Yet it is useful to have a descriptive string indicating that it originated from the interpreter as a built-in module. So built-in modules will have ``origin`` set to ``"built-in"``.

Path-based attributes:

If any of these is set, it indicates that the module is path-based. For reference, a path entry is a string for a location where the import system will look for modules, e.g. the path entries in ``sys.path`` or a package's ``__path__``.

``filename``

Like ``origin``, but limited to a path-based location. If ``filename`` is set, ``origin`` should be set to the same string, unless origin is explicitly set to something else. ``filename`` is not necessarily an actual file name, but could be any location string based on a path entry. Regarding the attribute name, while it is potentially inaccurate, it is both consistent with the equivalent module attribute and generally accurate.

.. XXX Would a different name be better? ``path_location``?

``cached``

The path-based location where the compiled code for a module should be stored. If ``filename`` is set to a source file, this should be set to the corresponding path that PEP 3147 specifies. The ``importlib.util.cache_from_source()`` function facilitates getting the correct value.

``path``

The list of path entries in which to search for submodules if this module is a package. Otherwise it is ``None``.

.. XXX add a path-based subclass?

ModuleSpec Methods
------------------

``from_loader(name, loader, *, is_package=None, origin=None, filename=None, cached=None, path=None)``

.. XXX use a different name?

A factory classmethod that returns a new ``ModuleSpec`` derived from the arguments. ``is_package`` is used inside the method to indicate that the module is a package. If not explicitly passed in, it is set to ``True`` if ``path`` is passed in. It falls back to using the result of the loader's ``is_package()``, if available. Finally it defaults to False.
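The order in which ``from_loader()`` settles on ``is_package`` can be sketched as follows. The function name ``resolve_is_package`` is hypothetical, and treating an ``ImportError`` from the loader as "not available" is an assumption of this sketch rather than something the draft states.

```python
def resolve_is_package(name, loader, *, is_package=None, path=None):
    """Hypothetical sketch of the is_package fallback chain in from_loader()."""
    if is_package is not None:
        return bool(is_package)   # 1. an explicit argument wins
    if path is not None:
        return True               # 2. a passed-in path, even [], means package
    checker = getattr(loader, 'is_package', None)
    if checker is not None:
        try:
            return bool(checker(name))   # 3. ask the loader, if it can answer
        except ImportError:
            pass  # assumption: a loader that cannot answer counts as unavailable
    return False                  # 4. default
```

Checking ``path is not None`` rather than the truthiness of ``path`` is what keeps an empty list from being mistaken for "not a package".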
The remaining parameters have the same meaning as the corresponding
``ModuleSpec`` attributes.

In contrast to ``ModuleSpec.__init__()``, which takes the arguments
as-is, ``from_loader()`` calculates missing values from the ones passed
in, as much as possible. This replaces the behavior that is currently
provided by several ``importlib.util`` functions as well as the
optional ``init_module_attrs()`` method of loaders. Just to be clear,
here is a more detailed description of those calculations::

    If not passed in, ``filename`` is set to the result of calling the
    loader's ``get_filename()``, if available. Otherwise it stays
    unset (``None``).

    If not passed in, ``path`` is set to an empty list if
    ``is_package`` is true. Then the directory from ``filename`` is
    appended to it, if possible. If ``is_package`` is false, ``path``
    stays unset.

    If ``cached`` is not passed in and ``filename`` is passed in,
    ``cached`` is derived from it. For filenames with a source suffix,
    it is set to the result of calling
    ``importlib.util.cache_from_source()``. For bytecode suffixes (e.g.
    ``.pyc``), ``cached`` is set to the value of ``filename``. If
    ``filename`` is not passed in or ``cache_from_source()`` raises
    ``NotImplementedError``, ``cached`` stays unset.

    If not passed in, ``origin`` is set to ``filename``. Thus if
    ``filename`` is unset, ``origin`` stays unset.

``module_repr()``

Returns a repr string for the module if ``origin`` is set and
``filename`` is not set. The string refers to the value of ``origin``.
Otherwise ``module_repr()`` returns None. This indicates to the module
type's ``__repr__()`` that it should fall back to the default repr.

We could also have ``module_repr()`` produce the repr for the case where
``filename`` is set or where ``origin`` is not set, mirroring the repr
that the module type produces directly. However, the repr string is
derived from the import-related module attributes, which might be out of
sync with the spec.

.. XXX Is using the spec close enough? Probably not.
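To make the ``module_repr()`` rule concrete, here is a minimal sketch
written as a standalone function. The function name, the use of
``types.SimpleNamespace`` as a stand-in for a spec, and the exact repr
format are assumptions for illustration only; the text above says no more
than that the string refers to ``origin``.

```python
from types import SimpleNamespace


def spec_module_repr(spec):
    """Return a repr only when origin is set and filename is not."""
    if spec.origin is not None and spec.filename is None:
        # Assumed format, modeled on the existing repr of built-in modules.
        return "<module {!r} ({})>".format(spec.name, spec.origin)
    # None tells the module type's __repr__() to use its default repr.
    return None


# Stand-in specs: one built-in style, one path-based.
builtin_spec = SimpleNamespace(name="sys", origin="built-in", filename=None)
file_spec = SimpleNamespace(
    name="spam", origin="/tmp/spam.py", filename="/tmp/spam.py"
)
```

With these stand-ins, only ``builtin_spec`` produces a string; the
path-based one yields ``None`` so the default repr applies.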
The implementation of the module type's ``__repr__()`` will change to
accommodate this PEP. However, the current functionality will remain to
handle the case where a module does not have a ``__spec__`` attribute.

``init_module_attrs(module)``

Sets the module's import-related attributes to the corresponding values
in the module spec. If a path-based attribute is not set on the spec,
it is not set on the module. For the rest, a ``None`` value on the spec
(aka "not set") means ``None`` will be set on the module. If any of the
attributes are already set on the module, the existing values are
replaced. The module's own ``__spec__`` is not consulted but does get
replaced with the spec on which ``init_module_attrs()`` was called. The
earlier mapping of ``ModuleSpec`` attributes to module attributes
indicates which attributes are involved on both sides.

``load(module=None, *, is_reload=False)``

This method captures the current functionality of and requirements on
``Loader.load_module()`` without any semantic changes, except one.
Reloading a module when ``exec_module()`` is available actually uses
``module`` rather than ignoring it in favor of the one in
``sys.modules``, as ``Loader.load_module()`` does.

``module`` is only allowed when ``is_reload`` is true. This means that
``is_reload`` could be dropped as a parameter. However, doing so would
mean we could not use ``None`` to indicate that the module should be
pulled from ``sys.modules``. Furthermore, ``is_reload`` makes the
intent of the call clear.

There are two parts to what happens in ``load()``. First, the module is
prepared, loaded, updated appropriately, and left available for the
second part. This is described in more detail shortly.

Second, in the case of error during a normal load (not reload) the
module is removed from ``sys.modules``. If no error happened, the
module is pulled from ``sys.modules``. This is the module returned by
``load()``.
Before it is returned, if it is a different object than the
one produced by the first part, attributes of the module from
``sys.modules`` are updated to reflect the spec.

Returning the module from ``sys.modules`` accommodates the ability of
the module to replace itself there while it is executing (during load).

As already noted, this is what already happens in the import system.
``load()`` is not meant to change any of this behavior.

Regarding the first part of ``load()``, the following describes what
happens. It depends on whether ``is_reload`` is true and whether the
loader has ``exec_module()``.

For normal load with ``exec_module()`` available::

    A new module is created, ``init_module_attrs()`` is called to set
    its attributes, and it is set on sys.modules. At that point
    the loader's ``exec_module()`` is called, after which the module
    is ready for the second part of loading.

.. XXX What if the module already exists in sys.modules?

For normal load without ``exec_module()`` available::

    The loader's ``load_module()`` is called and the attributes of the
    module it returns are updated to match the spec.

For reload with ``exec_module()`` available::

    If ``module`` is ``None``, it is pulled from ``sys.modules``. If
    still ``None``, ImportError is raised. Otherwise ``exec_module()``
    is called, passing in the module-to-be-reloaded.

For reload without ``exec_module()`` available::

    The loader's ``load_module()`` is called and the attributes of the
    module it returns are updated to match the spec.

There is some boilerplate involved when ``exec_module()`` is available,
but only the boilerplate that the import system uses currently.

If ``loader`` is not set (``None``), ``load()`` raises a ValueError. If
``module`` is passed in but ``is_reload`` is false, a ValueError is also
raised to indicate that ``load()`` was called incorrectly. There may be
use cases for calling ``load()`` in that way, but they are outside the
scope of this PEP.

..
XXX add reload(module=None) and drop load()'s parameters entirely?

.. XXX add more of importlib.reload()'s boilerplate to load()/reload()?

Backward Compatibility
----------------------

Since ``Finder.find_module()`` methods would now return a module spec
instead of a loader, specs must act like the loader that would have been
returned instead. This is relatively simple to solve since the loader
is available as an attribute of the spec. We will use ``__getattr__()``
to do it.

However, ``ModuleSpec.is_package`` (an attribute) conflicts with
``InspectLoader.is_package()`` (a method). Working around this requires
a more complicated solution but is not a large obstacle. Simply making
``ModuleSpec.is_package`` a method does not reflect that it is a
relatively static piece of data. ``module_repr()`` also conflicts with
the same method on loaders, but that workaround is not complicated since
both are methods.

Unfortunately, the ability to proxy does not extend to ``id()``
comparisons and ``isinstance()`` tests. In the case of the return value
of ``find_module()``, we accept that break in backward compatibility.
However, we will mitigate the problem with ``isinstance()`` somewhat by
registering ``ModuleSpec`` on the loaders in ``importlib.abc``.

Subclassing
-----------

Subclasses of ModuleSpec are allowed, but should not be necessary.
Adding functionality to a custom finder or loader will likely be a
better fit and should be tried first. However, as long as a subclass
still fulfills the requirements of the import system, objects of that
type are completely fine as the return value of ``find_module()``.

Module Objects
--------------

Module objects will now have a ``__spec__`` attribute to which the
module's spec will be bound. None of the other import-related module
attributes will be changed or deprecated, though some of them could be;
any such deprecation can wait until Python 4.
``ModuleSpec`` objects will not be kept in sync with the corresponding
module object's import-related attributes. Though they may differ, in
practice they will typically be the same.

Finders
-------

Finders will now return ModuleSpec objects when ``find_module()`` is
called rather than loaders. For backward compatibility, ``ModuleSpec``
objects proxy the attributes of their ``loader`` attribute.

Adding another similar method to avoid backward-compatibility issues
is undesirable if avoidable. The import APIs have suffered enough,
especially considering ``PathEntryFinder.find_loader()`` was just
added in Python 3.3. The approach taken by this PEP should be
sufficient to address backward-compatibility issues for
``find_module()``.

The change to ``find_module()`` applies to both ``MetaPathFinder`` and
``PathEntryFinder``. ``PathEntryFinder.find_loader()`` will be
deprecated and, for backward compatibility, implicitly special-cased if
the method exists on a finder.

Finders are still responsible for creating the loader. That loader will
now be stored in the module spec returned by ``find_module()`` rather
than returned directly. As is currently the case without the PEP, if a
loader would be costly to create, that loader can be designed to defer
the cost until later.

Loaders
-------

Loaders will have a new method, ``exec_module(module)``. Its only job
is to "exec" the module and consequently populate the module's
namespace. It is not responsible for creating or preparing the module
object, nor for any cleanup afterward. It has no return value.

The ``load_module()`` of loaders will still work and be an active part
of the loader API. It is still useful for cases where the default
module creation/preparation/cleanup is not appropriate for the loader.

For example, the C API for extension modules only supports the full
control of ``load_module()``. As such, ``ExtensionFileLoader`` will not
implement ``exec_module()``.
In the future it may be appropriate to
produce a second C API that would support an ``exec_module()``
implementation for ``ExtensionFileLoader``. Such a change is outside
the scope of this PEP.

A loader must have at least one of ``exec_module()`` and
``load_module()`` defined. If both exist on the loader,
``ModuleSpec.load()`` uses ``exec_module()`` and ignores
``load_module()``.

PEP 420 introduced the optional ``module_repr()`` loader method to limit
the amount of special-casing in the module type's ``__repr__()``. Since
this method is part of ``ModuleSpec``, it will be deprecated on loaders.
However, if it exists on a loader it will be used exclusively.

The ``Loader.init_module_attrs()`` method, added prior to Python 3.4's
release, will be removed in favor of the same method on ``ModuleSpec``.

However, ``InspectLoader.is_package()`` will not be deprecated even
though the same information is found on ``ModuleSpec``. ``ModuleSpec``
can use it to populate its own ``is_package`` if that information is
not otherwise available. Still, it will be made optional.

The path-based loaders in ``importlib`` take arguments in their
``__init__()`` and have corresponding attributes. However, the need for
those values is eliminated. The only exception is
``FileLoader.get_filename()``, which uses ``self.path``. The signatures
for these loaders and the accompanying attributes will be deprecated.

In addition to executing a module during loading, loaders will still be
directly responsible for providing APIs concerning module-related data.

Other Changes
-------------

* The various finders and loaders provided by ``importlib`` will be
  updated to comply with this proposal.

* The spec for the ``__main__`` module will reflect how the interpreter
  was started. For instance, with ``-m`` the spec's name will be that of
  the run module, while ``__main__.__name__`` will still be "__main__".

* We add ``importlib.find_module()`` to mirror
  ``importlib.find_loader()`` (which becomes deprecated).
* Deprecations in ``importlib.util``: ``set_package()``,
  ``set_loader()``, and ``module_for_loader()``. ``module_to_load()``
  (introduced prior to Python 3.4's release) can be removed.

* ``importlib.reload()`` is changed to use ``ModuleSpec.load()``.

* ``ModuleSpec.load()`` and ``importlib.reload()`` will now make use of
  the per-module import lock, whereas ``Loader.load_module()`` did not.

Reference Implementation
------------------------

A reference implementation is available at .

References
==========

[1] http://mail.python.org/pipermail/import-sig/2013-August/000658.html

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

From ncoghlan at gmail.com Sun Aug 11 15:03:00 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 11 Aug 2013 09:03:00 -0400
Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System"
In-Reply-To:
References:
Message-ID:

I think this is solid enough to be worth adding to the PEPs repo now.

On 9 August 2013 18:58, Eric Snow wrote:
> Here's an updated version of the PEP for ModuleSpec which addresses the
> feedback I've gotten. Thanks for the help. The big open question, to me,
> is whether or not to have a separate reload() method. I'll be looking into
> that when I get a chance. There's also the question of a path-based
> subclass, but I'm currently not convinced it's worth it.

One piece of feedback from me (triggered by the C extension modules
discussion on python-dev): we should consider proposing a new "exec"
hook for C extension modules that could be defined instead of or in
addition to the existing PEP 3121 init hook.
Extension modules that
don't rely on mutable static variables or the PEP 3121 per-interpreter
state APIs could just define the new exec hook and get a new module
instance every time they're imported. Those that do have
per-interpreter state would still get an opportunity to run additional
code after all the magic attributes have been set.

Also, to handle the extension module case, we may need to let loaders
define an optional "create_module" method that accepts the ModuleSpec
object as an argument. The extension module loader would implement
this as handling the PyInit_ call. (Setting the magic attributes
according to the spec would happen automatically after the call, so
each loader wouldn't need to implement that part)

(Note: once I get back to Australia around the 22nd, I should have
time to help out more directly with this)

> -----------------------------------
> Firstly, any time the import system needs to save information about a
> module we end up with more attributes on module objects that are
> generally only meaningful to the import system and occoasionally to some

Typo: occoasionally

> people. It would be nice to have a per-module namespace to put future
> import-related information. Secondly, there's an API void between
> finders and loaders that causes undue complexity when encountered.
>
> Finders are strictly responsible for providing the loader which the

"are currently responsible" (since the PEP is about changing the
responsibility of finders, this is a little unclear at present)

> Specification
> =============
>
> The goal is to address the gap between finders and loaders while
> changing as little of their semantics as possible. Though some
> functionality and information is moved the new ``ModuleSpec`` type,

"moved to the new"

> their semantics should remain the same. However, for the sake of
> clarity, those semantics will be explicitly identified.
>
> A High-Level View
> -----------------
>
> ...

Not sure a high level view is needed, but you can fill this in if you want :)

>
> ModuleSpec
> ----------
>
> A new class which defines the import-related values to use when loading
> the module. It closely corresponds to the import-related attributes of
> module objects. ``ModuleSpec`` objects may also be used by finders and
> loaders and other import-related APIs to hold extra import-related
> state about the module. This greatly reduces the need to add any new
> new import-related attributes to module objects, and loader ``__init__``
> methods won't need to accommodate such per-module state.

To avoid conflicts as the spec attributes evolve in the future, would
it be worth having a "custom" field which is just an arbitrary object
reference used to pass info from the finder to the loader without
troubling the rest of the import system?

> Creating a ModuleSpec:
>
> ``ModuleSpec(name, loader, *, origin=None, filename=None, cached=None,
> path=None)``
>
> The parameters have the same meaning as the attributes described below.
> However, not all ``ModuleSpec`` attributes are also parameters.
> The passed values are set as-is. For calculated values use the
> ``from_loader()`` method.

This paragraph isn't particularly clear. Perhaps:

"Passed in parameter values are assigned directly to the corresponding
attributes below. Other attributes not listed as parameters (such as
``package``) are read-only properties that are automatically derived
from these values.

The ``ModuleSpec.from_loader()`` class method allows a suitable
ModuleSpec instance to be easily created from a PEP 302 loader object"

> ModuleSpec Attributes
> ---------------------
>
> Each of the following names is an attribute on ``ModuleSpec`` objects.
> A value of ``None`` indicates "not set". This contrasts with module
> objects where the attribute simply doesn't exist.
>
> While ``package`` and ``is_package`` are read-only properties, the
> remaining attributes can be replaced after the module spec is created
> and after import is complete. This allows for unusual cases where
> modifying the spec is the best option. However, typical use should not
> involve changing the state of a module's spec.

I'm with Brett that "is_package" should go, to be replaced by
"spec.path is not None" wherever it matters. is_package() would then
fall through to the PEP 302 loader API via __getattr__.

> ``package``
>
> The name of the module's parent. This is a dynamic attribute with a
> value derived from ``name`` and ``is_package``. For packages it is the
> value of ``name``. Otherwise it is equivalent to
> ``name.rpartition('.')[0]``. Consequently, a top-level module will have
> give the empty string for ``package``.

s/give//

> ``is_package``
>
> Whether or not the module is a package. This dynamic attribute is True
> if ``path`` is set (even if empty), else it is false.

As above (i.e. don't use it)

> ``origin``
>
> A string for the location from which the module originates. If
> ``filename`` is set, ``origin`` should be set to the same value unless
> some other value is more appropriate. ``origin`` is used in
> ``module_repr()`` if it does not match the value of ``filename``.
>
> Using ``filename`` for this meaning would be inaccurate, since not all
> modules have path-based locations. For instance, built-in modules do
> not have ``__file__`` set. Yet it is useful to have a descriptive
> string indicating that it originated from the interpreter as a built-in
> module. So built-in modules will have ``origin`` set to ``"built-in"``.

How about we *just* have origin, with a separate "set_fileattr"
attribute to indicate "this is a discrete file, you should set
__file__"?

Also, we should explicitly note that we'll still set __file__ for zip
imports, due to backwards compatibility concerns, even though it
doesn't correspond to a valid filesystem path.

(Random thought: spec.origin + spec.cached + a cache directory setting
in zipimport would give a potentially clean way to do extension module
imports from zip archives)

> ``path``
>
> The list of path entries in which to search for submodules if this
> module is a package. Otherwise it is ``None``.

Path entries don't have to correspond to filesystem locations - they
just have to make sense to at least one path hook (e.g. a DB URI would
be a valid path entry).

> .. XXX add a path-based subclass?

Nope :)

> ModuleSpec Methods
> ------------------
>
> ``from_loader(name, loader, *, is_package=None, origin=None, filename=None,
> cached=None, path=None)``
>
> .. XXX use a different name?

I'd disallow customisation on this one - if people want to customise,
they should just query the PEP 302 APIs themselves and call the
ModuleSpec constructor directly. The use case for this one should be
to make it trivial to switch from "return loader" to "return
ModuleSpec.from_loader(loader)" in a find_module implementation.

> In contrast to ``ModuleSpec.__init__()``, which takes the arguments
> as-is, ``from_loader()`` calculates missing values from the ones passed
> in, as much as possible. This replaces the behavior that is currently
> provided the several ``importlib.util`` functions as well as the
> optional ``init_module_attrs()`` method of loaders. Just to be clear,
> here is a more detailed description of those calculations::
>
>     If not passed in, ``filename`` is to the result of calling the
>     loader's ``get_filename()``, if available. Otherwise it stays
>     unset (``None``).
>
>     If not passed in, ``path`` is set to an empty list if
>     ``is_package`` is true. Then the directory from ``filename`` is
>     appended to it, if possible. If ``is_package`` is false, ``path``
>     stays unset.
>
>     If ``cached`` is not passed in and ``filename`` is passed in,
>     ``cached`` is derived from it.
>     For filenames with a source suffix,
>     it set to the result of calling
>     ``importlib.util.cache_from_source()``. For bytecode suffixes (e.g.
>     ``.pyc``), ``cached`` is set to the value of ``filename``. If
>     ``filename`` is not passed in or ``cache_from_source()`` raises
>     ``NotImplementedError``, ``cached`` stays unset.
>
>     If not passed in, ``origin`` is set to ``filename``. Thus if
>     ``filename`` is unset, ``origin`` stays unset.

Hmm, is there a reason this can't be the default constructor
behaviour? What's the value of *not* having the sensible fallbacks,
given they can always be overridden by passing in explicit values when
you want something different?

A separate "from_module(m)" constructor would probably make sense, though.

> ``module_repr()``
>
> Returns a repr string for the module if ``origin`` is set and
> ``filename`` is not set. The string refers to the value of ``origin``.
> Otherwise ``module_repr()`` returns None. This indicates to the module
> type's ``__repr__()`` that it should fall back to the default repr.
>
> We could also have ``module_repr()`` produce the repr for the case where
> ``filename`` is set or where ``origin`` is not set, mirroring the repr
> that the module type produces directly. However, the repr string is
> derived from the import-related module attributes, which might be out of
> sync with the spec.
>
> .. XXX Is using the spec close enough? Probably not.

I think it makes sense to always return the expected repr based on the
spec attributes, but allow a custom origin to be passed in to handle
the case where the module __file__ attribute differs from
__spec__.origin (keeping in mind I think __spec__.filename should be
replaced with __spec__.set_fileattr)

> The implementation of the module type's ``__repr__()`` will change to
> accommodate this PEP. However, the current functionality will remain to
> handle the case where a module does not have a ``__spec__`` attribute.

Experience tells us that the import system should ensure the __spec__
attribute always exists (even if it has to be filled in from the
module attributes after calling load_module)

> ``load(module=None, *, is_reload=False)``

Yep, definitely needs to be a separate method. "is_reload" would
almost always be set to a boolean, which means a separate API is
likely to be better.

However, I think the separate method should be "exec()" rather than
"reload()" and require that the module always be passed in.

We could also expose a "create" method that just creates and returns
the new module object, and replace importlib.util.module_to_load with
a context manager that accepted the module as a parameter. Say
"add_to_sys", which fails if the module is already present in
sys.modules.

load() would then look something like:

    def load(self):
        m = self.create()
        with importlib.util.add_to_sys(m):
            self.exec(m)
        return sys.modules[self.name]

We could also provide reload() if we wanted to:

    def reload(self):
        self.exec(sys.modules[self.name])
        return sys.modules[self.name]

> Subclassing
> -----------
>
> Subclasses of ModuleSpec are allowed, but should not be necessary.
> Adding functionality to a custom finder or loader will likely be a
> better fit and should be tried first. However, as long as a subclass
> still fulfills the requirements of the import system, objects of that
> type are completely fine as the return value of ``find_module()``.

We may need to do subclasses for the ABC registration backwards
compatibility hack.

>
> Module Objects
> --------------
>
> Module objects will now have a ``__spec__`` attribute to which the
> module's spec will be bound. None of the other import-related module
> attributes will be changed or deprecated, though some of them could be;
> any such deprecation can wait until Python 4.
>
> ``ModuleSpec`` objects will not be kept in sync with the corresponding
> module object's import-related attributes.
> Though they may differ, in
> practice they will typically be the same.

Worth mentioning that __main__.__spec__.name will give the real name
of modules executed with -m here rather than delaying that until the
notes at the end.

> Finders
> -------
>
> Finders will now return ModuleSpec objects when ``find_module()`` is
> called rather than loaders. For backward compatility, ``Modulespec``
> objects proxy the attributes of their ``loader`` attribute.
>
> Adding another similar method to avoid backward-compatibility issues
> is undersireable if avoidable. The import APIs have suffered enough,
> especially considering ``PathEntryFinder.find_loader()`` was just
> added in Python 3.3. The approach taken by this PEP should be
> sufficient to address backward-compatibility issues for
> ``find_module()``.
>
> The change to ``find_module()`` applies to both ``MetaPathFinder`` and
> ``PathEntryFinder``. ``PathEntryFinder.find_loader()`` will be
> deprecated and, for backward compatibility, implicitly special-cased if
> the method exists on a finder.

Actually, we don't currently have anything on ModuleSpec to indicate
"this is complete, stop scanning for more path fragments" or how we
will compose multiple module specs for the individual fragments into a
combined spec for the namespace package.

> Finders are still responsible for creating the loader. That loader will
> now be stored in the module spec returned by ``find_module()`` rather
> than returned directly. As is currently the case without the PEP, if a
> loader would be costly to create, that loader can be designed to defer
> the cost until later.
>
> Loaders
> -------
>
> Loaders will have a new method, ``exec_module(module)``. Its only job
> is to "exec" the module and consequently populate the module's
> namespace. It is not responsible for creating or preparing the module
> object, nor for any cleanup afterward. It has no return value.
>
> The ``load_module()`` of loaders will still work and be an active part
> of the loader API. It is still useful for cases where the default
> module creation/prepartion/cleanup is not appropriate for the loader.
>
> For example, the C API for extension modules only supports the full
> control of ``load_module()``. As such, ``ExtensionFileLoader`` will not
> implement ``exec_module()``. In the future it may be appropriate to
> produce a second C API that would support an ``exec_module()``
> implementation for ``ExtensionFileLoader``. Such a change is outside
> the scope of this PEP.

As above, I think it may be worth tackling this. It shouldn't be *that*
hard given the higher level changes and will solve some hard problems
at the lower level.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From brett at python.org Sun Aug 11 22:08:26 2013
From: brett at python.org (Brett Cannon)
Date: Sun, 11 Aug 2013 16:08:26 -0400
Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System"
In-Reply-To:
References:
Message-ID:

On Fri, Aug 9, 2013 at 6:58 PM, Eric Snow wrote:

> Here's an updated version of the PEP for ModuleSpec which addresses the
> feedback I've gotten. Thanks for the help. The big open question, to me,
> is whether or not to have a separate reload() method. I'll be looking into
> that when I get a chance. There's also the question of a path-based
> subclass, but I'm currently not convinced it's worth it.
>
> -eric
>
> -----------------------------------
>
> PEP: 4XX
> Title: A ModuleSpec Type for the Import System
> Version: $Revision$
> Last-Modified: $Date$
> Author: Eric Snow
> BDFL-Delegate: ???
> Discussions-To: import-sig at python.org
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 8-Aug-2013
> Python-Version: 3.4
> Post-History: 8-Aug-2013
> Resolution:
>
>
> Abstract
> ========
>
> This PEP proposes to add a new class to ``importlib.machinery`` called
> ``ModuleSpec``.
> It will contain all the import-related information
> about a module without needing to load the module first. Finders will
> now return a module's spec rather than a loader. The import system will
> use the spec to load the module.
>
>
> Motivation
> ==========
>
> The import system has evolved over the lifetime of Python. In late 2002
> PEP 302 introduced standardized import hooks via ``finders`` and
> ``loaders`` and ``sys.meta_path``. The ``importlib`` module, introduced
> with Python 3.1, now exposes a pure Python implementation of the APIs
> described by PEP 302, as well as of the full import system. It is now
> much easier to understand and extend the import system. While a benefit
> to the Python community, this greater accessibilty also presents a
> challenge.
>
> As more developers come to understand and customize the import system,
> any weaknesses in the finder and loader APIs will be more impactful. So
> the sooner we can address any such weaknesses the import system, the
> better...and there are a couple we can take care of with this proposal.
>
> Firstly, any time the import system needs to save information about a
> module we end up with more attributes on module objects that are
> generally only meaningful to the import system and occoasionally to some
> people. It would be nice to have a per-module namespace to put future
> import-related information. Secondly, there's an API void between
> finders and loaders that causes undue complexity when encountered.
>
> Finders are strictly responsible for providing the loader which the
> import system will use to load the module. The loader is then
> responsible for doing some checks, creating the module object, setting
> import-related attributes, "installing" the module to ``sys.modules``,
> and loading the module, along with some cleanup. This all takes place
> during the import system's call to ``Loader.load_module()``. Loaders
> also provide some APIs for accessing data associated with a module.
>
> Loaders are not required to provide any of the functionality of
> ``load_module()`` through other methods. Thus, though the import-
> related information about a module is likely available without loading
> the module, it is not otherwise exposed.
>
> Furthermore, the requirements assocated with ``load_module()`` are
> common to all loaders and mostly are implemented in exactly the same
> way. This means every loader has to duplicate the same boilerplate
> code. ``importlib.util`` provides some tools that help with this, but
> it would be more helpful if the import system simply took charge of
> these responsibilities. The trouble is that this would limit the degree
> of customization that ``load_module()`` facilitates. This is a gap
> between finders and loaders which this proposal aims to fill.
>
> Finally, when the import system calls a finder's ``find_module()``, the
> finder makes use of a variety of information about the module that is
> useful outside the context of the method. Currently the options are
> limited for persisting that per-module information past the method call,
> since it only returns the loader. Popular options for this limitation
> are to store the information in a module-to-info mapping somewhere on
> the finder itself, or store it on the loader.
>
> Unfortunately, loaders are not required to be module-specific. On top
> of that, some of the useful information finders could provide is
> common to all finders, so ideally the import system could take care of
> that. This is the same gap as before between finders and loaders.
>
> As an example of complexity attributable to this flaw, the
> implementation of namespace packages in Python 3.3 (see PEP 420) added
> ``FileFinder.find_loader()`` because there was no good way for
> ``find_module()`` to provide the namespace path.
> > The answer to this gap is a ``ModuleSpec`` object that contains the > per-module information and takes care of the boilerplate functionality > of loading the module. > > (The idea gained momentum during discussions related to another PEP.[1]) > > > Specification > ============= > > The goal is to address the gap between finders and loaders while > changing as little of their semantics as possible. Though some > functionality and information is moved the new ``ModuleSpec`` type, > their semantics should remain the same. However, for the sake of > clarity, those semantics will be explicitly identified. > > A High-Level View > ----------------- > > ... > > ModuleSpec > ---------- > > A new class which defines the import-related values to use when loading > the module. It closely corresponds to the import-related attributes of > module objects. ``ModuleSpec`` objects may also be used by finders and > loaders and other import-related APIs to hold extra import-related > state about the module. This greatly reduces the need to add any new > new import-related attributes to module objects, and loader ``__init__`` > methods won't need to accommodate such per-module state. > > Creating a ModuleSpec: > > ``ModuleSpec(name, loader, *, origin=None, filename=None, cached=None, > path=None)`` > > The parameters have the same meaning as the attributes described below. > However, not all ``ModuleSpec`` attributes are also parameters. The > passed values are set as-is. For calculated values use the > ``from_loader()`` method. > > ModuleSpec Attributes > --------------------- > > Each of the following names is an attribute on ``ModuleSpec`` objects. > A value of ``None`` indicates "not set". This contrasts with module > objects where the attribute simply doesn't exist. > > While ``package`` and ``is_package`` are read-only properties, the > remaining attributes can be replaced after the module spec is created > and after import is complete. 
This allows for unusual cases where > modifying the spec is the best option. However, typical use should not > involve changing the state of a module's spec. > > Most of the attributes correspond to the import-related attributes of > modules. Here is the mapping, followed by a description of the > attributes. The reverse of this mapping is used by > ``init_module_attrs()``. > > ============= =========== > On ModuleSpec On Modules > ============= =========== > name __name__ > loader __loader__ > package __package__ > is_package - > origin - > filename __file__ > cached __cached__ > path __path__ > ============= =========== > > ``name`` > > The module's fully resolved and absolute name. It must be set. > > ``loader`` > > The loader to use during loading and for module data. These specific > functionalities do not change for loaders. Finders are still > responsible for creating the loader and this attribute is where it is > stored. The loader must be set. > > ``package`` > > The name of the module's parent. This is a dynamic attribute with a > value derived from ``name`` and ``is_package``. For packages it is the > value of ``name``. Otherwise it is equivalent to > ``name.rpartition('.')[0]``. Consequently, a top-level module will have > give the empty string for ``package``. > > > ``is_package`` > > Whether or not the module is a package. This dynamic attribute is True > if ``path`` is set (even if empty), else it is false. > "is True if ``path`` is not None (e.g. the empty list is a "true" value), else it is False". > > ``origin`` > > A string for the location from which the module originates. If > ``filename`` is set, ``origin`` should be set to the same value unless > some other value is more appropriate. ``origin`` is used in > ``module_repr()`` if it does not match the value of ``filename``. > > Using ``filename`` for this meaning would be inaccurate, since not all > modules have path-based locations. 
For instance, built-in modules do > not have ``__file__`` set. Yet it is useful to have a descriptive > string indicating that it originated from the interpreter as a built-in > module. So built-in modules will have ``origin`` set to ``"built-in"``. > I still don't know what you would put there for a zipfile-based loader. Would you still put __file__ or would you put the zipfile? I ask because I would want a way to pass along in a zipfile finder to the loader where the zipfile is located and then the internal location of the file. Otherwise you need to pass in the zip path separately from the internal path to the loader constructor instead of simply passing in a ModuleSpec (e.g. see _split_path in http://bugs.python.org/file30660/zip_importlib.diff). > > Path-based attributes: > > If any of these is set, it indicates that the module is path-based. For > reference, a path entry is a string for a location where the import > system will look for modules, e.g. the path entries in ``sys.path`` or a > package's ``__path__``). > > ``filename`` > > Like ``origin``, but limited to a path-based location. If ``filename`` > is set, ``origin`` should be set to the same string, unless origin is > explicitly set to something else. ``filename`` is not necessarily an > actual file name, but could be any location string based on a path > entry. Regarding the attribute name, while it is potentially > inaccurate, it is both consistent with the equivalent module attribute > and generally accurate. > > .. XXX Would a different name be better? ``path_location``? > > ``cached`` > > The path-based location where the compiled code for a module should be > stored. If ``filename`` is set to a source file, this should be set to > corresponding path that PEP 3147 specifies. The > ``importlib.util.source_to_cache()`` function facilitates getting the > correct value. > > ``path`` > > The list of path entries in which to search for submodules if this > module is a package. 
Otherwise it is ``None``. > > .. XXX add a path-based subclass? > You mean like namespace package's __path__ object? Or are you saying you want ModuleSpec vs. PackageSpec? > > ModuleSpec Methods > ------------------ > > ``from_loader(name, loader, *, is_package=None, origin=None, > filename=None, cached=None, path=None)`` > > .. XXX use a different name? > > A factory classmethod that returns a new ``ModuleSpec`` derived from the > arguments. ``is_package`` is used inside the method to indicate that > the module is a package. > Why is this parameter instead of the other than inferring from 'path' or loader.is_package() as you fall back on? What's the motivation? > If not explicitly passed in, it is set to > ``True`` if ``path`` is passed in. It falls back to using the result of > the loader's ``is_package()``, if available. Finally it defaults to > False. The remaining parameters have the same meaning as the > corresponding ``ModuleSpec`` attributes. > > In contrast to ``ModuleSpec.__init__()``, which takes the arguments > as-is, ``from_loader()`` calculates missing values from the ones passed > in, as much as possible. This replaces the behavior that is currently > provided the several ``importlib.util`` functions as well as the > "provided by several" > optional ``init_module_attrs()`` method of loaders. Just to be clear, > here is a more detailed description of those calculations:: > > If not passed in, ``filename`` is to the result of calling the > loader's ``get_filename()``, if available. Otherwise it stays > unset (``None``). > > If not passed in, ``path`` is set to an empty list if > ``is_package`` is true. Then the directory from ``filename`` is > appended to it, if possible. If ``is_package`` is false, ``path`` > stays unset. > > If ``cached`` is not passed in and ``filename`` is passed in, > ``cached`` is derived from it. For filenames with a source suffix, > it set to the result of calling > ``importlib.util.cache_from_source()``. 
For bytecode suffixes (e.g. > ``.pyc``), ``cached`` is set to the value of ``filename``. If > ``filename`` is not passed in or ``cache_from_source()`` raises > ``NotImplementedError``, ``cached`` stays unset. > > If not passed in, ``origin`` is set to ``filename``. Thus if > ``filename`` is unset, ``origin`` stays unset. > Why is this a static constructor instead of a method like infer_values() or an infer_values keyword-only argument to the constructor to do this if requested? > > ``module_repr()`` > > Returns a repr string for the module if ``origin`` is set and > ``filename`` is not set. The string refers to the value of ``origin``. > Otherwise ``module_repr()`` returns None. This indicates to the module > type's ``__repr__()`` that it should fall back to the default repr. > This makes me think that origin is an odd name if all it affects is module_repr(). > > We could also have ``module_repr()`` produce the repr for the case where > ``filename`` is set or where ``origin`` is not set, mirroring the repr > that the module type produces directly. However, the repr string is > derived from the import-related module attributes, which might be out of > sync with the spec. > [SNIP] > .. XXX add reload(module=None) and drop load()'s parameters entirely? > If you are going to make these semantics of making the module argument only good for reloading then I say yes, make it a separate method. > .. XXX add more of importlib.reload()'s boilerplate to load()/reload()? > > Backward Compatibility > ---------------------- > > Since ``Finder.find_module()`` methods would now return a module spec > instead of loader, specs must act like the loader that would have been > returned instead. This is relatively simple to solve since the loader > is available as an attribute of the spec. We will use ``__getattr__()`` > to do it. > > However, ``ModuleSpec.is_package`` (an attribute) conflicts with > ``InspectLoader.is_package()`` (a method). 
Working around this requires > a more complicated solution but is not a large obstacle. Simply making > ``ModuleSpec.is_package`` a method does not reflect that is a relatively > static piece of data. > Maybe, but depending on what your "more complicated solution" it it might be best to just give up the purity and go with the practicality. > ``module_repr()`` also conflicts with the same > method on loaders, but that workaround is not complicated since both are > methods. > > Unfortunately, the ability to proxy does not extend to ``id()`` > comparisons and ``isinstance()`` tests. In the case of the return value > of ``find_module()``, we accept that break in backward compatibility. > However, we will mitigate the problem with ``isinstance()`` somewhat by > registering ``ModuleSpec`` on the loaders in ``importlib.abc``. > Actually, ModuleSpec doesn't even need to register; __instancecheck__ and __subclasscheck__ can just be defined and delegate by calling issubclass/isinstance on the loader as appropriate. > [SNIP] > > Loaders > ------- > > Loaders will have a new method, ``exec_module(module)``. Its only job > is to "exec" the module and consequently populate the module's > namespace. It is not responsible for creating or preparing the module > object, nor for any cleanup afterward. It has no return value. > > The ``load_module()`` of loaders will still work and be an active part > of the loader API. It is still useful for cases where the default > module creation/prepartion/cleanup is not appropriate for the loader. > > For example, the C API for extension modules only supports the full > control of ``load_module()``. As such, ``ExtensionFileLoader`` will not > implement ``exec_module()``. In the future it may be appropriate to > produce a second C API that would support an ``exec_module()`` > implementation for ``ExtensionFileLoader``. Such a change is outside > the scope of this PEP. 
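The division of labor described in this Loaders section can be sketched as follows. It is only an illustration under stated assumptions: ``ExecOnlyLoader`` and the free function ``load()`` are hypothetical stand-ins, and ``load()`` approximates the import system's share of the work rather than reproducing any actual importlib code.

```python
import sys
import types


class ExecOnlyLoader:
    """Hypothetical loader that only knows how to execute a module."""

    def __init__(self, source):
        self.source = source  # assumed: the module's source text

    def exec_module(self, module):
        # Populate the namespace; no creation, no cleanup, no return value.
        exec(self.source, module.__dict__)


def load(name, loader):
    """Stand-in for the import system's side of loading (an assumption)."""
    if hasattr(loader, 'exec_module'):
        # The import system creates and prepares the module itself...
        module = types.ModuleType(name)
        module.__loader__ = loader
        sys.modules[name] = module
        try:
            loader.exec_module(module)
        except BaseException:
            # ...and also handles the cleanup on failure.
            sys.modules.pop(name, None)
            raise
    else:
        # Loaders that need full control keep using load_module().
        loader.load_module(name)
    return sys.modules[name]
```

With this split, a loader such as ``ExtensionFileLoader`` that defines only ``load_module()`` falls through to the second branch unchanged.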
>
> A loader must have at least one of ``exec_module()`` and
> ``load_module()`` defined.
>

"A loader must define either ``exec_module()`` or ``load_module()``."

-Brett

[SNIP]

From ericsnowcurrently at gmail.com Tue Aug 13 05:35:14 2013
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Mon, 12 Aug 2013 21:35:14 -0600
Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System"
In-Reply-To: References: Message-ID:

On Sun, Aug 11, 2013 at 7:03 AM, Nick Coghlan wrote:

> I think this is solid enough to be worth adding to the PEPs repo now.
>

Sounds good.

>
> On 9 August 2013 18:58, Eric Snow wrote:
> > Here's an updated version of the PEP for ModuleSpec which addresses the
> > feedback I've gotten. Thanks for the help. The big open question, to me,
> > is whether or not to have a separate reload() method. I'll be looking into
> > that when I get a chance. There's also the question of a path-based
> > subclass, but I'm currently not convinced it's worth it.
>
> One piece of feedback from me (triggered by the C extension modules
> discussion on python-dev): we should consider proposing a new "exec"
> hook for C extension modules that could be defined instead of or in
> addition to the existing PEP 3121 init hook.
>

Sounds good. I expect you mean as a separate proposal...

> Also, to handle the extension module case, we may need to let loaders
> define an optional "create_module" method that accepts the ModuleSpec
> object as an argument.

I'd considered that here, whether on the loader or on ModuleSpec. My plan was to hold off on that to stay focused on the rest of the changes. However, I'm open to adding this to the PEP.

> > A High-Level View
> > -----------------
> >
> > ...
>
> Not sure a high level view is needed, but you can fill this in if you want
> :)
>

Forgot that was in there.
:) > > > > ModuleSpec > > ---------- > > > > A new class which defines the import-related values to use when loading > > the module. It closely corresponds to the import-related attributes of > > module objects. ``ModuleSpec`` objects may also be used by finders and > > loaders and other import-related APIs to hold extra import-related > > state about the module. This greatly reduces the need to add any new > > new import-related attributes to module objects, and loader ``__init__`` > > methods won't need to accommodate such per-module state. > > To avoid conflicts as the spec attributes evolve in the future, would > it be worth having a "custom" field which is just an arbitrary object > reference used to pass info from the finder to the loader without > troubling the rest of the import system? > I see what you're saying, but am conflicted. For some reason providing a sub-namespace for that doesn't seem quite right. However, the alternative runs the risk of collisions later on. Maybe we could recommend the use of a preceding "_" for custom attributes? I'll see if I can come up with something. > > The parameters have the same meaning as the attributes described below. > > However, not all ``ModuleSpec`` attributes are also parameters. > > The > > passed values are set as-is. For calculated values use the > > ``from_loader()`` method. > > This paragraph isn't particularly clear. Perhaps: > > "Passed in parameter values are assigned directly to the corresponding > attributes below. Other attributes not listed as parameters (such as > ``package``) are read-only properties that are automatically derived > from these values. > > The ``ModuleSpec.from_loader()`` class method allows a suitable > ModuleSpec instance to be easily created from a PEP 302 loader object" > That's much better. > > While ``package`` and ``is_package`` are read-only properties, the > > remaining attributes can be replaced after the module spec is created > > and after import is complete. 
This allows for unusual cases where > > modifying the spec is the best option. However, typical use should not > > involve changing the state of a module's spec. > > I'm with Brett that "is_package" should go, to be replaced by > "spec.path is not None" wherever it matters. is_package() would then > fall through to the PEP 302 loader API via __getattr__. > I'm considering the recommendation, but I still feel like `is_package` as an attribute is worth having. I see module.__spec__ as useful to more than the import system and its hackers, and `is_package` as a value to the broader audience that may not have learned about what __path__ means. It's certainly not obvious that __path__ implies a package. Then again, a person would have to be looking at __spec__ to see `is_package`, so maybe it loses enough utility to be worth keeping. > ``origin`` > > > > A string for the location from which the module originates. If > > ``filename`` is set, ``origin`` should be set to the same value unless > > some other value is more appropriate. ``origin`` is used in > > ``module_repr()`` if it does not match the value of ``filename``. > > > > Using ``filename`` for this meaning would be inaccurate, since not all > > modules have path-based locations. For instance, built-in modules do > > not have ``__file__`` set. Yet it is useful to have a descriptive > > string indicating that it originated from the interpreter as a built-in > > module. So built-in modules will have ``origin`` set to ``"built-in"``. > > How about we *just* have origin, with a separate "set_fileattr" > attribute to indicate "this is a discrete file, you should set > __file__"? > I like that. I'll see how it works. There doesn't seem to be any reason why you would have two distinct strings for origin and filename. In fact, that's kind of smelly. However, I wonder if this is where a PathModuleSpec subclass would be meaningful. Then no flag would be necessary. 
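As a point of reference for the ``is_package`` and ``origin`` discussion, the derived attributes in the draft amount to something like the sketch below. ``SketchSpec`` is a hypothetical class written only for this illustration; it follows the draft's wording (``is_package`` is true when ``path`` is not None, ``package`` is derived from ``name``) and is not the proposed implementation.

```python
class SketchSpec:
    """Hypothetical, stripped-down stand-in for the draft's ModuleSpec."""

    def __init__(self, name, loader, *, origin=None, path=None):
        self.name = name
        self.loader = loader
        self.origin = origin  # e.g. "built-in" for built-in modules
        self.path = path      # None means "not a package"

    @property
    def is_package(self):
        # An empty list still counts: only None means "not set".
        return self.path is not None

    @property
    def package(self):
        # Packages are their own package; other modules use the parent name.
        if self.is_package:
            return self.name
        return self.name.rpartition('.')[0]
```

Written this way, ``spec.path is not None`` and ``spec.is_package`` carry exactly the same information, which is the crux of the disagreement over whether the attribute is worth keeping.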
> Also, we should explicitly note that we'll still set __file__ for zip > imports, due to backwards compatibility concerns, even though it > doesn't correspond to a valid filesystem path. > Hmm. So deprecate the use of __file__ for anything but actual file names? Interesting. I was planning on just leaving the current meaning of "location relative to a path entry". > > (Random thought: spec.origin + spec.cached + a cache directory setting > in zipimport would give a potentially clean way to do extension module > imports from zip archives) > That would be cool. > > ``path`` > > > > The list of path entries in which to search for submodules if this > > module is a package. Otherwise it is ``None``. > > Path entries don't have to correspond to filesystem locations - they > just have to make sense to at least one path hook > (e.g. a DB URI would be a valid path entry). > Right. I didn't mean to imply that they do. > > .. XXX add a path-based subclass? > > Nope :) > I keep vacillating on this. > > ModuleSpec Methods > > ------------------ > > > > ``from_loader(name, loader, *, is_package=None, origin=None, > filename=None, > > cached=None, path=None)`` > > > > .. XXX use a different name? > > I'd disallow customisation on this one - if people want to customise, > they should just query the PEP 302 APIs themselves and call the > ModuleSpec constructor directly. The use case for this one should be > to make it trivial to switch from "return loader" to "return > ModuleSpec.from_loader(loader)" in a find_module implementation. > What do you mean by disallow customization? Make it "private"? `from_loader()` is intended for exactly the use that you described. > > In contrast to ``ModuleSpec.__init__()``, which takes the arguments > > as-is, ``from_loader()`` calculates missing values from the ones passed > > in, as much as possible. 
This replaces the behavior that is currently > > provided the several ``importlib.util`` functions as well as the > > optional ``init_module_attrs()`` method of loaders. Just to be clear, > > here is a more detailed description of those calculations:: > > > > If not passed in, ``filename`` is to the result of calling the > > loader's ``get_filename()``, if available. Otherwise it stays > > unset (``None``). > > > > If not passed in, ``path`` is set to an empty list if > > ``is_package`` is true. Then the directory from ``filename`` is > > appended to it, if possible. If ``is_package`` is false, ``path`` > > stays unset. > > > > If ``cached`` is not passed in and ``filename`` is passed in, > > ``cached`` is derived from it. For filenames with a source suffix, > > it set to the result of calling > > ``importlib.util.cache_from_source()``. For bytecode suffixes (e.g. > > ``.pyc``), ``cached`` is set to the value of ``filename``. If > > ``filename`` is not passed in or ``cache_from_source()`` raises > > ``NotImplementedError``, ``cached`` stays unset. > > > > If not passed in, ``origin`` is set to ``filename``. Thus if > > ``filename`` is unset, ``origin`` stays unset. > > Hmm, is there a reason this can't be the default constructor > behaviour? What's the value of *not* having the sensible fallbacks, > given they can always be overridden by passing in explicit values when > you want something different? > I'll think about this. There was some value in it before, but with changes to other signatures, `from_loader()` is much less useful as a separate factory method. > > A separate "from_module(m)" constructor would probably make sense, though. > I have this for internal use in the implementation, but did not expose it since all modules should already have a spec. > ``module_repr()`` > > > > Returns a repr string for the module if ``origin`` is set and > > ``filename`` is not set. The string refers to the value of ``origin``. > > Otherwise ``module_repr()`` returns None. 
This indicates to the module > > type's ``__repr__()`` that it should fall back to the default repr. > > > > We could also have ``module_repr()`` produce the repr for the case where > > ``filename`` is set or where ``origin`` is not set, mirroring the repr > > that the module type produces directly. However, the repr string is > > derived from the import-related module attributes, which might be out of > > sync with the spec. > > > > .. XXX Is using the spec close enough? Probably not. > > I think it makes sense to always return the expected repr based on the > spec attributes, but allow a custom origin to be passed in to handle > the case where the module __file__ attribute differs from > __spec__.origin (keeping in mind I think __spec__.filename should be > replaced with __spec__.set_fileattr) > That's the approach that I took at first, but the module that is passed in is not guaranteed to be a spec. Furthermore, having the spec take precedence over the module's attrs for the repr seems like too big a backward-compatibility risk. > > > The implementation of the module type's ``__repr__()`` will change to > > accommodate this PEP. However, the current functionality will remain to > > handle the case where a module does not have a ``__spec__`` attribute. > > Experience tells us that the import system should ensure the __spec__ > attribute always exists (even if it has to be filled in from the > module attributes after calling load_module) > That's a good point. The only possible problem is for someone that creates their own module object and expects repr to work the same as it does currently. > ``load(module=None, *, is_reload=False)`` > > Yep, definitely needs to be a separate method. "is_reload" would > almost always be set to a boolean, which means a separate API is > likely to be better. > Agreed. > However, I think the separate method should be "exec()" rather than > "reload()" and require that the module always be passed in. > I'll see how that looks. 
It seems like a better fit than just plain `reload()`. We could also expose a "create" method that just creates and returns > the new module object, and replace importlib.util.module_to_load with > a context manager that accepted the module as a parameter. Say > "add_to_sys", which fails if the module is already present in > sys.modules. > One of the points of ModuleSpec is to remove the need for `module_to_load()`. I'm not convinced of the utility of a create method like you've described other than possibly as something internal to ModuleSpec. load() would then look something like: > > def load(self): > m = self.create() > with importlib.util.add_to_sys(m): > self.exec(m) > return sys.modules[self.name] > > We could also provide reload() if we wanted to: > > def reload(self): > self.exec(sys.modules[self.name]) > return sys.modules[self.name] > > > Subclassing > > ----------- > > > > Subclasses of ModuleSpec are allowed, but should not be necessary. > > Adding functionality to a custom finder or loader will likely be a > > better fit and should be tried first. However, as long as a subclass > > still fulfills the requirements of the import system, objects of that > > type are completely fine as the return value of ``find_module()``. > > We may need to do subclasses for the ABC registration backwards > compatibility hack. > I was thinking of registering ModuleSpec in the setter of a `loader > > > > > Module Objects > > -------------- > > > > Module objects will now have a ``__spec__`` attribute to which the > > module's spec will be bound. None of the other import-related module > > attributes will be changed or deprecated, though some of them could be; > > any such deprecation can wait until Python 4. > > > > ``ModuleSpec`` objects will not be kept in sync with the corresponding > > module object's import-related attributes. Though they may differ, in > > practice they will typically be the same. 
> > Worth mentioning that __main__.__spec__.name will give the real name > of module's executed with -m here rather than delaying that until the > notes at the end. > > > Finders > > ------- > > > > Finders will now return ModuleSpec objects when ``find_module()`` is > > called rather than loaders. For backward compatility, ``Modulespec`` > > objects proxy the attributes of their ``loader`` attribute. > > > > Adding another similar method to avoid backward-compatibility issues > > is undersireable if avoidable. The import APIs have suffered enough, > > especially considering ``PathEntryFinder.find_loader()`` was just > > added in Python 3.3. The approach taken by this PEP should be > > sufficient to address backward-compatibility issues for > > ``find_module()``. > > > > The change to ``find_module()`` applies to both ``MetaPathFinder`` and > > ``PathEntryFinder``. ``PathEntryFinder.find_loader()`` will be > > deprecated and, for backward compatibility, implicitly special-cased if > > the method exists on a finder. > > Actually, we don't currently have anything on ModuleSpec to indicate > "this is complete, stop scanning for more path fragments" or how we > will compose multiple module specs for the individual fragments into a > combined spec for the namespace package. > > > Finders are still responsible for creating the loader. That loader will > > now be stored in the module spec returned by ``find_module()`` rather > > than returned directly. As is currently the case without the PEP, if a > > loader would be costly to create, that loader can be designed to defer > > the cost until later. > > > > Loaders > > ------- > > > > Loaders will have a new method, ``exec_module(module)``. Its only job > > is to "exec" the module and consequently populate the module's > > namespace. It is not responsible for creating or preparing the module > > object, nor for any cleanup afterward. It has no return value. 
> >
> > The ``load_module()`` of loaders will still work and be an active part
> > of the loader API. It is still useful for cases where the default
> > module creation/preparation/cleanup is not appropriate for the loader.
> >
> > For example, the C API for extension modules only supports the full
> > control of ``load_module()``. As such, ``ExtensionFileLoader`` will not
> > implement ``exec_module()``. In the future it may be appropriate to
> > produce a second C API that would support an ``exec_module()``
> > implementation for ``ExtensionFileLoader``. Such a change is outside
> > the scope of this PEP.
>
> As above, I think it may be worth tackling this. It shouldn't be *that*
> hard given the higher level changes and will solve some hard problems
> at the lower level.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
>

From ericsnowcurrently at gmail.com Tue Aug 13 05:47:27 2013
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Mon, 12 Aug 2013 21:47:27 -0600
Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System"
In-Reply-To: References: Message-ID:

Accidentally sent. :P Continuing...

> On Sun, Aug 11, 2013 at 7:03 AM, Nick Coghlan wrote:
>
>> > Subclassing
>> > -----------
>> >
>> > Subclasses of ModuleSpec are allowed, but should not be necessary.
>> > Adding functionality to a custom finder or loader will likely be a
>> > better fit and should be tried first. However, as long as a subclass
>> > still fulfills the requirements of the import system, objects of that
>> > type are completely fine as the return value of ``find_module()``.
>>
>> We may need to do subclasses for the ABC registration backwards
>> compatibility hack.
> > I was thinking of registering ModuleSpec in the setter of a `loader` property (as long as the loader's class has a `register()` method >> > >> > Module Objects >> > -------------- >> > >> > Module objects will now have a ``__spec__`` attribute to which the >> > module's spec will be bound. None of the other import-related module >> > attributes will be changed or deprecated, though some of them could be; >> > any such deprecation can wait until Python 4. >> > >> > ``ModuleSpec`` objects will not be kept in sync with the corresponding >> > module object's import-related attributes. Though they may differ, in >> > practice they will typically be the same. >> >> Worth mentioning that __main__.__spec__.name will give the real name >> of module's executed with -m here rather than delaying that until the >> notes at the end. >> > Fair enough. > >> > Finders >> > ------- >> > >> > Finders will now return ModuleSpec objects when ``find_module()`` is >> > called rather than loaders. For backward compatility, ``Modulespec`` >> > objects proxy the attributes of their ``loader`` attribute. >> > >> > Adding another similar method to avoid backward-compatibility issues >> > is undersireable if avoidable. The import APIs have suffered enough, >> > especially considering ``PathEntryFinder.find_loader()`` was just >> > added in Python 3.3. The approach taken by this PEP should be >> > sufficient to address backward-compatibility issues for >> > ``find_module()``. >> > >> > The change to ``find_module()`` applies to both ``MetaPathFinder`` and >> > ``PathEntryFinder``. ``PathEntryFinder.find_loader()`` will be >> > deprecated and, for backward compatibility, implicitly special-cased if >> > the method exists on a finder. 
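The backward-compatibility proxying mentioned in the quoted Finders text (a spec standing in for the loader that ``find_module()`` used to return) might look roughly like this. ``ProxyingSpec`` and ``LegacyLoader`` are hypothetical names for the sketch, and the file path is made up; the draft itself only says that ``__getattr__()`` will be used.

```python
class ProxyingSpec:
    """Hypothetical spec that forwards unknown attributes to its loader."""

    def __init__(self, name, loader):
        self.name = name
        self.loader = loader

    def __getattr__(self, attr):
        # Only reached when normal lookup fails, so spec attributes win.
        # Guard against infinite recursion if 'loader' was never set.
        if attr == 'loader':
            raise AttributeError(attr)
        return getattr(self.loader, attr)


class LegacyLoader:
    """Hypothetical PEP 302 style loader used by old calling code."""

    def load_module(self, fullname):
        return 'loaded ' + fullname

    def get_filename(self, fullname):
        return '/hypothetical/path/' + fullname + '.py'
```

Old code that calls ``finder.find_module(name).load_module(name)`` keeps working against such a spec, but ``isinstance(spec, LegacyLoader)`` is false and ``spec is spec.loader`` is false, which are the two gaps the draft acknowledges.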
>> >> Actually, we don't currently have anything on ModuleSpec to indicate >> "this is complete, stop scanning for more path fragments" or how we >> will compose multiple module specs for the individual fragments into a >> combined spec for the namespace package. >> > I was planning on just using the loader's type. If it's NamespaceLoader then path is where we'll get the fragments. I was going to say it's working in my implementation, but namespace packages are actually the one part that still have some failing tests. :P > >> > Finders are still responsible for creating the loader. That loader will >> > now be stored in the module spec returned by ``find_module()`` rather >> > than returned directly. As is currently the case without the PEP, if a >> > loader would be costly to create, that loader can be designed to defer >> > the cost until later. >> > >> > Loaders >> > ------- >> > >> > Loaders will have a new method, ``exec_module(module)``. Its only job >> > is to "exec" the module and consequently populate the module's >> > namespace. It is not responsible for creating or preparing the module >> > object, nor for any cleanup afterward. It has no return value. >> > >> > The ``load_module()`` of loaders will still work and be an active part >> > of the loader API. It is still useful for cases where the default >> > module creation/prepartion/cleanup is not appropriate for the loader. >> > >> > For example, the C API for extension modules only supports the full >> > control of ``load_module()``. As such, ``ExtensionFileLoader`` will not >> > implement ``exec_module()``. In the future it may be appropriate to >> > produce a second C API that would support an ``exec_module()`` >> > implementation for ``ExtensionFileLoader``. Such a change is outside >> > the scope of this PEP. >> >> As above, I think it may worth tackling this. It shouldn't be *that* >> hard given the higher level changes and will solve some hard problems >> at the lower level. 
>> > For me that seems like a separate proposal. Certainly it's related, but in some ways it would feel tacked on. On top of that, I'd have to dive into the extension module API much more than I have and I'd rather get ModuleSpec and .ref file wrapped up sooner. At the same time, I haven't really done much API design in C so that would be interesting. In the end, I'd like to keep the extension module API additions out of this PEP. -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Tue Aug 13 06:17:58 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 12 Aug 2013 22:17:58 -0600 Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System" In-Reply-To: References: Message-ID: On Sun, Aug 11, 2013 at 2:08 PM, Brett Cannon wrote: > On Fri, Aug 9, 2013 at 6:58 PM, Eric Snow wrote: > >> >> ``is_package`` >> >> Whether or not the module is a package. This dynamic attribute is True >> if ``path`` is set (even if empty), else it is false. >> > > "is True if ``path`` is not None (e.g. the empty list is a "true" value), > else it is False". > Thanks. That is clearer. > > >> >> ``origin`` >> >> A string for the location from which the module originates. If >> ``filename`` is set, ``origin`` should be set to the same value unless >> some other value is more appropriate. ``origin`` is used in >> ``module_repr()`` if it does not match the value of ``filename``. >> >> Using ``filename`` for this meaning would be inaccurate, since not all >> modules have path-based locations. For instance, built-in modules do >> not have ``__file__`` set. Yet it is useful to have a descriptive >> string indicating that it originated from the interpreter as a built-in >> module. So built-in modules will have ``origin`` set to ``"built-in"``. >> > > I still don't know what you would put there for a zipfile-based loader. > Would you still put __file__ or would you put the zipfile? 
I ask because I > would want a way to pass along in a zipfile finder to the loader where the > zipfile is located and then the internal location of the file. Otherwise > you need to pass in the zip path separately from the internal path to the > loader constructor instead of simply passing in a ModuleSpec (e.g. see > _split_path in http://bugs.python.org/file30660/zip_importlib.diff). > For me origin makes the most sense as the "string for the location from which the module originates". I'd think it would be the same as gets put into __file__ right now. However, you're right that there's more useful info that could be stored on the spec. In this case I'd expect it to be added as an extra attribute on the spec rather than as part of the normal ModuleSpec attributes. However, as Nick pointed out, custom attributes currently don't have a good strategy for avoiding collisions with future normal ModuleSpec attributes. > ``path`` >> >> The list of path entries in which to search for submodules if this >> module is a package. Otherwise it is ``None``. >> >> .. XXX add a path-based subclass? >> > > You mean like namespace package's __path__ object? Or are you saying you > want ModuleSpec vs. PackageSpec? > More like ModuleSpec and PathModuleSpec. PathModuleSpec would have filename, cached, and path (an associated handling), while ModuleSpec would not. At the same time I like having a one-size-fits-all ModuleSpec if possible, since it should probably pretty closely follow the one-size-fits-all module type. > > >> >> ModuleSpec Methods >> ------------------ >> >> ``from_loader(name, loader, *, is_package=None, origin=None, >> filename=None, cached=None, path=None)`` >> >> .. XXX use a different name? >> >> A factory classmethod that returns a new ``ModuleSpec`` derived from the >> arguments. ``is_package`` is used inside the method to indicate that >> the module is a package. 
>>
>
> Why is this a parameter rather than being inferred from 'path' or
> loader.is_package() as you fall back on? What's the motivation?
>

In part it's intended to lower the barrier to entry for people learning about the import system and getting their hands dirty. It's just more obvious as an explicit parameter. Of course, it means there are two parameters that basically accomplish the same thing, so perhaps it's not worth it. Furthermore, `from_loader()` may go the way of the dodo since the motivation for it has mostly gone away with other API changes.

> Just to be clear,
>> here is a more detailed description of those calculations::
>>
>>    If not passed in, ``filename`` is set to the result of calling the
>>    loader's ``get_filename()``, if available. Otherwise it stays
>>    unset (``None``).
>>
>>    If not passed in, ``path`` is set to an empty list if
>>    ``is_package`` is true. Then the directory from ``filename`` is
>>    appended to it, if possible. If ``is_package`` is false, ``path``
>>    stays unset.
>>
>>    If ``cached`` is not passed in and ``filename`` is passed in,
>>    ``cached`` is derived from it. For filenames with a source suffix,
>>    it is set to the result of calling
>>    ``importlib.util.cache_from_source()``. For bytecode suffixes (e.g.
>>    ``.pyc``), ``cached`` is set to the value of ``filename``. If
>>    ``filename`` is not passed in or ``cache_from_source()`` raises
>>    ``NotImplementedError``, ``cached`` stays unset.
>>
>>    If not passed in, ``origin`` is set to ``filename``. Thus if
>>    ``filename`` is unset, ``origin`` stays unset.
>>
>
> Why is this a static constructor instead of a method like infer_values()
> or an infer_values keyword-only argument to the constructor to do this if
> requested?
>

Good point. I was already planning on yanking `from_loader()`. That kw-only argument would probably be a good fit. I'll try it out.

>
>
>>
>> ``module_repr()``
>>
>> Returns a repr string for the module if ``origin`` is set and
>> ``filename`` is not set.
>> The string refers to the value of ``origin``.
>> Otherwise ``module_repr()`` returns None. This indicates to the module
>> type's ``__repr__()`` that it should fall back to the default repr.
>>
>
> This makes me think that origin is an odd name if all it affects is
> module_repr().
>

It's also informational, of course.

>
>
>>
>> We could also have ``module_repr()`` produce the repr for the case where
>> ``filename`` is set or where ``origin`` is not set, mirroring the repr
>> that the module type produces directly. However, the repr string is
>> derived from the import-related module attributes, which might be out of
>> sync with the spec.
>>
>
>
> [SNIP]
>
>
>> .. XXX add reload(module=None) and drop load()'s parameters entirely?
>>
>
> If you are going to go with these semantics, where the module argument is
> only good for reloading, then I say yes, make it a separate method.
>

Yeah, I think it's settled. I like Nick's suggestion of calling it `exec()`.

>
>> .. XXX add more of importlib.reload()'s boilerplate to load()/reload()?
>>
>> Backward Compatibility
>> ----------------------
>>
>> Since ``Finder.find_module()`` methods would now return a module spec
>> instead of a loader, specs must act like the loader that would have been
>> returned instead. This is relatively simple to solve since the loader
>> is available as an attribute of the spec. We will use ``__getattr__()``
>> to do it.
>>
>> However, ``ModuleSpec.is_package`` (an attribute) conflicts with
>> ``InspectLoader.is_package()`` (a method). Working around this requires
>> a more complicated solution but is not a large obstacle. Simply making
>> ``ModuleSpec.is_package`` a method does not reflect that it is a relatively
>> static piece of data.
>>
>
> Maybe, but depending on what your "more complicated solution" is, it might
> be best to just give up the purity and go with the practicality.
>

It's not that complicated, but not exactly pretty:

    class _TruthyFunction:

        def __init__(self, func, is_true):
            self.func = func
            self._is_true = bool(is_true)

        def __repr__(self):
            return repr(self._is_true)

        def __bool__(self):
            return self._is_true

        def __call__(self, *args, **kwargs):
            return self.func(*args, **kwargs)


    class ModuleSpec:

        ...

        @property
        def is_package(self):
            loader = self.loader
            is_package = False
            if self.path is not None:
                is_package = True
            elif hasattr(self.loader, 'is_package'):
                try:
                    is_package = loader.is_package(self.name)
                except ImportError:
                    pass

            # Since InspectLoader also has is_package(), we have to
            # accommodate the use of the return value as a function.
            def func(*args, **kwargs):
                # XXX Throw a DeprecationWarning here?
                return self.loader.is_package(*args, **kwargs)
            return _TruthyFunction(func, is_package)

>
>> ``module_repr()`` also conflicts with the same
>> method on loaders, but that workaround is not complicated since both are
>> methods.
>>
>> Unfortunately, the ability to proxy does not extend to ``id()``
>> comparisons and ``isinstance()`` tests. In the case of the return value
>> of ``find_module()``, we accept that break in backward compatibility.
>> However, we will mitigate the problem with ``isinstance()`` somewhat by
>> registering ``ModuleSpec`` on the loaders in ``importlib.abc``.
>>
>
> Actually, ModuleSpec doesn't even need to register; __instancecheck__ and
> __subclasscheck__ can just be defined and delegate by calling
> issubclass/isinstance on the loader as appropriate.
>

Do you mean add custom versions of those methods to importlib.abc.Loader? That should work as well as the register approach. It won't work for all loaders but should be good enough. I was just planning on registering ModuleSpec on the loader in the setter for a `loader` property on ModuleSpec.

-eric
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From brett at python.org Tue Aug 13 15:21:42 2013 From: brett at python.org (Brett Cannon) Date: Tue, 13 Aug 2013 09:21:42 -0400 Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System" In-Reply-To: References: Message-ID: On Tue, Aug 13, 2013 at 12:17 AM, Eric Snow wrote: > On Sun, Aug 11, 2013 at 2:08 PM, Brett Cannon wrote: > >> >> [SNIP] > >> >>> ``module_repr()`` also conflicts with the same >>> method on loaders, but that workaround is not complicated since both are >>> methods. >>> >>> Unfortunately, the ability to proxy does not extend to ``id()`` >>> comparisons and ``isinstance()`` tests. In the case of the return value >>> of ``find_module()``, we accept that break in backward compatibility. >>> However, we will mitigate the problem with ``isinstance()`` somewhat by >>> registering ``ModuleSpec`` on the loaders in ``importlib.abc``. >>> >> >> Actually, ModuleSpec doesn't even need to register; __instancecheck__ and >> __subclasscheck__ can just be defined and delegate by calling >> issubclass/isinstance on the loader as appropriate. >> > > Do you mean add custom versions of those methods to importlib.abc.Loader? > Nope, I meant ModuleSpec because every time I have a reason to override something it's on the object and not the class and so I forget the support is the other way around. Argh. > That should work as well as the register approach. It won't work for all > loaders but should be good enough. I was just planning on registering > ModuleSpec on the loader in the setter for a `loader` property on > ModuleSpec. > But the registration is at the class level so how would that work? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ericsnowcurrently at gmail.com Wed Aug 14 01:47:53 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 13 Aug 2013 17:47:53 -0600 Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System" In-Reply-To: References: Message-ID: On Tue, Aug 13, 2013 at 7:21 AM, Brett Cannon wrote: > On Tue, Aug 13, 2013 at 12:17 AM, Eric Snow wrote: > >> On Sun, Aug 11, 2013 at 2:08 PM, Brett Cannon wrote: >> >>> >>> > [SNIP] > > >> >>> >>>> ``module_repr()`` also conflicts with the same >>>> method on loaders, but that workaround is not complicated since both are >>>> methods. >>>> >>>> Unfortunately, the ability to proxy does not extend to ``id()`` >>>> comparisons and ``isinstance()`` tests. In the case of the return value >>>> of ``find_module()``, we accept that break in backward compatibility. >>>> However, we will mitigate the problem with ``isinstance()`` somewhat by >>>> registering ``ModuleSpec`` on the loaders in ``importlib.abc``. >>>> >>> >>> Actually, ModuleSpec doesn't even need to register; __instancecheck__ >>> and __subclasscheck__ can just be defined and delegate by calling >>> issubclass/isinstance on the loader as appropriate. >>> >> >> Do you mean add custom versions of those methods to importlib.abc.Loader? >> > > Nope, I meant ModuleSpec because every time I have a reason to override > something it's on the object and not the class and so I forget the support > is the other way around. Argh. > Yeah, that would make things a lot easier. > That should work as well as the register approach. It won't work for all >> loaders but should be good enough. I was just planning on registering >> ModuleSpec on the loader in the setter for a `loader` property on >> ModuleSpec. >> > > But the registration is at the class level so how would that work? 
>

    @property
    def loader(self):
        return self._loader

    @loader.setter
    def loader(self, loader):
        try:
            register = loader.__class__.register
        except AttributeError:
            pass
        else:
            register(self.__class__)
        self._loader = loader

It's not pretty and it won't work on non-ABCs, but it's better than nothing. The likelihood of someone doing an isinstance check on a loader seems pretty low though. Of course, I'm planning on doing just that for handling of namespace packages, but that's a little different.

-eric

From ncoghlan at gmail.com  Wed Aug 14 03:16:07 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 13 Aug 2013 21:16:07 -0400
Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System"
In-Reply-To:
References:
Message-ID:

On 13 Aug 2013 18:48, "Eric Snow" wrote:
>
> On Tue, Aug 13, 2013 at 7:21 AM, Brett Cannon wrote:
>>
>> On Tue, Aug 13, 2013 at 12:17 AM, Eric Snow wrote:
>>>
>>> On Sun, Aug 11, 2013 at 2:08 PM, Brett Cannon wrote:
>>>>
>>>>
>>
>> [SNIP]
>>
>>>>
>>>>
>>>>>
>>>>> ``module_repr()`` also conflicts with the same
>>>>> method on loaders, but that workaround is not complicated since both are
>>>>> methods.
>>>>>
>>>>> Unfortunately, the ability to proxy does not extend to ``id()``
>>>>> comparisons and ``isinstance()`` tests. In the case of the return value
>>>>> of ``find_module()``, we accept that break in backward compatibility.
>>>>> However, we will mitigate the problem with ``isinstance()`` somewhat by
>>>>> registering ``ModuleSpec`` on the loaders in ``importlib.abc``.
>>>>
>>>>
>>>> Actually, ModuleSpec doesn't even need to register; __instancecheck__ and __subclasscheck__ can just be defined and delegate by calling issubclass/isinstance on the loader as appropriate.
>>>
>>>
>>> Do you mean add custom versions of those methods to importlib.abc.Loader?
>> >> >> Nope, I meant ModuleSpec because every time I have a reason to override something it's on the object and not the class and so I forget the support is the other way around. Argh. > > > Yeah, that would make things a lot easier. > >>> >>> That should work as well as the register approach. It won't work for all loaders but should be good enough. I was just planning on registering ModuleSpec on the loader in the setter for a `loader` property on ModuleSpec. >> >> >> But the registration is at the class level so how would that work? > > > @property > def loader(self): > return self._loader > > @loader.setter > def loader(self, loader): > try: > register = loader.__class__.register > except AttributeError: > pass > else: > register(self.__class__) > self._loader = loader > > It's not pretty and it won't work on non-ABCs, but it's better than nothing. The likelihood of someone doing an isinstance check on a loader seems pretty low though. Of course, I'm planning on doing just that for handling of namespace packages, but that's a little different. That ends up registering ModuleSpec as an example of every loader ABC, so it doesn't work at all. Making the importlib ABC hooks ModuleSpec aware (so they knew to check the loader, not the spec) would be pretty easy, though. > > -eric > > _______________________________________________ > Import-SIG mailing list > Import-SIG at python.org > http://mail.python.org/mailman/listinfo/import-sig > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ericsnowcurrently at gmail.com Wed Aug 14 05:18:35 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 13 Aug 2013 21:18:35 -0600 Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System" In-Reply-To: References: Message-ID: On Tue, Aug 13, 2013 at 7:16 PM, Nick Coghlan wrote: > On 13 Aug 2013 18:48, "Eric Snow" wrote: > > @property > > def loader(self): > > return self._loader > > > > @loader.setter > > def loader(self, loader): > > try: > > register = loader.__class__.register > > except AttributeError: > > pass > > else: > > register(self.__class__) > > self._loader = loader > > > > It's not pretty and it won't work on non-ABCs, but it's better than > nothing. The likelihood of someone doing an isinstance check on a loader > seems pretty low though. Of course, I'm planning on doing just that for > handling of namespace packages, but that's a little different. > > That ends up registering ModuleSpec as an example of every loader ABC, so > it doesn't work at all. > I guess it does amount to a cheap trick, allowing isinstance() checks to pass but not necessarily providing the appropriate APIs. > Making the importlib ABC hooks ModuleSpec aware (so they knew to check the > loader, not the spec) would be pretty easy, though. > That's what I thought Brett was recommending earlier. I was going to express hesitation at spreading backward-compatibility tendrils. However, your recommendation is probably a good idea on its own. Several of the collections ABCs do explicit API checks and they'd work well here too. I'll add this to the PEP. -eric -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pje at telecommunity.com Thu Aug 15 02:27:40 2013 From: pje at telecommunity.com (PJ Eby) Date: Wed, 14 Aug 2013 20:27:40 -0400 Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System" In-Reply-To: References: Message-ID: On Fri, Aug 9, 2013 at 6:58 PM, Eric Snow wrote: > A High-Level View > ----------------- > > ... It would be really helpful if that high-level view were actually included, as I'm having a lot of trouble wrapping my head around the rest of the spec. For that matter, some introductory examples to contrast "before" and "after" for something that this changes would be really nice at about this point. > Path-based attributes: > > If any of these is set, it indicates that the module is path-based. For > reference, a path entry is a string for a location where the import > system will look for modules, e.g. the path entries in ``sys.path`` or a > package's ``__path__``). > What does "path-based" actually mean here? On the one hand, you're saying that a path entry is on sys.path or a __path__, but then we're using an attribute called "filename". Shouldn't it be called path_entry or subpath, or location or something, if it's not required to be a filename? The overlap between path = sys.path and path = filesystem path is way too confusing here. > .. XXX Would a different name be better? ``path_location``? Yeah, definitely something other than filename. ;-) It might also help to explain that some modules can be loaded by reference to a location, e.g. a filesystem path or a URL or something of the sort -- having the location lets you load the module, but in theory you could load that module under various names. In contrast, non-located modules can't be loaded in this fashion: modules created by a meta path loader (such as builtins), or modules dynamically created in code. For these, the name is the only way to access them, so they have an "origin" but not a "location". 
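The located versus non-located distinction can be sketched with the spec API as it exists in later Python releases (3.4 and up). That API postdates this thread, so attribute names such as ``has_location`` come from the released version, not from the draft being discussed here. The temporary file and the module names ``demo_located`` and ``another_name`` are hypothetical, chosen only for illustration.

```python
import importlib.util
import os
import tempfile


def describe(spec):
    """Summarize where a spec says its module comes from."""
    return {
        "name": spec.name,
        "origin": spec.origin,
        "has_location": spec.has_location,
    }


# A built-in module: it has an origin, but no location to load it from.
builtin_info = describe(importlib.util.find_spec("sys"))

# A located module: the same file can be described under more than one name.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "demo_located.py")
with open(path, "w") as f:
    f.write("VALUE = 42\n")

located_info = describe(
    importlib.util.spec_from_file_location("demo_located", path))
other_info = describe(
    importlib.util.spec_from_file_location("another_name", path))
```

Here the built-in module can only be reached by name, while the file-backed specs share one location under two different names.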
Also, bear in mind that it's not just exotic locations like URLs that aren't filenames. zipimport uses pseudo-filenames that pretend a zipfile is a directory, by prepending the zipfile's filename to a path that's within the zipfile. So, calling this "filename" is *really* a bad idea; it's not always a filename for even stdlib importers, let alone anything third-party! > ``path`` > > The list of path entries in which to search for submodules if this > module is a package. Otherwise it is ``None``. This should probably be called submodule_path or submodule_search_locations or something, to avoid even *more* overloading of the word "path". ;-) > .. XXX add a path-based subclass? Why? What good would it do? > ModuleSpec Methods > ------------------ > > ``from_loader(name, loader, *, is_package=None, origin=None, filename=None, > cached=None, path=None)`` > > .. XXX use a different name? Seems fine to me: it's consistent w/other stdlib factory method names. > If not passed in, ``path`` is set to an empty list if > ``is_package`` is true. Then the directory from ``filename`` is > appended to it, if possible. If ``is_package`` is false, ``path`` > stays unset. How does this interact with namespace packages? Does it? > Sets the module's import-related attributes to the corresponding values > in the module spec. If a path-based attribute is not set on the spec, Location-based? ;-) > ``load(module=None, *, is_reload=False)`` > > This method captures the current functionality of and requirements on > ``Loader.load_module()`` without any semantic changes, except one. > Reloading a module when ``exec_module()`` is available actually uses > ``module`` rather than ignoring it in favor of the one in > ``sys.modules``, as ``Loader.load_module()`` does. Interesting -- this could possibly be leveraged to implement multi-version imports. > ``module`` is only allowed when ``is_reload`` is true. ...or not. ;) > This means that > ``is_reload`` could be dropped as a parameter. 
However, doing so would > mean we could not use ``None`` to indicate that the module should be > pulled from ``sys.modules``. Wait, what? That doesn't seem true to me: why not just use the module or pull one according to whether it's None or not? What actual difference does is_reload really make here? > Regarding the first part of ``load()``, the following describes what > happens. I'm thinking maybe this should be parameterized to allow passing in a 'modules' dictionary other than sys.modules. This would make multi-version imports or other "isolated environment" imports more viable, and factor out another global element of the import system. That way, if you implement an isolated module system, you don't have to duplicate or subclass ModuleSpec to perform the same loading functionality. > Unfortunately, the ability to proxy does not extend to ``id()`` > comparisons and ``isinstance()`` tests. Who does id() tests on loaders? isinstance() fudging, OTOH, is quite doable. See the ProxyTypes library on PyPI for an example; it's 2.x-only but I believe somebody has done a proof-of-concept port (due to some __special__ methods being different or missing in 3.x) > Finders > ------- > > Finders will now return ModuleSpec objects when ``find_module()`` is > called rather than loaders. For backward compatility, ``Modulespec`` > objects proxy the attributes of their ``loader`` attribute. Has anybody looked at how this change affects pkgutil's (and setuptools') generic function-based extensions to PEP 302? Currently, you can register specific loader types with these guys, but that'll likely break if importlib is going to start wrapping loaders without those tools' knowledge. May I suggest adding a new finder method, find_module_spec() instead? Then, implement it for finders that don't support it by calling find_module() and wrapping the loader with a ModuleSpec. 
This approach would be less disruptive to code that already uses find_module and inspects loader types to add extension protocols. > Adding another similar method to avoid backward-compatibility issues > is undersireable if avoidable. The import APIs have suffered enough, > especially considering ``PathEntryFinder.find_loader()`` was just > added in Python 3.3. The approach taken by this PEP should be > sufficient to address backward-compatibility issues for > ``find_module()``. I'm not sure I'm following here: are you saying that all PEP 302 finders implemented by anyone, anywhere, must be changed *in order to work at all*, when this lands in a *minor version change*? > Other Changes > ------------- This section doesn't address impact on pkgutil, which makes significant use of the PEP 302 API. From ericsnowcurrently at gmail.com Thu Aug 15 09:38:11 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 15 Aug 2013 01:38:11 -0600 Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System" In-Reply-To: References: Message-ID: On Wed, Aug 14, 2013 at 6:27 PM, PJ Eby wrote: > On Fri, Aug 9, 2013 at 6:58 PM, Eric Snow > wrote: > > A High-Level View > > ----------------- > > > > ... > > It would be really helpful if that high-level view were actually > included, as I'm having a lot of trouble wrapping my head around the > rest of the spec. For that matter, some introductory examples to > contrast "before" and "after" for something that this changes would be > really nice at about this point. > Sounds good. As to examples, do you mean how you would replace an implementation of load_module() with one of exec_module()? > > Path-based attributes: > > > > If any of these is set, it indicates that the module is path-based. For > > reference, a path entry is a string for a location where the import > > system will look for modules, e.g. the path entries in ``sys.path`` or a > > package's ``__path__``). 
> > > > What does "path-based" actually mean here? On the one hand, you're > saying that a path entry is on sys.path or a __path__, but then we're > using an attribute called "filename". Shouldn't it be called > path_entry or subpath, or location or something, if it's not required > to be a filename? The overlap between path = sys.path and path = > filesystem path is way too confusing here. > This is a really good point. I'll clean it up. I've already changed "path" to "path_entries" and dropped "filename" in favor of "set_fileattr". Furthermore, "file location" is a good substitute for "path" when talking about files. > > .. XXX Would a different name be better? ``path_location``? > > Yeah, definitely something other than filename. ;-) > > It might also help to explain that some modules can be loaded by > reference to a location, e.g. a filesystem path or a URL or something > of the sort -- having the location lets you load the module, but in > theory you could load that module under various names. In contrast, > non-located modules can't be loaded in this fashion: modules created > by a meta path loader (such as builtins), or modules dynamically > created in code. For these, the name is the only way to access them, > so they have an "origin" but not a "location". > Right. That's the point of "origin". It will be up to the loader whether or not to use "origin" to determine a location, if any. Also, bear in mind that it's not just exotic locations like URLs that > aren't filenames. zipimport uses pseudo-filenames that pretend a > zipfile is a directory, by prepending the zipfile's filename to a path > that's within the zipfile. So, calling this "filename" is *really* a > bad idea; it's not always a filename for even stdlib importers, let > alone anything third-party! > Yeah, that has always bugged me about "__file__". The upcoming revision of the PEP uses the combo of "origin" and "set_fileattr" (a bool) instead of "filename". 
> > ``path`` > > > > The list of path entries in which to search for submodules if this > > module is a package. Otherwise it is ``None``. > > This should probably be called submodule_path or > submodule_search_locations or something, to avoid even *more* > overloading of the word "path". ;-) > I came to the same conclusion and was planning on using "path_entries". However perhaps something even more explicit, like "submodule_search_locations", would be better. :) > If not passed in, ``path`` is set to an empty list if > > ``is_package`` is true. Then the directory from ``filename`` is > > appended to it, if possible. If ``is_package`` is false, ``path`` > > stays unset. > > How does this interact with namespace packages? Does it? > Namespace packages won't use this method, so nothing will be populated dynamically. > > ``load(module=None, *, is_reload=False)`` > > > > This method captures the current functionality of and requirements on > > ``Loader.load_module()`` without any semantic changes, except one. > > Reloading a module when ``exec_module()`` is available actually uses > > ``module`` rather than ignoring it in favor of the one in > > ``sys.modules``, as ``Loader.load_module()`` does. > > Interesting -- this could possibly be leveraged to implement > multi-version imports. > I'm planning on splitting reload() out from load() so those semantics would go away. However, there may be room to still provide the same functionality. What would be needed for multi-version imports? (Is that question opening a can of worms? ) > > This means that > > ``is_reload`` could be dropped as a parameter. However, doing so would > > mean we could not use ``None`` to indicate that the module should be > > pulled from ``sys.modules``. > > Wait, what? That doesn't seem true to me: why not just use the module > or pull one according to whether it's None or not? What actual > difference does is_reload really make here? > With a separate reload() this point is moot. 
> > Regarding the first part of ``load()``, the following describes what > > happens. > > I'm thinking maybe this should be parameterized to allow passing in a > 'modules' dictionary other than sys.modules. This would make > multi-version imports or other "isolated environment" imports more > viable, and factor out another global element of the import system. > That way, if you implement an isolated module system, you don't have > to duplicate or subclass ModuleSpec to perform the same loading > functionality. > Cool idea, but couldn't this wait. I could totally see this as part of PEP 406 (import engine). > > Unfortunately, the ability to proxy does not extend to ``id()`` > > comparisons and ``isinstance()`` tests. > > Who does id() tests on loaders? Which is why I'm not going to worry about it too much. :) > isinstance() fudging, OTOH, is quite > doable. See the ProxyTypes library on PyPI for an example; it's > 2.x-only but I believe somebody has done a proof-of-concept port (due > to some __special__ methods being different or missing in 3.x) > The current plan is to simply implement __subclasshook__() on the various importlib ABCs, and perhaps other loaders, to check for methods. Some of the ABCs in collections.abc (like Iterator) do this. > Finders > > ------- > > > > Finders will now return ModuleSpec objects when ``find_module()`` is > > called rather than loaders. For backward compatility, ``Modulespec`` > > objects proxy the attributes of their ``loader`` attribute. > > Has anybody looked at how this change affects pkgutil's (and > setuptools') generic function-based extensions to PEP 302? Currently, > you can register specific loader types with these guys, but that'll > likely break if importlib is going to start wrapping loaders without > those tools' knowledge. > Good point. I'll look into this. > > May I suggest adding a new finder method, find_module_spec() instead? 
> Then, implement it for finders that don't support it by calling > find_module() and wrapping the loader with a ModuleSpec. This > approach would be less disruptive to code that already uses > find_module and inspects loader types to add extension protocols. > I consider this a last resort--i.e. if we can't find a way to make find_module() work for us in a simple enough way. I just cringe at the idea of bolting on another backward-compatibility-induced method, particularly when it's the OOTDI and the existing name is better fit for the new functionality than old and re-purposing find_module() is within reach. > Adding another similar method to avoid backward-compatibility issues > > is undersireable if avoidable. The import APIs have suffered enough, > > especially considering ``PathEntryFinder.find_loader()`` was just > > added in Python 3.3. The approach taken by this PEP should be > > sufficient to address backward-compatibility issues for > > ``find_module()``. > > I'm not sure I'm following here: are you saying that all PEP 302 > finders implemented by anyone, anywhere, must be changed *in order to > work at all*, when this lands in a *minor version change*? > Existing finders and loaders will continue working as-is. I've already got this working in a rough implementation, so it's not that big a stretch. > > Other Changes > > ------------- > > This section doesn't address impact on pkgutil, which makes > significant use of the PEP 302 API. > I'll add that in. Thanks for bringing it up. My draft implementation is passing all the pkgutil tests, but I wouldn't be surprised if I've missed something here. Anyway, thanks for the feedback. I'll post an update to the PEP in the next day or two. -eric -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Thu Aug 15 17:23:45 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 15 Aug 2013 10:23:45 -0500 Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System" In-Reply-To: References: Message-ID: On 15 August 2013 02:38, Eric Snow wrote: > PJE wrote: >> I'm thinking maybe this should be parameterized to allow passing in a >> 'modules' dictionary other than sys.modules. This would make >> multi-version imports or other "isolated environment" imports more >> viable, and factor out another global element of the import system. >> That way, if you implement an isolated module system, you don't have >> to duplicate or subclass ModuleSpec to perform the same loading >> functionality. > > Cool idea, but couldn't this wait. I could totally see this as part of PEP > 406 (import engine). One of the conclusions I came to from Greg's import engine work is that the only practical way for us to get to isolated import subsystems is either with a Decimal style thread local context based solution, or with a split create/exec API where the loader doesn't do any global state manipulation at all and instead operates in a functional mode where it just returns values based on passed in parameters (that way the import system at least has the chance to override __import__ before running the module code). Anything else looks like it will be too fragile (and the latter approach doesn't necessarily work for C extensions that do imports). This is part of why I'm keen on having this PEP expose "create" and "exec" as separate operations on ModuleSpec, with "load" acting solely as a convenience function for combining them with the appropriate sys.modules manipulation. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Aug 15 17:59:43 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 15 Aug 2013 10:59:43 -0500 Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System" In-Reply-To: References: Message-ID: On 12 Aug 2013 23:35, "Eric Snow" wrote: > > On Sun, Aug 11, 2013 at 7:03 AM, Nick Coghlan wrote: >> >> I think this is solid enough to be worth adding to the PEPs repo now. > > > Sounds good. > >> >> >> On 9 August 2013 18:58, Eric Snow wrote: >> > Here's an updated version of the PEP for ModuleSpec which addresses the >> > feedback I've gotten. Thanks for the help. The big open question, to me, >> > is whether or not to have a separate reload() method. I'll be looking into >> > that when I get a chance. There's also the question of a path-based >> > subclass, but I'm currently not convinced it's worth it. >> >> One piece of feedback from me (triggered by the C extension modules >> discussion on python-dev): we should consider proposing a new "exec" >> hook for C extension modules that could be defined instead of or in >> addition to the existing PEP 3121 init hook. > > > Sounds good. I expect you mean as a separate proposal... I actually meant in this proposal, but strictly speaking, I just need the "create" part of the API at this level to tie into my ideas for C extensions :) Now that this has a PEP number I can reference, I'll try to get something more fleshed out posted later this week. >> > ModuleSpec >> > ---------- >> > >> > A new class which defines the import-related values to use when loading >> > the module. It closely corresponds to the import-related attributes of >> > module objects. ``ModuleSpec`` objects may also be used by finders and >> > loaders and other import-related APIs to hold extra import-related >> > state about the module. 
This greatly reduces the need to add any new >> > new import-related attributes to module objects, and loader ``__init__`` >> > methods won't need to accommodate such per-module state. >> >> To avoid conflicts as the spec attributes evolve in the future, would >> it be worth having a "custom" field which is just an arbitrary object >> reference used to pass info from the finder to the loader without >> troubling the rest of the import system? > > > I see what you're saying, but am conflicted. For some reason providing a sub-namespace for that doesn't seem quite right. However, the alternative runs the risk of collisions later on. Maybe we could recommend the use of a preceding "_" for custom attributes? I'll see if I can come up with something. It wouldn't be a custom namespace, just a single attribute to pass data to the loader. It could be a dict, namespace, string, custom object, anything. By default, it would be None. For example, zipimporter could use it to pass the zip archive name to the loader directly, rather than needing to derive it from origin or create a custom loader for each find operation. >> > While ``package`` and ``is_package`` are read-only properties, the >> > remaining attributes can be replaced after the module spec is created >> > and after import is complete. This allows for unusual cases where >> > modifying the spec is the best option. However, typical use should not >> > involve changing the state of a module's spec. >> >> I'm with Brett that "is_package" should go, to be replaced by >> "spec.path is not None" wherever it matters. is_package() would then >> fall through to the PEP 302 loader API via __getattr__. > > > I'm considering the recommendation, but I still feel like `is_package` as an attribute is worth having. I see module.__spec__ as useful to more than the import system and its hackers, and `is_package` as a value to the broader audience that may not have learned about what __path__ means. 
It's certainly not obvious that __path__ implies a package. Then again, a person would have to be looking at __spec__ to see `is_package`, so maybe it loses enough utility to be worth keeping. I think we need to emphasise the fact that a package is just a module with a search path attribute *more* rather than less. Don't try to hide it, shout it from the rooftops :) Say, something like "spec.submodule_search_path is not None" :) >> How about we *just* have origin, with a separate "set_fileattr" >> attribute to indicate "this is a discrete file, you should set >> __file__"? > > > I like that. I'll see how it works. There doesn't seem to be any reason why you would have two distinct strings for origin and filename. In fact, that's kind of smelly. > > However, I wonder if this is where a PathModuleSpec subclass would be meaningful. Then no flag would be necessary. I realised we may not need a separate flag at all: how about we key this off "hasattr(self.loader, 'get_data')"? And expose that as a "is_location" read-only property? (I like PJE's suggestion of "location" as a name for modules which may be used with a loader that supports the get_data API) (Tangent: at some point in the future, we could define an "open" method on spec objects. This would do the path munging relative to origin automatically, using the opener argument to the builtin open to back it with BytesIO and the get_data API on the loader. If loaders defined an "opener" method, then the spec could use that instead) >> > ModuleSpec Methods >> > ------------------ >> > >> > ``from_loader(name, loader, *, is_package=None, origin=None, filename=None, >> > cached=None, path=None)`` >> > >> > .. XXX use a different name? >> >> I'd disallow customisation on this one - if people want to customise, >> they should just query the PEP 302 APIs themselves and call the >> ModuleSpec constructor directly. 
The use case for this one should be >> to make it trivial to switch from "return loader" to "return >> ModuleSpec.from_loader(loader)" in a find_module implementation. > > > What do you mean by disallow customization? Make it "private"? `from_loader()` is intended for exactly the use that you described. The keyword arguments. If from_loader stays, it shouldn't allow you to override the values derived from the loader - if you want to do that, just read the values you want to keep from the loader and pass them in explicitly. >> A separate "from_module(m)" constructor would probably make sense, though. > > I have this for internal use in the implementation, but did not expose it since all modules should already have a spec. It's more for the benefit of adapting existing loaders - since they already have the code to initialise the module, we should make it easy for them to just initialise a throwaway module and convert it to a spec object, rather than having to completely rewrite their initialisation code to be spec based. >> > ``module_repr()`` >> > >> > Returns a repr string for the module if ``origin`` is set and >> > ``filename`` is not set. The string refers to the value of ``origin``. >> > Otherwise ``module_repr()`` returns None. This indicates to the module >> > type's ``__repr__()`` that it should fall back to the default repr. >> > >> > We could also have ``module_repr()`` produce the repr for the case where >> > ``filename`` is set or where ``origin`` is not set, mirroring the repr >> > that the module type produces directly. However, the repr string is >> > derived from the import-related module attributes, which might be out of >> > sync with the spec. >> > >> > .. XXX Is using the spec close enough? Probably not. 
>> >> I think it makes sense to always return the expected repr based on the >> spec attributes, but allow a custom origin to be passed in to handle >> the case where the module __file__ attribute differs from >> __spec__.origin (keeping in mind I think __spec__.filename should be >> replaced with __spec__.set_fileattr) > > > That's the approach that I took at first, but the module that is passed in is not guaranteed to be a spec. Furthermore, having the spec take precedence over the module's attrs for the repr seems like too big a backward-compatibility risk. I don't understand your response. Simplifying the API a bit to allow a module to be passed in directly, ModuleType.__repr__ would just call it like this: self.__spec__.module_repr(self) All the logic would be in one place (ModuleSpec), but modules could still override the original values with the actual settings in the module namespace. >> > The implementation of the module type's ``__repr__()`` will change to >> > accommodate this PEP. However, the current functionality will remain to >> > handle the case where a module does not have a ``__spec__`` attribute. >> >> Experience tells us that the import system should ensure the __spec__ >> attribute always exists (even if it has to be filled in from the >> module attributes after calling load_module) > > That's a good point. The only possible problem is for someone that creates their own module object and expects repr to work the same as it does currently. Hmm, true - however, we can handle that by creating and throwing away a dummy spec object rather than duplicating the logic. >> We could also expose a "create" method that just creates and returns >> the new module object, and replace importlib.util.module_to_load with >> a context manager that accepted the module as a parameter. Say >> "add_to_sys", which fails if the module is already present in >> sys.modules. > > > One of the points of ModuleSpec is to remove the need for `module_to_load()`. 
I'm not convinced of the utility of a create method like you've described other than possibly as something internal to ModuleSpec. Splitting create and exec should eventually let me delete a bunch of code from runpy :) Cheers, Nick. From ericsnowcurrently at gmail.com Fri Aug 16 00:15:57 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 15 Aug 2013 16:15:57 -0600 Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System" In-Reply-To: References: Message-ID: On Thu, Aug 15, 2013 at 9:23 AM, Nick Coghlan wrote: > On 15 August 2013 02:38, Eric Snow wrote: > > PJE wrote: > >> I'm thinking maybe this should be parameterized to allow passing in a > >> 'modules' dictionary other than sys.modules. This would make > >> multi-version imports or other "isolated environment" imports more > >> viable, and factor out another global element of the import system. > >> That way, if you implement an isolated module system, you don't have > >> to duplicate or subclass ModuleSpec to perform the same loading > >> functionality. > > > > Cool idea, but couldn't this wait. I could totally see this as part of > PEP > > 406 (import engine). > > One of the conclusions I came to from Greg's import engine work is > that the only practical way for us to get to isolated import > subsystems is either with a Decimal style thread local context based > solution, I was messing around with this a while back and the thread-local context approach was pretty easy to do. > or with a split create/exec API where the loader doesn't do > any global state manipulation at all and instead operates in a > functional mode where it just returns values based on passed in > parameters (that way the import system at least has the chance to > override __import__ before running the module code). Anything else > looks like it will be too fragile (and the latter approach doesn't > necessarily work for C extensions that do imports). 
>
> This is part of why I'm keen on having this PEP expose "create" and
> "exec" as separate operations on ModuleSpec, with "load" acting solely
> as a convenience function for combining them with the appropriate
> sys.modules manipulation.
>

Ah. That helps clarify things. I'll go stew on that a bit.

-eric

From ncoghlan at gmail.com Sat Aug 24 13:50:24 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 24 Aug 2013 21:50:24 +1000
Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System"
In-Reply-To:
References:
Message-ID:

On 16 August 2013 01:59, Nick Coghlan wrote:
> It wouldn't be a custom namespace, just a single attribute to pass
> data to the loader. It could be a dict, namespace, string, custom
> object, anything. By default, it would be None.
>
> For example, zipimporter could use it to pass the zip archive name to
> the loader directly, rather than needing to derive it from origin or
> create a custom loader for each find operation.

Having implemented the "exec" part of the C extension modifications
(see http://mail.python.org/pipermail/python-dev/2013-August/128101.html),
I'm more convinced than ever that ModuleSpec should have some kind of
a subnamespace for storing arbitrary loader specific details.
Providing such a storage location not only allows information to be
passed from the finder to the loader, but also from the create step to
the exec step in the loading process (the C extension loader would, of
necessity, find the execution entry point while determining how to
create the module). It makes sense to be able to store that somewhere
on the spec object, rather than having to go searching through the
exported symbols again in the execution step.

Cheers,
Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ericsnowcurrently at gmail.com Wed Aug 28 10:53:19 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 28 Aug 2013 02:53:19 -0600 Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System" In-Reply-To: References: Message-ID: On Sat, Aug 24, 2013 at 5:50 AM, Nick Coghlan wrote: > On 16 August 2013 01:59, Nick Coghlan wrote: > > It wouldn't be a custom namespace, just a single attribute to pass > > data to the loader. It could be a dict, namespace, string, custom > > object, anything. By default, it would be None. > > > > For example, zipimporter could use it to pass the zip archive name to > > the loader directly, rather than needing to derive it from origin or > > create a custom loader for each find operation. > > Having implemented the "exec" part of the C extension modifications > (see http://mail.python.org/pipermail/python-dev/2013-August/128101.html), > I'm more convinced than ever that ModuleSpec should have some kind of > a subnamespace for storing arbitrary loader specific details. > Providing such a storage location not only allows information to be > passed from the finder to the loader, but also from the create step to > the exec step in the loading process (the C extension loader would, of > necessity, find the execution entry point while determining how to > create the module. It makes sense to be able to store that somewhere > on the spec object, rather than having to go searching through the > exported symbols again in the execution step. Okay, I'm sold. For now I'm calling it "loading_info", but that name sounds kind of lame. FYI, I have an update of the PEP up. I've posted it to this list so it may show up in a day or two. -eric -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Wed Aug 28 11:26:43 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 28 Aug 2013 19:26:43 +1000 Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System" In-Reply-To: References: Message-ID: On 28 Aug 2013 18:53, "Eric Snow" wrote: > > > > > On Sat, Aug 24, 2013 at 5:50 AM, Nick Coghlan wrote: >> >> On 16 August 2013 01:59, Nick Coghlan wrote: >> > It wouldn't be a custom namespace, just a single attribute to pass >> > data to the loader. It could be a dict, namespace, string, custom >> > object, anything. By default, it would be None. >> > >> > For example, zipimporter could use it to pass the zip archive name to >> > the loader directly, rather than needing to derive it from origin or >> > create a custom loader for each find operation. >> >> Having implemented the "exec" part of the C extension modifications >> (see http://mail.python.org/pipermail/python-dev/2013-August/128101.html ), >> I'm more convinced than ever that ModuleSpec should have some kind of >> a subnamespace for storing arbitrary loader specific details. >> Providing such a storage location not only allows information to be >> passed from the finder to the loader, but also from the create step to >> the exec step in the loading process (the C extension loader would, of >> necessity, find the execution entry point while determining how to >> create the module. It makes sense to be able to store that somewhere >> on the spec object, rather than having to go searching through the >> exported symbols again in the execution step. > > > Okay, I'm sold. For now I'm calling it "loading_info", but that name sounds kind of lame. I realised that if we're going to allow mutating the spec in create, we're going to have to promise not to reuse them across load calls. So loaders can be shared, but specs can't. > FYI, I have an update of the PEP up. I've posted it to this list so it may show up in a day or two. Heh :) Cheers, Nick. 
>
> -eric
>

From eric at trueblade.com Wed Aug 28 11:49:33 2013
From: eric at trueblade.com (Eric V. Smith)
Date: Wed, 28 Aug 2013 05:49:33 -0400
Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System"
In-Reply-To:
References:
Message-ID: <521DC7AD.3050400@trueblade.com>

On 8/28/2013 5:26 AM, Nick Coghlan wrote:
>
> On 28 Aug 2013 18:53, "Eric Snow" wrote:
>
>> FYI, I have an update of the PEP up. I've posted it to this list so
> it may show up in a day or two.
>
> Heh :)

No matter how big I make the message limit, the PEP seems to exceed it!
I'll release it shortly.

--
Eric.

From ericsnowcurrently at gmail.com Wed Aug 28 10:50:55 2013
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 28 Aug 2013 02:50:55 -0600
Subject: [Import-SIG] PEP 451 (ModuleSpec) round 3
Message-ID:

I've incorporated the feedback into the PEP and gave up on trying to
re-purpose Finder.find_module() (which wasn't worth it). Let me know
what you think. I'll have the implementation up on
http://bugs.python.org/issue18864 in the next couple days.

-eric

----------------------------------------------------------------------------------------

PEP: 451
Title: A ModuleSpec Type for the Import System
Version: $Revision$
Last-Modified: $Date$
Author: Eric Snow
Discussions-To: import-sig at python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 8-Aug-2013
Python-Version: 3.4
Post-History: 8-Aug-2013
              28-Aug-2013
Resolution:


Abstract
========

This PEP proposes to add a new class to ``importlib.machinery`` called
``ModuleSpec``. It will be authoritative for all the import-related
information about a module, and will be available without needing to
load the module first. Finders will provide a module's spec instead of
a loader. The import machinery will be adjusted to take advantage of
module specs, including using them to load modules.
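
To make "available without needing to load the module first" concrete,
here is a minimal sketch. It assumes the API as it later shipped in
Python 3.4 and newer, where the lookup function lives at
``importlib.util.find_spec()`` rather than the ``importlib.find_spec()``
named in this draft. The helper name ``describe`` is hypothetical and
exists only for this illustration.

```python
import importlib.util


def describe(name):
    """Summarize a top-level module's spec without importing the module.

    Note: for dotted names, find_spec() imports the parent packages;
    only top-level names are looked up here.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # no finder could locate the module
    return {
        "name": spec.name,
        "origin": spec.origin,
        "loader": type(spec.loader).__name__,
        # A package is a module whose spec carries a submodule search path.
        "is_package": spec.submodule_search_locations is not None,
    }


for name in ("json", "colorsys", "no_such_module_here"):
    print(name, describe(name))
```

The package test mirrors the "spec.submodule_search_path is not None"
idea raised earlier in the thread, using the attribute name from this
draft's summary.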

Motivation
==========

The import system has evolved over the lifetime of Python. In late 2002
PEP 302 introduced standardized import hooks via ``finders`` and
``loaders`` and ``sys.meta_path``. The ``importlib`` module, introduced
with Python 3.1, now exposes a pure Python implementation of the APIs
described by PEP 302, as well as of the full import system. It is now
much easier to understand and extend the import system. While a benefit
to the Python community, this greater accessibility also presents a
challenge.

As more developers come to understand and customize the import system,
any weaknesses in the finder and loader APIs will be more impactful. So
the sooner we can address any such weaknesses in the import system, the
better...and there are a couple we can take care of with this proposal.

Firstly, any time the import system needs to save information about a
module we end up with more attributes on module objects that are
generally only meaningful to the import system and occasionally to some
people. It would be nice to have a per-module namespace to put future
import-related information.

Secondly, there's an API void between finders and loaders that causes
undue complexity when encountered.

Currently finders are strictly responsible for providing the loader
which the import system will use to load the module. The loader is then
responsible for doing some checks, creating the module object, setting
import-related attributes, "installing" the module to ``sys.modules``,
and loading the module, along with some cleanup. This all takes place
during the import system's call to ``Loader.load_module()``. Loaders
also provide some APIs for accessing data associated with a module.

Loaders are not required to provide any of the functionality of
``load_module()`` through other methods. Thus, though the import-related
information about a module is likely available without loading the
module, it is not otherwise exposed.

Furthermore, the requirements associated with ``load_module()`` are
common to all loaders and mostly are implemented in exactly the same
way. This means every loader has to duplicate the same boilerplate
code. ``importlib.util`` provides some tools that help with this, but
it would be more helpful if the import system simply took charge of
these responsibilities. The trouble is that this would limit the degree
of customization that ``load_module()`` facilitates. This is a gap
between finders and loaders which this proposal aims to fill.

Finally, when the import system calls a finder's ``find_module()``, the
finder makes use of a variety of information about the module that is
useful outside the context of the method. Currently the options are
limited for persisting that per-module information past the method call,
since it only returns the loader. Popular options for this limitation
are to store the information in a module-to-info mapping somewhere on
the finder itself, or store it on the loader.

Unfortunately, loaders are not required to be module-specific. On top
of that, some of the useful information finders could provide is common
to all finders, so ideally the import system could take care of that.
This is the same gap as before between finders and loaders.

As an example of complexity attributable to this flaw, the
implementation of namespace packages in Python 3.3 (see PEP 420) added
``FileFinder.find_loader()`` because there was no good way for
``find_module()`` to provide the namespace search locations.

The answer to this gap is a ``ModuleSpec`` object that contains the
per-module information and takes care of the boilerplate functionality
of loading the module.

(The idea gained momentum during discussions related to another PEP.[1])


Specification
=============

The goal is to address the gap between finders and loaders while
changing as little of their semantics as possible.
Though some functionality and information is moved to the new
``ModuleSpec`` type, their behavior should remain the same. However,
for the sake of clarity the finder and loader semantics will be
explicitly identified.

This is a high-level summary of the changes described by this PEP. More
detail is available in later sections.

importlib.machinery.ModuleSpec (new)
------------------------------------

Attributes:

* name - a string for the name of the module.
* loader - the loader to use for loading and for module data.
* origin - a string for the location from which the module is loaded.
* submodule_search_locations - strings for where to find submodules,
  if a package.
* loading_info - a container of data for use during loading (or None).
* cached (property) - a string for where the compiled module will be
  stored.
* is_location (RO-property) - the module's origin refers to a location.

.. XXX Find a better name than loading_info?
.. XXX Add ``submodules`` (RO-property) - returns possible submodules
   relative to spec (or None)?
.. XXX Add ``loaded`` (RO-property) - the module in sys.modules, if any?

Factory Methods:

* from_file_location() - factory for file-based module specs.
* from_module() - factory based on import-related module attributes.
* from_loader() - factory based on information provided by loaders.

.. XXX Move the factories to importlib.util or make class-only?

Instance Methods:

* init_module_attrs() - populate a module's import-related attributes.
* module_repr() - provide a repr string for a module.
* create() - provide a new module to use for loading.
* exec() - execute the spec into a module namespace.
* load() - prepare a module and execute it in a protected way.
* reload() - re-execute a module in a protected way.

.. XXX Make module_repr() match the spec (BC problem?)?

API Additions
-------------

* ``importlib.abc.Loader.exec_module()`` will execute a module in its
  own namespace, replacing ``importlib.abc.Loader.load_module()``.
* ``importlib.abc.Loader.create_module()`` (optional) will return a new
  module to use for loading.
* Module objects will have a new attribute: ``__spec__``.
* ``importlib.find_spec()`` will return the spec for a module.
* ``__subclasshook__()`` will be implemented on the importlib ABCs.

.. XXX Do __subclasshook__() separately from the PEP (issue18862).

API Changes
-----------

* Import-related module attributes will no longer be authoritative nor
  used by the import system.
* ``InspectLoader.is_package()`` will become optional.

.. XXX module __repr__() will prefer spec attributes?

Deprecations
------------

* ``importlib.abc.MetaPathFinder.find_module()``
* ``importlib.abc.PathEntryFinder.find_module()``
* ``importlib.abc.PathEntryFinder.find_loader()``
* ``importlib.abc.Loader.load_module()``
* ``importlib.abc.Loader.module_repr()``
* The parameters and attributes of the various loaders in
  ``importlib.machinery``
* ``importlib.util.set_package()``
* ``importlib.util.set_loader()``
* ``importlib.find_loader()``

Removals
--------

* ``importlib.abc.Loader.init_module_attrs()``
* ``importlib.util.module_to_load()``

Other Changes
-------------

* The spec for the ``__main__`` module will reflect the appropriate
  name and origin.
* The module type's ``__repr__`` will defer to ModuleSpec exclusively.

Backward-Compatibility
----------------------

* If a finder does not define ``find_spec()``, a spec is derived from
  the loader returned by ``find_module()``.
* ``PathEntryFinder.find_loader()`` will be used, if defined.
* ``Loader.load_module()`` is used if ``exec_module()`` is not defined.
* ``Loader.module_repr()`` is used by ``ModuleSpec.module_repr()`` if it
  exists.

What Will not Change?
---------------------

* The syntax and semantics of the import statement.
* Existing finders and loaders will continue to work normally.
* The import-related module attributes will still be initialized with
  the same information.
* Finders will still create loaders, storing them in the specs.
* ``Loader.load_module()``, if a module defines it, will have all the
  same requirements and may still be called directly.
* Loaders will still be responsible for module data APIs.


ModuleSpec Users
================

``ModuleSpec`` objects have three distinct target audiences: Python
itself, import hooks, and normal Python users.

Python will use specs in the import machinery, in interpreter startup,
and in various standard library modules. Some modules are
import-oriented, like pkgutil, and others are not, like pickle and
pydoc. In all cases, the full ``ModuleSpec`` API will get used.

Import hooks (finders and loaders) will make use of the spec in specific
ways, mostly without using the ``ModuleSpec`` instance methods. First
of all, finders will use the factory methods to create spec objects.
They may also directly adjust the spec attributes after the spec is
created. Secondly, the finder may bind additional information to the
spec for the loader to consume during module creation/execution.
Finally, loaders will make use of the attributes on a spec when creating
and/or executing a module.

Python users will be able to inspect a module's ``__spec__`` to get
import-related information about the object. Generally, they will not
be using the ``ModuleSpec`` factory methods nor the instance methods.
However, each spec has methods named ``create``, ``exec``, ``load``, and
``reload``. Since they are so easy to access (and misunderstand/abuse),
their function and availability require explicit consideration in this
proposal.


What Will Existing Finders and Loaders Have to Do Differently?
==============================================================

Immediately? Nothing. The status quo will be deprecated, but will
continue working. However, here are the things that the authors of
finders and loaders should change relative to this PEP:

* Implement ``find_spec()`` on finders.
* Implement ``exec_module()`` on loaders, if possible.
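
The two bullet points above can be sketched as follows. This is an
illustration only, written against the finder and loader hooks as they
later shipped in Python 3.4 and newer
(``MetaPathFinder.find_spec(fullname, path, target=None)``,
``Loader.create_module(spec)`` and ``Loader.exec_module(module)``); the
exact signatures were still open when this draft was posted. The names
``SOURCES``, ``DictLoader``, ``DictFinder`` and ``hello_spec`` are
hypothetical.

```python
import importlib.abc
import importlib.machinery
import sys

# Hypothetical in-memory "source store", used only for this illustration.
SOURCES = {"hello_spec": "GREETING = 'hi from ' + __name__\n"}


class DictLoader(importlib.abc.Loader):
    """Loader that implements only module creation and execution."""

    def create_module(self, spec):
        # Returning None asks the import system for a default module object.
        return None

    def exec_module(self, module):
        # No sys.modules handling and no attribute setup here: the import
        # machinery has already done both before calling this method.
        spec = module.__spec__
        code = compile(SOURCES[spec.name], spec.origin, "exec")
        exec(code, module.__dict__)


class DictFinder(importlib.abc.MetaPathFinder):
    """Finder that returns a spec instead of a bare loader."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in SOURCES:
            return None  # let the next finder on sys.meta_path try
        return importlib.machinery.ModuleSpec(
            fullname, DictLoader(), origin="<dict:%s>" % fullname
        )


sys.meta_path.insert(0, DictFinder())

import hello_spec  # resolved by DictFinder, executed by DictLoader

print(hello_spec.GREETING)
print(hello_spec.__spec__.origin)
```

Compared with a PEP 302 style ``load_module()``, the loader here never
touches ``sys.modules`` and never sets ``__name__`` or ``__loader__``
itself, which is the boilerplate this proposal moves into the import
system.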

The factory methods of ``ModuleSpec`` are intended to be helpful for
converting existing finders. ``from_loader()`` and
``from_file_location()`` are both straight-forward utilities in this
regard. In the case where loaders already expose methods for creating
and preparing modules, a finder may use ``ModuleSpec.from_module()`` on
a throw-away module to create the appropriate spec.

As for loaders, ``exec_module()`` should be a relatively direct
conversion from a portion of the existing ``load_module()``. However,
``Loader.create_module()`` will also be necessary in some uncommon
cases. Furthermore, ``load_module()`` will still work as a final option
when ``exec_module()`` is not appropriate.


How Loading Will Work
=====================

This is an outline of what happens in ``ModuleSpec.load()``.

1. A new module is created by calling ``spec.create()``.

   a. If the loader has a ``create_module()`` method, it gets called.
      Otherwise a new module gets created.
   b. The import-related module attributes are set.

2. The module is added to sys.modules.
3. ``spec.exec(module)`` gets called.

   a. If the loader has an ``exec_module()`` method, it gets called.
      Otherwise ``load_module()`` gets called for backward-compatibility
      and the resulting module is updated to match the spec.

4. If there were any errors the module is removed from sys.modules.
5. If the module was replaced in sys.modules during ``exec()``, the one
   in sys.modules is updated to match the spec.
6. The module in sys.modules is returned.

These steps are exactly what ``Loader.load_module()`` is already
expected to do. Loaders will thus be simplified since they will only
need to implement the portion in step 3a.


ModuleSpec
==========

This is a new class which defines the import-related values to use when
loading the module. It closely corresponds to the import-related
attributes of module objects.
``ModuleSpec`` objects may also be used by finders and loaders and other import-related APIs to hold extra import-related state concerning the module. This greatly reduces the need to add any new import-related attributes to module objects, and loader ``__init__`` methods will no longer need to accommodate such per-module state. General Notes ------------- * The spec for each module instance will be unique to that instance even if the information is identical to that of another spec. * A module's spec is not intended to be modified by anything but finders. Creating a ModuleSpec --------------------- **ModuleSpec(name, loader, *, origin=None, is_package=None)** .. container:: ``name``, ``loader``, and ``origin`` are set on the new instance without any modification. If ``is_package`` is not passed in, the loader's ``is_package()`` gets called (if available), or it defaults to ``False``. If ``is_package`` is true, ``submodule_search_locations`` is set to a new empty list. Otherwise it is set to None. Other attributes not listed as parameters (such as ``package``) are either read-only dynamic properties or default to None. **from_filename(name, loader, *, filename=None, submodule_search_locations=None)** .. container:: This factory classmethod allows a suitable ModuleSpec instance to be easily created with extra file-related information. This includes the values that would be set on a module as ``__file__`` or ``__cached__``. ``has_location`` is set to True for specs created using ``from_filename()``. **from_module(module, loader=None)** .. container:: This factory is used to create a spec based on the import-related attributes of an existing module. Since modules should already have ``__spec__`` set, this method has limited utility. **from_loader(name, loader, *, origin=None, is_package=None)** .. container:: A factory classmethod that returns a new ``ModuleSpec`` derived from the arguments.
``is_package`` is used inside the method to indicate that the module is a package. If not explicitly passed in, it falls back to using the result of the loader's ``is_package()``, if available. If not available, it defaults to False. In contrast to ``ModuleSpec.__init__()``, which takes the arguments as-is, ``from_loader()`` calculates missing values from the ones passed in, as much as possible. This replaces the behavior that is currently provided by several ``importlib.util`` functions as well as the optional ``init_module_attrs()`` method of loaders. Just to be clear, here is a more detailed description of those calculations:: If not passed in, ``filename`` is set to the result of calling the loader's ``get_filename()``, if available. Otherwise it stays unset (``None``). If not passed in, ``submodule_search_locations`` is set to an empty list if ``is_package`` is true. Then the directory from ``filename`` is appended to it, if possible. If ``is_package`` is false, ``submodule_search_locations`` stays unset. If ``cached`` is not passed in and ``filename`` is passed in, ``cached`` is derived from it. For filenames with a source suffix, it is set to the result of calling ``importlib.util.cache_from_source()``. For bytecode suffixes (e.g. ``.pyc``), ``cached`` is set to the value of ``filename``. If ``filename`` is not passed in or ``cache_from_source()`` raises ``NotImplementedError``, ``cached`` stays unset. If not passed in, ``origin`` is set to ``filename``. Thus if ``filename`` is unset, ``origin`` stays unset. Attributes ---------- Each of the following names is an attribute on ``ModuleSpec`` objects. A value of ``None`` indicates "not set". This contrasts with module objects where the attribute simply doesn't exist. While ``package`` is a read-only property, the remaining attributes can be replaced after the module spec is created and even after import is complete. This allows for unusual cases where directly modifying the spec is the best option.
However, typical use should not involve changing the state of a module's spec. Most of the attributes correspond to the import-related attributes of modules. Here is the mapping, followed by a description of the attributes. The reverse of this mapping is used by ``ModuleSpec.init_module_attrs()``.

========================== ===========
On ModuleSpec              On Modules
========================== ===========
name                       __name__
loader                     __loader__
package                    __package__
origin                     __file__*
cached                     __cached__*
submodule_search_locations __path__**
loading_info               \-
has_location (RO-property) \-
========================== ===========

\* Only if ``has_location`` is true. \*\* Only if not None.

**name** .. container:: The module's fully resolved and absolute name. It must be set. **loader** .. container:: The loader to use during loading and for module data. These specific functionalities do not change for loaders. Finders are still responsible for creating the loader and this attribute is where it is stored. The loader must be set. **origin** .. container:: A string for the location from which the module originates. Aside from the informational value, it is also used in ``module_repr()``. The module attribute ``__file__`` has a similar but more restricted meaning. Not all modules have it set (e.g. built-in modules). However, ``origin`` is applicable to essentially all modules. For built-in modules it would be set to "built-in". Secondary Attributes -------------------- Some of the ``ModuleSpec`` attributes are not set via arguments when creating a new spec. Either they are strictly dynamically calculated properties or they are simply set to None (aka "not set"). For the latter case, those attributes may still be set directly. **package** .. container:: A dynamic property that gives the name of the module's parent. The value is derived from ``name`` and ``is_package``. For packages it is the value of ``name``. Otherwise it is equivalent to ``name.rpartition('.')[0]``.
Consequently, a top-level module will have the empty string for ``package``. **has_location** .. container:: Some modules can be loaded by reference to a location, e.g. a filesystem path or a URL or something of the sort. Having the location lets you load the module, but in theory you could load that module under various names. In contrast, non-located modules can't be loaded in this fashion, e.g. builtin modules and modules dynamically created in code. For these, the name is the only way to access them, so they have an "origin" but not a "location". This attribute reflects whether or not the module is locatable. If it is, ``origin`` must be set to the module's location and ``__file__`` will be set on the module. Furthermore, a locatable module is also cacheable and so ``__cached__`` is tied to ``has_location``. The corresponding module attribute name, ``__file__``, is somewhat inaccurate and potentially confusing, so we will use a more explicit combination of ``origin`` and ``has_location`` to represent the same information. Having a separate ``filename`` is unnecessary since we have ``origin``. **cached** .. container:: A string for the location where the compiled code for a module should be stored. PEP 3147 details the caching mechanism of the import system. If ``has_location`` is true, this location string is set on the module as ``__cached__``. When ``from_filename()`` is used to create a spec, ``cached`` is set to the result of calling ``importlib.util.cache_from_source()``. ``cached`` is not necessarily a file location. A finder or loader may store an alternate location string in ``cached``. However, in practice this will be the file location dictated by PEP 3147. **submodule_search_locations** .. container:: The list of location strings, typically directory paths, in which to search for submodules. If the module is a package this will be set to a list (even an empty one). Otherwise it is ``None``.
The corresponding module attribute's name, ``__path__``, is relatively ambiguous. Instead of mirroring it, we use a more explicit name that makes the purpose clear. **loading_info** .. container:: A finder may set ``loading_info`` to any value to provide additional data for the loader to use during loading. A value of ``None`` is the default and indicates that there is no additional data. Otherwise it is likely set to some container, such as a ``dict``, ``list``, or ``types.SimpleNamespace``, containing the relevant extra information. For example, ``zipimporter`` could use it to pass the zip archive name to the loader directly, rather than needing to derive it from ``origin`` or create a custom loader for each find operation. Methods ------- **module_repr()** .. container:: Returns a repr string for the module, based on the module's import-related attributes and falling back to the spec's attributes. The string will reflect the current output of the module type's ``__repr__()``. The module type's ``__repr__()`` will use the module's ``__spec__`` exclusively. If the module does not have ``__spec__`` set, a spec is generated using ``ModuleSpec.from_module()``. Since the module attributes may be out of sync with the spec and to preserve backward-compatibility in that case, we defer to the module attributes and only when they are missing do we fall back to the spec attributes. **init_module_attrs(module)** .. container:: Sets the module's import-related attributes to the corresponding values in the module spec. If ``has_location`` is false on the spec, ``__file__`` and ``__cached__`` are not set on the module. ``__path__`` is only set on the module if ``submodule_search_locations`` is not None. For the rest of the import-related module attributes, a ``None`` value on the spec (aka "not set") means ``None`` will be set on the module. If any of the attributes are already set on the module, the existing values are replaced.
The module's own ``__spec__`` is not consulted but does get replaced with the spec on which ``init_module_attrs()`` was called. The earlier mapping of ``ModuleSpec`` attributes to module attributes indicates which attributes are involved on both sides. **create()** .. container:: A new module is created relative to the spec and its import-related attributes are set accordingly. If the spec's loader has a ``create_module()`` method, that gets called to create the module. This gives the loader a chance to do any pre-loading initialization that can't otherwise be accomplished elsewhere. Otherwise a bare module object is created. In both cases ``init_module_attrs()`` is called on the module before it gets returned. **exec(module)** .. container:: The spec's loader is used to execute the module. If the loader has ``exec_module()`` defined, the namespace of ``module`` is the target of execution. Otherwise the loader's ``load_module()`` is called, which ignores ``module`` and returns the module that was the actual execution target. In that case the import-related attributes of that module are updated to reflect the spec. In both cases the targeted module is the one that gets returned. **load()** .. container:: This method captures the current functionality of and requirements on ``Loader.load_module()`` without any semantic changes. It is essentially a wrapper around ``create()`` and ``exec()`` with some extra functionality regarding ``sys.modules``. A module may replace itself in ``sys.modules`` while executing. Consequently, the module in ``sys.modules`` is the one that gets returned by ``load()``. Right before ``exec()`` is called, the module is added to ``sys.modules``. In the case of error during loading the module is removed from ``sys.modules``. The module in ``sys.modules`` when ``load()`` finishes is the one that gets returned. Returning the module from ``sys.modules`` accommodates the ability of the module to replace itself there while it is executing (during load).
As already noted, this is what already happens in the import system. ``load()`` is not meant to change any of this behavior. If ``loader`` is not set (``None``), ``load()`` raises a ValueError. **reload(module)** .. container:: As with ``load()`` this method faithfully fulfills the semantics of ``Loader.load_module()`` in the reload case, with one exception: reloading a module when ``exec_module()`` is available actually uses ``module`` rather than ignoring it in favor of the one in ``sys.modules``, as ``Loader.load_module()`` does. The functionality here mirrors that of ``load()``, minus the ``create()`` call and the ``sys.modules`` handling. .. XXX add more of importlib.reload()'s boilerplate to reload()? Omitted Attributes and Methods ------------------------------ There is no ``PathModuleSpec`` subclass of ``ModuleSpec`` that provides the ``has_location``, ``cached``, and ``submodule_search_locations`` functionality. While that might make the separation cleaner, module objects don't have that distinction. ``ModuleSpec`` will support both cases equally well. While ``is_package`` would be a simple additional attribute (aliasing ``self.submodule_search_locations is not None``), it perpetuates the artificial (and mostly erroneous) distinction between modules and packages. Conceivably, ``ModuleSpec.load()`` could optionally take a list of modules with which to interact instead of ``sys.modules``. That capability is left out of this PEP, but may be pursued separately at some other time, including relative to PEP 406 (import engine). Likewise ``load()`` could be leveraged to implement multi-version imports. While interesting, doing so is outside the scope of this proposal. Backward Compatibility ---------------------- ``ModuleSpec`` doesn't have any. This would be a different story if ``Finder.find_module()`` were to return a module spec instead of loader. In that case, specs would have to act like the loader that would have been returned instead. 
Doing so would be relatively simple, but is an unnecessary complication. Subclassing ----------- Subclasses of ModuleSpec are allowed, but should not be necessary. Simply setting ``loading_info`` or adding functionality to a custom finder or loader will likely be a better fit and should be tried first. However, as long as a subclass still fulfills the requirements of the import system, objects of that type are completely fine as the return value of ``Finder.find_spec()``. Existing Types ============== Module Objects -------------- **__spec__** .. container:: Module objects will now have a ``__spec__`` attribute to which the module's spec will be bound. None of the other import-related module attributes will be changed or deprecated, though some of them could be; any such deprecation can wait until Python 4. ``ModuleSpec`` objects will not be kept in sync with the corresponding module object's import-related attributes. Though they may differ, in practice they will typically be the same. One notable exception is the case where a module is run as a script by using the ``-m`` flag. In that case ``module.__spec__.name`` will reflect the actual module name while ``module.__name__`` will be ``__main__``. Finders ------- **MetaPathFinder.find_spec(name, path=None)** **PathEntryFinder.find_spec(name)** .. container:: Finders will return ModuleSpec objects when ``find_spec()`` is called. This new method replaces ``find_module()`` and ``find_loader()`` (in the ``PathEntryFinder`` case). If a finder does not have ``find_spec()``, ``find_module()`` and ``find_loader()`` are used instead, for backward-compatibility. Adding yet another similar method to finders is a case of practicality. ``find_module()`` could be changed to return specs instead of loaders. This is tempting because the import APIs have suffered enough, especially considering ``PathEntryFinder.find_loader()`` was just added in Python 3.3.
However, the extra complexity and a less-than-explicit method name aren't worth it. Finders are still responsible for creating the loader. That loader will now be stored in the module spec returned by ``find_spec()`` rather than returned directly. As is currently the case without the PEP, if a loader would be costly to create, that loader can be designed to defer the cost until later. Loaders ------- **Loader.exec_module(module)** .. container:: Loaders will have a new method, ``exec_module()``. Its only job is to "exec" the module and consequently populate the module's namespace. It is not responsible for creating or preparing the module object, nor for any cleanup afterward. It has no return value. **Loader.load_module(fullname)** .. container:: The ``load_module()`` of loaders will still work and be an active part of the loader API. It is still useful for cases where the default module creation/preparation/cleanup is not appropriate for the loader. If implemented, ``load_module()`` will still be responsible for its current requirements (prep/exec/etc.) since the method may be called directly. For example, the C API for extension modules only supports the full control of ``load_module()``. As such, ``ExtensionFileLoader`` will not implement ``exec_module()``. In the future it may be appropriate to produce a second C API that would support an ``exec_module()`` implementation for ``ExtensionFileLoader``. Such a change is outside the scope of this PEP. A loader must define either ``exec_module()`` or ``load_module()``. If both exist on the loader, ``ModuleSpec.load()`` uses ``exec_module()`` and ignores ``load_module()``. **Loader.create_module(spec)** .. container:: Loaders may also implement ``create_module()`` that will return a new module to exec. However, most loaders will not need to implement the method. PEP 420 introduced the optional ``module_repr()`` loader method to limit the amount of special-casing in the module type's ``__repr__()``.
Since this method is part of ``ModuleSpec``, it will be deprecated on loaders. However, if it exists on a loader it will be used exclusively. The ``Loader.init_module_attrs()`` method, added prior to Python 3.4's release, will be removed in favor of the same method on ``ModuleSpec``. However, ``InspectLoader.is_package()`` will not be deprecated even though the same information is found on ``ModuleSpec``. ``ModuleSpec`` can use it to populate its own ``is_package`` if that information is not otherwise available. Still, it will be made optional. The path-based loaders in ``importlib`` take arguments in their ``__init__()`` and have corresponding attributes. However, the need for those values is eliminated by module specs. The only exception is ``FileLoader.get_filename()``, which uses ``self.path``. The signatures for these loaders and the accompanying attributes will be deprecated. In addition to executing a module during loading, loaders will still be directly responsible for providing APIs concerning module-related data. Other Changes ============= * The various finders and loaders provided by ``importlib`` will be updated to comply with this proposal. * The spec for the ``__main__`` module will reflect how the interpreter was started. For instance, with ``-m`` the spec's name will be that of the run module, while ``__main__.__name__`` will still be "__main__". * We add ``importlib.find_spec()`` to mirror ``importlib.find_loader()`` (which becomes deprecated). * Deprecations in ``importlib.util``: ``set_package()``, ``set_loader()``, and ``module_for_loader()``. ``module_to_load()`` (introduced prior to Python 3.4's release) can be removed. * ``importlib.reload()`` is changed to use ``ModuleSpec.load()``. * ``ModuleSpec.load()`` and ``importlib.reload()`` will now make use of the per-module import lock, whereas ``Loader.load_module()`` did not.
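To make the ``find_spec()`` item in the list above more concrete, here is a small sketch of inspecting specs from ordinary user code. It assumes the API as it shipped in Python 3.4 and later, where the function lives at ``importlib.util.find_spec()`` rather than ``importlib.find_spec()`` as proposed in this draft, and where the draft's ``package`` attribute is spelled ``parent``. It also assumes a standard CPython installation in which ``json`` is an ordinary package; ``pep451_no_such_module`` is a made-up name.

```python
import importlib.util

# A package found through the path-based finders.
json_spec = importlib.util.find_spec("json")
print(json_spec.name, json_spec.parent, json_spec.has_location)
print(json_spec.submodule_search_locations is not None)

# A built-in module has an origin but no location.
sys_spec = importlib.util.find_spec("sys")
print(sys_spec.origin, sys_spec.has_location)

# A missing top-level module yields None rather than an exception.
missing = importlib.util.find_spec("pep451_no_such_module")
print(missing)
```

None of these calls import the module in question unless it is already imported; they only report what the finders would hand to the loading step.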
Reference Implementation ======================== A reference implementation will be available at http://bugs.python.org/issue18864. Open Issues ============== \* The impact of this change on pkgutil (and setuptools) needs looking into. It has some generic function-based extensions to PEP 302. These may break if importlib starts wrapping loaders without the tools' knowledge. \* Other modules to look at: runpy (and pythonrun.c), pickle, pydoc, inspect. \* Add ``ModuleSpec.data`` as a descriptor that wraps the data API of the spec's loader? \* How to limit possible end-user confusion/abuses relative to spec attributes (since __spec__ will make them really accessible)? References ========== [1] http://mail.python.org/pipermail/import-sig/2013-August/000658.html Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Wed Aug 28 16:43:16 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 28 Aug 2013 08:43:16 -0600 Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System" In-Reply-To: References: Message-ID: On Aug 28, 2013 3:26 AM, "Nick Coghlan" wrote: > I realised that if we're going to allow mutating the spec in create, we're going to have to promise not to reuse them across load calls. So loaders can be shared, but specs can't. The latest version of the PEP already specifies that each module will have its own copy, even if the spec is otherwise the same. Perhaps it should also make clear that loading_info should not be shared between specs. It wouldn't hurt to also say something about allowing only one call to load() or something along those lines. -eric -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ericsnowcurrently at gmail.com Wed Aug 28 18:04:39 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 28 Aug 2013 10:04:39 -0600 Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System" In-Reply-To: References: Message-ID: On Wed, Aug 28, 2013 at 8:43 AM, Eric Snow wrote: > On Aug 28, 2013 3:26 AM, "Nick Coghlan" wrote: > > I realised that if we're going to allow mutating the spec in create, > we're going to have to promise not to reuse them across load calls. So > loaders can be shared, but specs can't. > > The latest version of the PEP already specifies that each module will have > its own copy, even if the spec is otherwise the same. Perhaps it should > also make clear that loading_info should not be shared between specs. It > wouldn't hurt to also say something about allowing only one call to load() > or something along those lines. > I see three options: 1. We advise against calling ModuleSpec.create() and ModuleSpec.load() more than once. 2. ModuleSpec's create() and load() programmatically disallow (or otherwise handle) being called more than once. 3. Dictate that Loader.create_module() must handle the case where it is called more than once. Fail? Return None? Return the same module as before? I'll advocate for 3 along with making sure ModuleSpec.create() correctly handles the exceptional response of Loader.create_module(). However, the PEP does not really specify what happens when create() and load() are called multiple times. That needs to be added. I'm tempted to have load() simply return whatever is in sys.modules and bypass loading if the module is already loaded. And create() would simply return a new, prepared module, with special handling for the Loader.create_module() exceptional case. Really, the sticky part is the (potential) call to Loader.create_module() in ModuleSpec.create(). Otherwise it should not matter.
ModuleSpec.exec() should be able to be called as many times as desired, just like Loader.load_module() (and Loader.exec_module()). -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Wed Aug 28 18:06:04 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 28 Aug 2013 10:06:04 -0600 Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System" In-Reply-To: <521DC7AD.3050400@trueblade.com> References: <521DC7AD.3050400@trueblade.com> Message-ID: On Wed, Aug 28, 2013 at 3:49 AM, Eric V. Smith wrote: > No matter how big I make the message limit, the PEP seems to exceed it! > I'll release it shortly. > Sorry for the trouble. I appreciate you running the list and dealing with my verbose PEP. :) -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Wed Aug 28 19:25:15 2013 From: brett at python.org (Brett Cannon) Date: Wed, 28 Aug 2013 13:25:15 -0400 Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System" In-Reply-To: References: Message-ID: On Wed, Aug 28, 2013 at 10:43 AM, Eric Snow wrote: > On Aug 28, 2013 3:26 AM, "Nick Coghlan" wrote: > > I realised that if we're going to allow mutating the spec in create, > we're going to have to promise not to reuse them across load calls. So > loaders can be shared, but specs can't. > > The latest version of the PEP already specifies that each module will have > its own copy, even if the spec is otherwise the same. Perhaps it should > also make clear that loading_info should not be shared between specs. > That's really none of our business. If loading_info is going to be up to the finder to populate and the loader to consume as an opaque thing then we should not dictate its usage, just say that only the corresponding loader for the finder should use that object and that people should not expect its interface to be stable. 
> It wouldn't hurt to also say something about allowing only one call to > load() or something along those lines. > Why? You can create objects constantly. You should say you expect people to use reload() to reload things, but otherwise what if I truly want to reset the module and start from scratch with a second call to load()? -Brett > -eric > > _______________________________________________ > Import-SIG mailing list > Import-SIG at python.org > http://mail.python.org/mailman/listinfo/import-sig > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Wed Aug 28 19:27:12 2013 From: brett at python.org (Brett Cannon) Date: Wed, 28 Aug 2013 13:27:12 -0400 Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System" In-Reply-To: References: Message-ID: On Wed, Aug 28, 2013 at 12:04 PM, Eric Snow wrote: > On Wed, Aug 28, 2013 at 8:43 AM, Eric Snow wrote: > >> On Aug 28, 2013 3:26 AM, "Nick Coghlan" wrote: >> > I realised that if we're going to allow mutating the spec in create, >> we're going to have to promise not to reuse them across load calls. So >> loaders can be shared, but specs can't. >> >> The latest version of the PEP already specifies that each module will >> have its own copy, even if the spec is otherwise the same. Perhaps it >> should also make clear that loading_info should not be shared between >> specs. It wouldn't hurt to also say something about allowing only one call >> to load() or something along those lines >> > I see three options: > > 1. We advise against calling Modulespec.create() and ModuleSpec.load() > more than once. > 2. ModuleSpec's create() and load() programmatically disallow (or > otherwise handle) being called more than once. > 3. Dictate that Loader.create_module() must handle the case where it is > called more than once. Fail? Return None? Return the same module as > before? 
> > I'll advocate for 3 along with making sure ModuleSpec.create() correctly > handles the exceptional response of Loader.create_module(). However, the > PEP does not really specify what happens when create() and load() are > called multiple times. That needs to be added. I'm tempted to have load() > simply return whatever is in sys.modules and bypass loading if the module > is already loaded. > Isn't that the point of reload() sans the blind return? This is heading down the road of trying to worry about stuff that will likely never happen except by people trying to bypass the import system and thus are just asking to get screwed up. We shouldn't bend over to block it (or support it). -Brett > And create() would simply return a new, prepared module, with special > handling for the Loader.create_module() exceptional case. > > Really, the sticky part is the (potential) call to Loader.create_module() > in ModuleSpec.create(). Otherwise it should not matter. ModuleSpec.exec() > should be able to be called as many times as desired, just like > Loader.load_module() (and Loader.exec_module()). > > -eric > > _______________________________________________ > Import-SIG mailing list > Import-SIG at python.org > http://mail.python.org/mailman/listinfo/import-sig > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Wed Aug 28 21:34:43 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 28 Aug 2013 13:34:43 -0600 Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System" In-Reply-To: References: Message-ID: On Aug 28, 2013 11:25 AM, "Brett Cannon" wrote: > On Wed, Aug 28, 2013 at 10:43 AM, Eric Snow wrote: >> >> On Aug 28, 2013 3:26 AM, "Nick Coghlan" wrote: >> > I realised that if we're going to allow mutating the spec in create, we're going to have to promise not to reuse them across load calls. So loaders can be shared, but specs can't.
>> >> The latest version of the PEP already specifies that each module will have its own copy, even if the spec is otherwise the same. Perhaps it should also make clear that loading_info should not be shared between specs. > > > That's really none of our business. If loading_info is going to be up to the finder to populate and the loader to consume as an opaque thing then we should not dictate its usage, just say that only the corresponding loader for the finder should use that object and that people should not expect its interface to be stable. Fair enough. > >> >> It wouldn't hurt to also say something about allowing only one call to load() or something along those lines. > > > Why? You can create objects constantly. You should say you expect people to use reload() to reload things, but otherwise what if I truly want to reset the module and start from scratch with a second call to load()? That's fine. I'll just make sure to note what happens when the different spec methods are called more than once. If a loader can't handle multiple create_module() calls, I'd expect an ImportError. -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Wed Aug 28 21:40:57 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 28 Aug 2013 13:40:57 -0600 Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System" In-Reply-To: References: Message-ID: On Aug 28, 2013 11:27 AM, "Brett Cannon" wrote: > On Wed, Aug 28, 2013 at 12:04 PM, Eric Snow wrote: >> >> On Wed, Aug 28, 2013 at 8:43 AM, Eric Snow wrote: >>> >>> On Aug 28, 2013 3:26 AM, "Nick Coghlan" wrote: >>> > I realised that if we're going to allow mutating the spec in create, we're going to have to promise not to reuse them across load calls. So loaders can be shared, but specs can't. >>> >>> The latest version of the PEP already specifies that each module will have its own copy, even if the spec is otherwise the same. 
Perhaps it should also make clear that loading_info should not be shared between specs. It wouldn't hurt to also say something about allowing only one call to load() or something along those lines >> >> I see three options: >> >> 1. We advise against calling Modulespec.create() and ModuleSpec.load() more than once. >> 2. ModuleSpec's create() and load() programmatically disallow (or otherwise handle) being called more than once. >> 3. Dictate that Loader.create_module() must handle the case where it is called more than once. Fail? Return None? Return the same module as before? >> >> I'll advocate for 3 along with making sure ModuleSpec.create() correctly handles the exceptional response of Loader.create_module(). However, the PEP does not really specify what happens when create() and load() are called multiple times. That needs to be added. I'm tempted to have load() simply return whatever is in sys.modules and bypass loading if the module is already loaded. > > > Isn't that the point of reload() sans the blind return? This is heading down the road of trying to worry about stuff that will likely never happen except by people trying to bypass the import system and thus are just asking to get screwed up. We shouldn't bend over to block or (or support it). I'm fine with that. My only concern is the case where people take advantage of the spec methods to directly load/reload/etc. and it behaves in an unexpected way. -eric -------------- next part -------------- An HTML attachment was scrubbed... 
From brett at python.org Wed Aug 28 22:20:49 2013 From: brett at python.org (Brett Cannon) Date: Wed, 28 Aug 2013 16:20:49 -0400 Subject: [Import-SIG] Round 2 for "A ModuleSpec Type for the Import System" In-Reply-To: References: Message-ID:
On Wed, Aug 28, 2013 at 3:40 PM, Eric Snow wrote: > > On Aug 28, 2013 11:27 AM, "Brett Cannon" wrote: > > On Wed, Aug 28, 2013 at 12:04 PM, Eric Snow > wrote: > >> > >> On Wed, Aug 28, 2013 at 8:43 AM, Eric Snow > wrote: > >>> > >>> On Aug 28, 2013 3:26 AM, "Nick Coghlan" wrote: > >>> > I realised that if we're going to allow mutating the spec in create, > we're going to have to promise not to reuse them across load calls. So > loaders can be shared, but specs can't. > >>> > >>> The latest version of the PEP already specifies that each module will > have its own copy, even if the spec is otherwise the same. Perhaps it > should also make clear that loading_info should not be shared between > specs. It wouldn't hurt to also say something about allowing only one call > to load() or something along those lines. > >> > >> I see three options: > >> > >> 1. We advise against calling ModuleSpec.create() and ModuleSpec.load() > more than once. > >> 2. ModuleSpec's create() and load() programmatically disallow (or > otherwise handle) being called more than once. > >> 3. Dictate that Loader.create_module() must handle the case where it is > called more than once. Fail? Return None? Return the same module as > before? > >> > >> I'll advocate for 3 along with making sure ModuleSpec.create() > correctly handles the exceptional response of Loader.create_module(). > However, the PEP does not really specify what happens when create() and > load() are called multiple times. That needs to be added. I'm tempted to > have load() simply return whatever is in sys.modules and bypass loading if > the module is already loaded. > > > > > > Isn't that the point of reload() sans the blind return?
This is heading > down the road of trying to worry about stuff that will likely never happen > except by people trying to bypass the import system and thus are just > asking to get screwed up. We shouldn't bend over to block it (or support > it). > > I'm fine with that. My only concern is the case where people take > advantage of the spec methods to directly load/reload/etc. and it behaves > in an unexpected way. > They shouldn't do that. =) If it is that big of a worry then the methods could shift to importlib.abc.Loader and be completely removed from ModuleSpec to make it very obvious they should not be trifled with unless you know what you are doing.

From ncoghlan at gmail.com Thu Aug 29 00:10:38 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 29 Aug 2013 08:10:38 +1000 Subject: [Import-SIG] PEP 451 (ModuleSpec) round 3 In-Reply-To: References: Message-ID:
Proper review on the weekend, but quickish comments for now:

- update looks really good, and solves several issues with the original proposal. Big +1 for keeping it simple and adding a new finder method :)
- for extension modules that don't define a creation hook (which I won't be able to figure out before create_module is called), I'd like to be able to return NotImplemented from create_module to say "please give me a normal module, or re-use the existing one for reloading".
- I'd like to finally make "can reload or not" explicit in the loader API. My current idea for this is to add a "reloading" parameter to create_module, where we pass in the module to be reloaded. Loaders that support reloading *must* either not define create_module, or, if they define it, return NotImplemented or return the passed in module in that case. If it returns a new module, the reload should fail.
This shouldn't break backwards compatibility, as init based extension modules are cached internally, while modules that use the new hooks can decide for themselves whether or not to support reloading
- I need to check the other proposed reload changes for backwards compatibility issues (I'm not sure we can ignore changes made to sys.modules in that case)
- frozen modules should have a special origin string, too
- my preferred bikeshed colours are "loader_state" or "loader_info"

Cheers, Nick.

From brett at python.org Wed Aug 28 19:22:59 2013 From: brett at python.org (Brett Cannon) Date: Wed, 28 Aug 2013 13:22:59 -0400 Subject: [Import-SIG] PEP 451 (ModuleSpec) round 3 In-Reply-To: References: Message-ID:
On Wed, Aug 28, 2013 at 4:50 AM, Eric Snow wrote: > I've incorporated the feedback into the PEP and gave up on trying to > re-purpose Finder.find_module() (which wasn't worth it). Let me know what > you think. I'll have the implementation up on > http://bugs.python.org/issue18864 in the next couple days. > > -eric > > > ---------------------------------------------------------------------------------------- > > PEP: 451 > Title: A ModuleSpec Type for the Import System > Version: $Revision$ > Last-Modified: $Date$ > Author: Eric Snow > Discussions-To: import-sig at python.org > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 8-Aug-2013 > Python-Version: 3.4 > Post-History: 8-Aug-2013 > 28-Aug-2013 > Resolution: > > > Abstract > ======== > > This PEP proposes to add a new class to ``importlib.machinery`` called > ``ModuleSpec``. It will be authoritative for all the import-related > information about a module, and will be available without needing to > load the module first. Finders will provide a module's spec instead of > a loader. > Don't you mean finders will return a ModuleSpec?
Since 'loader' is still defined in the ModuleSpec to know what loader to use, the statement that finders don't provide a loader is misleading. > The import machinery will be adjusted to take advantage of > module specs, including using them to load modules. > > > Motivation > ========== > > The import system has evolved over the lifetime of Python. In late 2002 > PEP 302 introduced standardized import hooks via ``finders`` and > ``loaders`` and ``sys.meta_path``. The ``importlib`` module, introduced > with Python 3.1, now exposes a pure Python implementation of the APIs > described by PEP 302, as well as of the full import system. It is now > much easier to understand and extend the import system. While a benefit > to the Python community, this greater accessibility also presents a > challenge. > > As more developers come to understand and customize the import system, > any weaknesses in the finder and loader APIs will be more impactful. So > the sooner we can address any such weaknesses in the import system, the > better...and there are a couple we can take care of with this proposal. > > Firstly, any time the import system needs to save information about a > module we end up with more attributes on module objects that are > generally only meaningful to the import system and occasionally to some > people. > Leave out "and occasionally to some people"; saying "generally" implies that some people occasionally find it useful. > It would be nice to have a per-module namespace to put future > import-related information. > ".. nice to have only a ... and pass within the import system." > Secondly, there's an API void between > finders and loaders that causes undue complexity when encountered. > > Currently finders are strictly responsible for providing the loader > which the import system will use to load the module. > "... through their ``find_module()`` method."
> The loader is then > responsible for doing some checks, creating the module object, setting > import-related attributes, "installing" the module to ``sys.modules``, > and loading the module, along with some cleanup. This all takes place > during the import system's call to ``Loader.load_module()``. Loaders > also provide some APIs for accessing data associated with a module. > > Loaders are not required to provide any of the functionality of > ``load_module()`` through other methods. Thus, though the import-related > information about a module is likely available without loading > the module, it is not otherwise exposed. > > Furthermore, the requirements associated with ``load_module()`` are > common to all loaders and mostly are implemented in exactly the same > way. This means every loader has to duplicate the same boilerplate > code. ``importlib.util`` provides some tools that help with this, but > it would be more helpful if the import system simply took charge of > these responsibilities. The trouble is that this would limit the degree > of customization that ``load_module()`` facilitates. This is a gap > between finders and loaders which this proposal aims to fill. > > Finally, when the import system calls a finder's ``find_module()``, the > finder makes use of a variety of information about the module that is > useful outside the context of the method. Currently the options are > limited for persisting that per-module information past the method call, > since it only returns the loader. Popular workarounds for this limitation > are to store the information in a module-to-info mapping somewhere on > the finder itself, or store it on the loader. > > Unfortunately, loaders are not required to be module-specific. On top > of that, some of the useful information finders could provide is > common to all finders, so ideally the import system could take care of > that. > "that" -> "those details" > This is the same gap as before between finders and loaders.
> > As an example of complexity attributable to this flaw, the > implementation of namespace packages in Python 3.3 (see PEP 420) added > ``FileFinder.find_loader()`` because there was no good way for > ``find_module()`` to provide the namespace search locations. > > The answer to this gap is a ``ModuleSpec`` object that contains the > per-module information and takes care of the boilerplate functionality > of loading the module. > "of loading the module" -> "involved with loading the module". > > (The idea gained momentum during discussions related to another PEP.[1]) > > > Specification > ============= > > The goal is to address the gap between finders and loaders while > changing as little of their semantics as possible. Though some > functionality and information is moved to the new ``ModuleSpec`` type, > their behavior should remain the same. However, for the sake of clarity > the finder and loader semantics will be explicitly identified. > > This is a high-level summary of the changes described by this PEP. More > detail is available in later sections. > > importlib.machinery.ModuleSpec (new) > ------------------------------------ > For this entire section you need to provide the call signatures as you start talking semantics later w/o making clear what is being passed and returned before going into detail of the individual methods. Otherwise move the detailed discussion of the methods up to before the semantics overview. > > Attributes: > > * name - a string for the name of the module. > * loader - the loader to use for loading and for module data. > * origin - a string for the location from which the module is loaded. > I would give an "e.g." here to help explain what you mean. As previous comments have shown, the name alone is not enough to understand what value should go here. =) > * submodule_search_locations - strings for where to find submodules, > if a package. > * loading_info - a container of data for use during loading (or None). 
> * cached (property) - a string for where the compiled module will be > stored. > * is_location (RO-property) - the module's origin refers to a location. > > .. XXX Find a better name than loading_info? > loading_data is all that I can think of > .. XXX Add ``submodules`` (RO-property) - returns possible submodules > relative to spec (or None)? > Actual use-case or are you just guessing there will be a use? Don't add any fields that we have not seen an actual need for. > .. XXX Add ``loaded`` (RO-property) - the module in sys.modules, if any? > Too easy to figure out with ``name in sys.modules`` and can go stale (unless you make this a property). > > Factory Methods: > > * from_file_location() - factory for file-based module specs. > * from_module() - factory based on import-related module attributes. > * from_loader() - factory based on information provided by loaders. > > .. XXX Move the factories to importlib.util or make class-only? > > Instance Methods: > > * init_module_attrs() - populate a module's import-related attributes. > * module_repr() - provide a repr string for a module. > * create() - provide a new module to use for loading. > * exec() - execute the spec into a module namespace. > * load() - prepare a module and execute it in a protected way. > * reload() - re-execute a module in a protected way. > > .. XXX Make module_repr() match the spec (BC problem?)? > > API Additions > ------------- > > * ``importlib.abc.Loader.exec_module()`` will execute a module in its > own namespace, replacing ``importlib.abc.Loader.load_module()``. > * ``importlib.abc.Loader.create_module()`` (optional) will return a new > module to use for loading. > * Module objects will have a new attribute: ``__spec__``. > * ``importlib.find_spec()`` will return the spec for a module. > * ``__subclasshook__()`` will be implemented on the importlib ABCs. > > .. XXX Do __subclasshook__() separately from the PEP (issue18862). 
> > API Changes > ----------- > > * Import-related module attributes will no longer be authoritative nor > used by the import system. > * ``InspectLoader.is_package()`` will become optional. > > .. XXX module __repr__() will prefer spec attributes? > > Deprecations > ------------ > > * ``importlib.abc.MetaPathFinder.find_module()`` > * ``importlib.abc.PathEntryFinder.find_module()`` > * ``importlib.abc.PathEntryFinder.find_loader()`` > * ``importlib.abc.Loader.load_module()`` > * ``importlib.abc.Loader.module_repr()`` > * The parameters and attributes of the various loaders in > ``importlib.machinery`` > * ``importlib.util.set_package()`` > * ``importlib.util.set_loader()`` > * ``importlib.find_loader()`` > > Removals > -------- > > * ``importlib.abc.Loader.init_module_attrs()`` > * ``importlib.util.module_to_load()`` > > Other Changes > ------------- > > * The spec for the ``__main__`` module will reflect the appropriate > name and origin. > * The module type's ``__repr__`` will defer to ModuleSpec exclusively. > > Backward-Compatibility > ---------------------- > > * If a finder does not define ``find_spec()``, a spec is derived from > the loader returned by ``find_module()``. > * ``PathEntryFinder.find_loader()`` will be used, if defined. > * ``Loader.load_module()`` is used if ``exec_module()`` is not defined. > * ``Loader.module_repr()`` is used by ``ModuleSpec.module_repr()`` if it > exists. > > What Will not Change? > --------------------- > > * The syntax and semantics of the import statement. > * Existing finders and loaders will continue to work normally. > * The import-related module attributes will still be initialized with > the same information. > * Finders will still create loaders, storing them in the specs. > * ``Loader.load_module()``, if a module defines it, will have all the > same requirements and may still be called directly. > * Loaders will still be responsible for module data APIs. 
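The first backward-compatibility rule above (deriving a spec from the loader that ``find_module()`` returns) can be sketched roughly as follows. This is only an illustration under stated assumptions: ``LegacyFinder``, ``LegacyLoader`` and ``spec_for()`` are hypothetical names, and the sketch uses ``importlib.util.spec_from_loader()``, the helper that later shipped in Python 3.4, rather than the ``ModuleSpec.from_loader()`` classmethod drafted in this thread.

```python
import importlib.util


class LegacyLoader:
    """Hypothetical old-style loader that only knows load_module()."""

    def load_module(self, fullname):
        raise NotImplementedError  # never called in this sketch


class LegacyFinder:
    """Hypothetical old-style finder: returns a loader, not a spec."""

    def find_module(self, fullname, path=None):
        return LegacyLoader() if fullname == "legacy_demo" else None


def spec_for(finder, fullname, path=None):
    """Prefer find_spec(); otherwise wrap find_module()'s loader in a spec."""
    if hasattr(finder, "find_spec"):
        return finder.find_spec(fullname, path)
    loader = finder.find_module(fullname, path)
    if loader is None:
        return None
    # The loader has no is_package() or get_filename(), so the resulting
    # spec is a plain, non-package spec with no origin.
    return importlib.util.spec_from_loader(fullname, loader)


spec = spec_for(LegacyFinder(), "legacy_demo")
print(spec.name, spec.submodule_search_locations)
```

The point of the fallback is that callers only ever deal with specs, while old finders keep working unchanged.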
> > > ModuleSpec Users > ================ > > ``ModuleSpec`` objects have 3 distinct target audiences: Python itself, > import hooks, and normal Python users. > > Python will use specs in the import machinery, in interpreter startup, > and in various standard library modules. Some modules are > import-oriented, like pkgutil, and others are not, like pickle and > pydoc. In all cases, the full ``ModuleSpec`` API will get used. > > Import hooks (finders and loaders) will make use of the spec in specific > ways, mostly without using the ``ModuleSpec`` instance methods. First > of all, finders will use the factory methods to create spec objects. > They may also directly adjust the spec attributes after the spec is > created. Secondly, the finder may bind additional information to the > spec for the loader to consume during module creation/execution. > Finally, loaders will make use of the attributes on a spec when creating > and/or executing a module. > > Python users will be able to inspect a module's ``__spec__`` to get > import-related information about the object. Generally, they will not > be using the ``ModuleSpec`` factory methods nor the instance methods. > As of right now no one is using the instance methods based on the wording in this section. =) > However, each spec has methods named ``create``, ``exec``, ``load``, and > ``reload``. Since they are so easy to access (and misunderstand/abuse), > their function and availability require explicit consideration in this > proposal. > > > What Will Existing Finders and Loaders Have to Do Differently? > ============================================================== > > Immediately? Nothing. The status quo will be deprecated, but will > continue working. However, here are the things that the authors of > finders and loaders should change relative to this PEP: > > * Implement ``find_spec()`` on finders. > * Implement ``exec_module()`` on loaders, if possible.
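For illustration, the two items above might look like this with the API as it later shipped in Python 3.4 and newer, which differs in details from the draft under review (for example, ``create_module()`` returns ``None`` to request a default module, where Nick's message proposes ``NotImplemented``). ``DemoFinder``, ``DemoLoader`` and the module name ``spec_demo_module`` are made up for the sketch.

```python
import importlib
import importlib.abc
import importlib.machinery
import sys


class DemoLoader(importlib.abc.Loader):
    def create_module(self, spec):
        # Returning None asks the import system to create a default module.
        return None

    def exec_module(self, module):
        # Only the "execute" part is left to the loader; sys.modules handling
        # and the import-related attributes are set by the import system.
        module.answer = 42


class DemoFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path=None, target=None):
        if fullname != "spec_demo_module":
            return None
        return importlib.machinery.ModuleSpec(
            fullname, DemoLoader(), origin="demo")


finder = DemoFinder()
sys.meta_path.insert(0, finder)
try:
    demo = importlib.import_module("spec_demo_module")
finally:
    sys.meta_path.remove(finder)

print(demo.answer, demo.__spec__.origin)
```

Note that the loader contains none of the ``load_module()`` boilerplate; that is the simplification the PEP is after.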
> > The factory methods of ``ModuleSpec`` are intended to be helpful for > converting existing finders. ``from_loader()`` and > ``from_file_location()`` are both straight-forward utilities in this > regard. > If this holds to be true then they should go into importlib.util and kept out of the general object since dir(module_spec) shouldn't need to show the methods indefinitely. > In the case where loaders already expose methods for creating > and preparing modules, a finder may use ``ModuleSpec.from_module()`` on > a throw-away module to create the appropriate spec. > Why is the module a throw-away one? And why would loaders need to construct a ModuleSpec? > > As for loaders, > You were just talking about loader, so this is a bad transition. > ``exec_module()`` should be a relatively direct > conversion from a portion of the existing ``load_module()``. However, > ``Loader.create_module()`` will also be necessary in some uncommon > cases. Furthermore, ``load_module()`` will still work as a final option > when ``exec_module()`` is not appropriate. > > > How Loading Will Work > ===================== > > This is an outline of what happens in ``ModuleSpec.load()``. > > 1. A new module is created by calling ``spec.create()``. > > a. If the loader has a ``create_module()`` method, it gets called. > Otherwise a new module gets created. > b. The import-related module attributes are set. > So it seems step (b) happens even if step (a) does. If that's the case then are attributes overridden blindly, or conditionally set? If (b) doesn't happen if (a) did then you need to make that clear. > > 2. The module is added to sys.modules. > I would add a note that there is a separate method for handling reloads and thus blindly setting sys.modules is acceptable. > 3. ``spec.exec(module)`` gets called. > > a. If the loader has an ``exec_module()`` method, it gets called. 
> Otherwise ``load_module()`` gets called for backward-compatibility > and the resulting module is updated to match the spec. > "resulting module found in sys.modules is". And I think you meant to make step (b) be the fallback to load_module(). > > 4. If there were any errors the module is removed from sys.modules. > 5. If the module was replaced in sys.modules during ``exec()``, the one > in sys.modules is updated to match the spec. > This doesn't make sense. You just said the module got updated to match the spec in step 3.a. Are you saying you're going to overwrite values that exec_module() set? And once again, blindly updating or conditionally? And how are these attributes being set? Since exec_module() is going to need to set these anyway for proper exec() use during loading then why are you setting them *again* later on? Should you set these first and then let the methods reset them as they see fit? I thought exec_module() took in a filled-in module anyway, so didn't you have to set all the attributes prior to passing it in anyway in step 1.a? In that case this is a reset which seems wrong if code explicitly chose to change the values. > 6. The module in sys.modules is returned. 
> Or you can just provide the pseudo-code, skip all of this explanation, and be easier to follow =) You can leave comments with step numbers if you want to expound upon any specific step outside of the pseudo-code:

    import sys
    import types

    class ModuleSpec:

        def load(self):
            module = self.create()
            sys.modules[self.name] = module
            try:
                self.exec(module)
            except:
                # Step 4: drop the sys.modules entry, then let the error propagate.
                try:
                    del sys.modules[self.name]
                except KeyError:
                    pass
                raise
            else:
                # XXX different from proposal: didn't reset attributes
                return sys.modules[self.name]

        def create(self):
            if hasattr(self.loader, 'create_module'):
                module = self.loader.create_module(self)
            else:
                module = types.ModuleType(self.name)
                # XXX different from proposal: didn't do it blindly after create_module()
                self.init_module_attrs(module)
            return module

        def exec(self, module):
            if hasattr(self.loader, 'exec_module'):
                self.loader.exec_module(module)
            elif hasattr(self.loader, 'load_module'):
                self.loader.load_module(self.name)
                module = sys.modules[self.name]
            else:
                # Implicit concatenation so format() applies to the whole message.
                raise TypeError('{!r} loader does not have an '
                                'exec_module or load_module method'.format(self.loader))
            return module

> > These steps are exactly what ``Loader.load_module()`` is already > expected to do. Loaders will thus be simplified since they will only > need to implement the portion in step 3a. > > > ModuleSpec > ========== > > This is a new class which defines the import-related values to use when > loading the module. It closely corresponds to the import-related > attributes of module objects. ``ModuleSpec`` objects may also be used > by finders and loaders and other import-related APIs to hold extra > import-related state concerning the module. This greatly reduces the > need to add any new import-related attributes to module objects, and > loader ``__init__`` methods will no longer need to accommodate such > per-module state. > > General Notes > ------------- > > * The spec for each module instance will be unique to that instance even > if the information is identical to that of another spec.
> * A module's spec is not intended to be modified by anything but > finders. > > Creating a ModuleSpec > --------------------- > > **ModuleSpec(name, loader, *, origin=None, is_package=None)** > > .. container:: > > ``name``, ``loader``, and ``origin`` are set on the new instance > without any modification. If ``is_package`` is not passed in, the > loader's ``is_package()`` gets called (if available), or it defaults > to ``False``. If ``is_package`` is true, > ``submodule_search_locations`` is set to a new empty list. Otherwise > it is set to None. > > Other attributes not listed as parameters (such as ``package``) are > either read-only dynamic properties or default to None. > > **from_filename(name, loader, *, filename=None, > submodule_search_locations=None)** > > .. container:: > > This factory classmethod allows a suitable ModuleSpec instance to be > easily created with extra file-related information. This includes > the values that would be set on a module as ``__file__`` or > ``__cached__``. > > ``is_location`` is set to True for specs created using > ``from_filename()``. > > **from_module(module, loader=None)** > > .. container:: > > This factory is used to create a spec based on the import-related > attributes of an existing module. Since modules should already have > ``__spec__`` set, this method has limited utility. > "this method is expected to only be used in backwards-compatibility situations." > > **from_loader(name, loader, *, origin=None, is_package=None)** > > .. container:: > > A factory classmethod that returns a new ``ModuleSpec`` derived from > the arguments. ``is_package`` is used inside the method to indicate > that the module is a package. If not explicitly passed in, it falls > back to using the result of the loader's ``is_package()``, if > available. If not available, it defaults to False.
> > In contrast to ``ModuleSpec.__init__()``, which takes the arguments > as-is, ``from_loader()`` calculates missing values from the ones > passed in, as much as possible. This replaces the behavior that is > currently provided by several ``importlib.util`` functions as well as > the optional ``init_module_attrs()`` method of loaders. > "optional (and proposed-to-be-deprecated)" > Just to be > clear, here is a more detailed description of those calculations:: > > If not passed in, ``filename`` is set to the result of calling the > loader's ``get_filename()``, if available. Otherwise it stays > unset (``None``). > > If not passed in, ``submodule_search_locations`` is set to an empty > list if ``is_package`` is true. Then the directory from ``filename`` > is appended to it, if possible. If ``is_package`` is false, > ``submodule_search_locations`` stays unset. > > If ``cached`` is not passed in and ``filename`` is passed in, > ``cached`` is derived from it. For filenames with a source suffix, > it is set to the result of calling > ``importlib.util.cache_from_source()``. For bytecode suffixes (e.g. > ``.pyc``), ``cached`` is set to the value of ``filename``. If > ``filename`` is not passed in or ``cache_from_source()`` raises > ``NotImplementedError``, ``cached`` stays unset. > > If not passed in, ``origin`` is set to ``filename``. Thus if > ``filename`` is unset, ``origin`` stays unset. > > > Attributes > ---------- > > Each of the following names is an attribute on ``ModuleSpec`` objects. > A value of ``None`` indicates "not set". This contrasts with module > objects where the attribute simply doesn't exist. > > While ``package`` is a read-only property, the remaining attributes can > be replaced after the module spec is created and even after import is > complete. This allows for unusual cases where directly modifying the > spec is the best option. However, typical use should not involve > changing the state of a module's spec.
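As a rough illustration of the calculations described above, here is how the derived values come out with the helper that later shipped in Python 3.4, ``importlib.util.spec_from_file_location()``. That name, ``has_location`` and ``parent`` are the shipped spellings and are not the names used in this draft (``from_filename()``, ``is_location``, ``package``); the path ``pkg_demo/__init__.py`` is hypothetical and does not need to exist.

```python
import importlib.util
import os

# Hypothetical package location; nothing is read from disk here.
path = os.path.abspath(os.path.join("pkg_demo", "__init__.py"))
spec = importlib.util.spec_from_file_location("pkg_demo", path)

print(spec.origin)                      # the file location
print(spec.has_location)                # origin refers to a loadable location
print(spec.cached)                      # derived via cache_from_source()
print(spec.submodule_search_locations)  # package: directory of the file
print(spec.parent)                      # for a package, its own name
```

Because the file name is ``__init__.py``, the inferred source loader reports the module as a package, which is what triggers the ``submodule_search_locations`` calculation.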
> > Most of the attributes correspond to the import-related attributes of > modules. Here is the mapping, followed by a description of the > attributes. The reverse of this mapping is used by > ``ModuleSpec.init_module_attrs()``. > > ========================== =========== > On ModuleSpec On Modules > ========================== =========== > name __name__ > loader __loader__ > package __package__ > origin __file__* > cached __cached__* > This shouldn't be set on extension modules, so this is another asterisk of has_location *and* is not None (right?). > submodule_search_locations __path__** > loading_info \- > has_location (RO-property) \- > ========================== =========== > > \* Only if ``is_location`` is true. > Should that be has_location? > \*\* Only if not None. > "Set only if not None" > > **name** > > .. container:: > > The module's fully resolved and absolute name. It must be set. > > **loader** > > .. container:: > > The loader to use during loading and for module data. These specific > functionalities do not change for loaders. Finders are still > responsible for creating the loader and this attribute is where it is > stored. The loader must be set. > > **origin** > > .. container:: > > A string for the location from which the module originates. Aside from > the informational value, it is also used in ``module_repr()``. > > The module attribute ``__file__`` has a similar but more restricted > meaning. Not all modules have it set (e.g. built-in modules). However, > ``origin`` is applicable to essentially all modules. For built-in > modules it would be set to "built-in". > > Secondary Attributes > -------------------- > > Some of the ``ModuleSpec`` attributes are not set via arguments when > creating a new spec. Either they are strictly dynamically calculated > properties or they are simply set to None (aka "not set"). For the > latter case, those attributes may still be set directly. > > **package** > > .. 
container:: > > A dynamic property that gives the name of the module's parent. The > value is derived from ``name`` and ``is_package``. For packages it is > the value of ``name``. Otherwise it is equivalent to > ``name.rpartition('.')[0]``. Consequently, a top-level module will have > the empty string for ``package``. > > **has_location** > > .. container:: > > Some modules can be loaded by reference to a location, e.g. a filesystem > path or a URL or something of the sort. Having the location lets you > load the module, but in theory you could load that module under various > names. > > In contrast, non-located modules can't be loaded in this fashion, e.g. > builtin modules and modules dynamically created in code. For these, the > name is the only way to access them, so they have an "origin" but not a > "location". > > This attribute reflects whether or not the module is locatable. If it > is, ``origin`` must be set to the module's location and ``__file__`` > will be set on the module. Furthermore, a locatable module is also > cacheable and so ``__cached__`` is tied to ``has_location``. > That statement about __cached__ is not true for extension modules. You're going to need to tweak how you define 'cached' based on this. Either that or you can try to use this as a justification for loader.create_module() as you can override these semantics there as a pure Python module is more common than extension modules (although this doesn't help with the ModuleSpec having the wrong information when returned from the finder unless the finder itself resets it on the ModuleSpec before returning it). > > The corresponding module attribute name, ``__file__``, is somewhat > inaccurate and potentially confusing, so we will use a more explicit > combination of ``origin`` and ``has_location`` to represent the same > information. Having a separate ``filename`` is unnecessary since we have > ``origin``. > > **cached** > > ..
container:: > > A string for the location where the compiled code for a module should be > stored. PEP 3147 details the caching mechanism of the import system. > > If ``has_location`` is true, this location string is set on the module > as ``__cached__``. When ``from_filename()`` is used to create a spec, > ``cached`` is set to the result of calling > ``importlib.util.cache_from_source()``. > > ``cached`` is not necessarily a file location. A finder or loader may > store an alternate location string in ``cached``. However, in practice > this will be the file location dictated by PEP 3147. > > **submodule_search_locations** > > .. container:: > > The list of location strings, typically directory paths, in which to > search for submodules. If the module is a package this will be set to > a list (even an empty one). Otherwise it is ``None``. > > The corresponding module attribute's name, ``__path__``, is relatively > ambiguous. Instead of mirroring it, we use a more explicit name that > makes the purpose clear. > > **loading_info** > > .. container:: > > A finder may set ``loading_info`` to any value to provide additional > data for the loader to use during loading. A value of ``None`` is the > default and indicates that there is no additional data. Otherwise it is > likely set to some container, such as a ``dict``, ``list``, or > "Otherwise it can be set to any object." > ``types.SimpleNamespace`` containing the relevant extra information. > > For example, ``zipimporter`` could use it to pass the zip archive name > to the loader directly, rather than needing to derive it from ``origin`` > or create a custom loader for each find operation. > > Methods > ------- > > **module_repr()** > > .. container:: > > Returns a repr string for the module, based on the module's import-related > attributes and falling back to the spec's attributes. The > string will reflect the current output of the module type's > ``__repr__()``.
> > The module type's ``__repr__()`` will use the module's ``__spec__`` > exclusively. If the module does not have ``__spec__`` set, a spec is > generated using ``ModuleSpec.from_module()``. > > Since the module attributes may be out of sync with the spec and to > preserve backward-compatibility in that case, we defer to the module > attributes and only when they are missing do we fall back to the spec > attributes. > > **init_module_attrs(module)** > > .. container:: > > Sets the module's import-related attributes to the corresponding values > in the module spec. If ``has_location`` is false on the spec, > ``__file__`` and ``__cached__`` are not set on the module. ``__path__`` > is only set on the module if ``submodule_search_locations`` is not None. > For the rest of the import-related module attributes, a ``None`` value > on the spec (aka "not set") means ``None`` will be set on the module. > If any of the attributes are already set on the module, the existing > values are replaced. The module's own ``__spec__`` is not consulted but > does get replaced with the spec on which ``init_module_attrs()`` was > called. The earlier mapping of ``ModuleSpec`` attributes to module > attributes indicates which attributes are involved on both sides. > > **create()** > > .. container:: > > A new module is created relative to the spec and its import-related > attributes are set accordingly. If the spec's loader has a > ``create_module()`` method, that gets called to create the module. This > gives the loader a chance to do any pre-loading initialization that can't > otherwise be accomplished elsewhere. Otherwise a bare module object is > created. In both cases ``init_module_attrs()`` is called on the module > before it gets returned. > As stated earlier, I don't like the idea of blindly resetting attributes if set by create_module(). > > **exec(module)** > > .. container:: > > The spec's loader is used to execute the module.
If the loader has > ``exec_module()`` defined, the namespace of ``module`` is the target of > execution. > Wait, what? You suggest it's the module in the signature but module.__dict__ in the explanation. > Otherwise the loader's ``load_module()`` is called, which > ignores ``module`` and returns the module that was the actual > execution target. > Are you pulling from sys.modules? Otherwise how are you getting the module from load_module()? And you don't mention that in one case the module is not put into sys.modules while in the other case it is (exec_module vs. load_module). That dichotomy is going to be messy. Does this need to be separate from load()? If you merge it in then the sys.modules semantics are unified within load(). Otherwise you need to make this set sys.modules in either case and return from sys.modules. > In that case the import-related attributes of that > module are updated to reflect the spec. > Why? If you already set the attributes in the module and inserted it into sys.modules previously then you already took care of this. Else you now are setting the attributes potentially *three* times (twice in create() from loader.create_module() + an explicit call to init_module_attr() and then here). > In both cases the targeted > module is the one that gets returned. > Huh? What exactly are you returning? You say "actual execution target" above for load_module() but "in both cases the target module" here. That seems to contradictory. > > **load()** > > .. container:: > > This method captures the current functionality of and requirements on > ``Loader.load_module()`` without any semantic changes. It is > essentially a wrapper around ``create()`` and ``exec()`` with some > extra functionality regarding ``sys.modules``. > > itself in ``sys.modules`` while executing. Consequently, the module in > ``sys.modules`` is the one that gets returned by ``load()``. > > Right before ``exec()`` is called, the module is added to > ``sys.modules``. 
In the case of error during loading the module is > removed from ``sys.modules``. The module in ``sys.modules`` when > ``load()`` finishes is the one that gets returned. Returning the module > from ``sys.modules`` accommodates the ability of the module to replace > itself there while it is executing (during load). > > As already noted, this is what already happens in the import system. > ``load()`` is not meant to change any of this behavior. > > If ``loader`` is not set (``None``), ``load()`` raises a ValueError. > Since the loader is required by the initializer for ModuleSpec I don't know if this specific check is necessary: EAFP. > > **reload(module)** > > .. container:: > > As with ``load()`` this method faithfully fulfills the semantics of > ``Loader.load_module()`` in the reload case, with one exception: > reloading a module when ``exec_module()`` is available actually uses > ``module`` rather than ignoring it in favor of the one in > ``sys.modules``, as ``Loader.load_module()`` does. The functionality > here mirrors that of ``load()``, minus the ``create()`` call and the > ``sys.modules`` handling. > > .. XXX add more of importlib.reload()'s boilerplate to reload()? > > Omitted Attributes and Methods > ------------------------------ > > There is no ``PathModuleSpec`` subclass of ``ModuleSpec`` that provides > the ``has_location``, ``cached``, and ``submodule_search_locations`` > functionality. While that might make the separation cleaner, module > objects don't have that distinction. ``ModuleSpec`` will support both > cases equally well. > > While ``is_package`` would be a simple additional attribute (aliasing > ``self.submodule_search_locations is not None``), it perpetuates the > artificial (and mostly erroneous) distinction between modules and > packages. > > Conceivably, ``ModuleSpec.load()`` could optionally take a list of > modules with which to interact instead of ``sys.modules``. 
That > capability is left out of this PEP, but may be pursued separately at > some other time, including relative to PEP 406 (import engine). > > Likewise ``load()`` could be leveraged to implement multi-version > imports. While interesting, doing so is outside the scope of this > proposal. > > Backward Compatibility > ---------------------- > > ``ModuleSpec`` doesn't have any. This would be a different story if > ``Finder.find_module()`` were to return a module spec instead of loader. > In that case, specs would have to act like the loader that would have > been returned instead. Doing so would be relatively simple, but is an > unnecessary complication. > > Subclassing > ----------- > > Subclasses of ModuleSpec are allowed, but should not be necessary. > Simply setting ``loading_info`` or adding functionality to a custom > finder or loader will likely be a better fit and should be tried first. > However, as long as a subclass still fulfills the requirements of the > import system, objects of that type are completely fine as the return > value of ``Finder.find_spec()``. > > > Existing Types > ============== > > Module Objects > -------------- > > **__spec__** > > .. container:: > > Module objects will now have a ``__spec__`` attribute to which the > module's spec will be bound. > > None of the other import-related module attributes will be changed or > deprecated, though some of them could be; any such deprecation can wait > until Python 4. > > ``ModuleSpec`` objects will not be kept in sync with the corresponding > module object's import-related attributes. Though they may differ, in > practice they will typically be the same. > > One notable exception is that case where a module is run as a script by > using the ``-m`` flag. In that case ``module.__spec__.name`` will > reflect the actual module name while ``module.__name__`` will be > ``__main__``. 
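As a rough illustration of the ``__spec__`` attribute described above, the sketch below inspects the spec of a standard-library package and of one of its submodules. It assumes Python 3.4 or later, where ``ModuleSpec`` eventually shipped; the released attribute names differ slightly from this draft (for instance ``parent`` rather than ``package``), so read it as an approximation of the proposal rather than the draft API itself.

```python
import json
import json.decoder

pkg_spec = json.__spec__          # spec of a package
mod_spec = json.decoder.__spec__  # spec of a plain submodule

# A package is its own parent; a submodule's parent is its package.
print(pkg_spec.name, repr(pkg_spec.parent))
print(mod_spec.name, repr(mod_spec.parent))

# Only packages carry a list of submodule search locations.
print(pkg_spec.submodule_search_locations is not None)
print(mod_spec.submodule_search_locations is None)

# Modules found on the filesystem are normally "locatable".
print(pkg_spec.has_location, mod_spec.has_location)
```

The spec is available on any imported module, so this kind of inspection needs no knowledge of which finder or loader was involved.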
> > Finders > ------- > > **MetaPathFinder.find_spec(name, path=None)** > > **PathEntryFinder.find_spec(name)** > > .. container:: > > Finders will return ModuleSpec objects when ``find_spec()`` is > called. This new method replaces ``find_module()`` and > ``find_loader()`` (in the ``PathEntryFinder`` case). If a finder does > not have ``find_spec()``, ``find_module()`` and ``find_loader()`` are > used instead, for backward-compatibility. > > Adding yet another similar method to finders is a case of practicality. > ``find_module()`` could be changed to return specs instead of loaders. > This is tempting because the import APIs have suffered enough, > especially considering ``PathEntryFinder.find_loader()`` was just > added in Python 3.3. However, the extra complexity and a less-than- > explicit method name aren't worth it. > > Finders are still responsible for creating the loader. That loader will > now be stored in the module spec returned by ``find_spec()`` rather > than returned directly. As is currently the case without the PEP, if a > loader would be costly to create, that loader can be designed to defer > the cost until later. > > Loaders > ------- > > **Loader.exec_module(module)** > > .. container:: > > Loaders will have a new method, ``exec_module()``. Its only job > is to "exec" the module and consequently populate the module's > namespace. It is not responsible for creating or preparing the module > object, nor for any cleanup afterward. It has no return value. > > **Loader.load_module(fullname)** > > .. container:: > > The ``load_module()`` of loaders will still work and be an active part > of the loader API. It is still useful for cases where the default > module creation/preparation/cleanup is not appropriate for the loader. > If implemented, ``load_module()`` will still be responsible for its > current requirements (prep/exec/etc.) since the method may be called > directly.
> > For example, the C API for extension modules only supports the full > control of ``load_module()``. As such, ``ExtensionFileLoader`` will not > implement ``exec_module()``. In the future it may be appropriate to > produce a second C API that would support an ``exec_module()`` > implementation for ``ExtensionFileLoader``. Such a change is outside > the scope of this PEP. > > A loader must define either ``exec_module()`` or ``load_module()``. If > both exist on the loader, ``ModuleSpec.load()`` uses ``exec_module()`` > and ignores ``load_module()``. > > **Loader.create_module(spec)** > > .. container:: > > Loaders may also implement ``create_module()`` that will return a > new module to exec. However, most loaders will not need to implement > the method. > > PEP 420 introduced the optional ``module_repr()`` loader method to limit > the amount of special-casing in the module type's ``__repr__()``. Since > this method is part of ``ModuleSpec``, it will be deprecated on loaders. > However, if it exists on a loader it will be used exclusively. > > ``Loader.init_module_attr()`` method, added prior to Python 3.4's > release , will be removed in favor of the same method on ``ModuleSpec``. > > However, ``InspectLoader.is_package()`` will not be deprecated even > though the same information is found on ``ModuleSpec``. ``ModuleSpec`` > can use it to populate its own ``is_package`` if that information is > not otherwise available. Still, it will be made optional. > > The path-based loaders in ``importlib`` take arguments in their > ``__init__()`` and have corresponding attributes. However, the need for > those values is eliminated by module specs. The only exception is > ``FileLoader.get_filename()``, which uses ``self.path``. The signatures > for these loaders and the accompanying attributes will be deprecated. > > In addition to executing a module during loading, loaders will still be > directly responsible for providing APIs concerning module-related data. 
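To make the division of labour between finders and loaders concrete, here is a minimal sketch using the API as it later shipped in Python 3.4+, which is an assumption relative to this draft: the released ``find_spec()`` also accepts a ``target`` argument, and ``create_module()`` returns ``None`` to request a default module. ``StringFinder``, ``StringLoader`` and the module name ``spec_demo_mod`` are hypothetical names invented for the example.

```python
import importlib
import importlib.abc
import importlib.machinery
import sys


class StringLoader(importlib.abc.Loader):
    """Hypothetical loader that executes source code held in memory."""

    def __init__(self, source):
        self.source = source

    def create_module(self, spec):
        # Returning None asks the import system for a plain module object.
        return None

    def exec_module(self, module):
        # Only populate the namespace: no sys.modules handling, no return value.
        exec(self.source, module.__dict__)


class StringFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder that knows a fixed set of module names."""

    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, name, path=None, target=None):
        if name not in self.sources:
            return None
        # The finder still creates the loader, but returns it inside a spec.
        loader = StringLoader(self.sources[name])
        return importlib.machinery.ModuleSpec(name, loader, origin="<string>")


sys.meta_path.append(StringFinder({"spec_demo_mod": "ANSWER = 6 * 7\n"}))
demo = importlib.import_module("spec_demo_mod")
print(demo.ANSWER)
print(demo.__spec__.origin, demo.__spec__.has_location)
```

Because the spec has an origin but no location, the import machinery leaves ``__file__`` unset on the resulting module, which matches the ``has_location`` discussion earlier in the thread.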
> > > Other Changes > ============= > > * The various finders and loaders provided by ``importlib`` will be > updated to comply with this proposal. > * The spec for the ``__main__`` module will reflect how the interpreter > was started. For instance, with ``-m`` the spec's name will be that > of the run module, while ``__main__.__name__`` will still be > "__main__". > * We add ``importlib.find_spec()`` to mirror > ``importlib.find_loader()`` (which becomes deprecated). > * Deprecations in ``importlib.util``: ``set_package()``, > ``set_loader()``, and ``module_for_loader()``. ``module_to_load()`` > (introduced prior to Python 3.4's release) can be removed. > * ``importlib.reload()`` is changed to use ``ModuleSpec.load()``. > * ``ModuleSpec.load()`` and ``importlib.reload()`` will now make use of > the per-module import lock, whereas ``Loader.load_module()`` did not. > > > Reference Implementation > ======================== > > A reference implementation will be available at > http://bugs.python.org/issue18864. > > > Open Issues > ============== > > \* The impact of this change on pkgutil (and setuptools) needs looking > into. It has some generic function-based extensions to PEP 302. These > may break if importlib starts wrapping loaders without the tools' > knowledge. > > \* Other modules to look at: runpy (and pythonrun.c), pickle, pydoc, > inspect. > > \* Add ``ModuleSpec.data`` as a descriptor that wraps the data API of the > spec's loader? > No. This starts to move this away from ModuleSpec modules being a data storage object and more or a level of indirection around loaders. > > \* How to limit possible end-user confusion/abuses relative to spec > attributes (since __spec__ will make them really accessible)? > > > References > ========== > > [1] http://mail.python.org/pipermail/import-sig/2013-August/000658.html > > > Copyright > ========= > > This document has been placed in the public domain. > > .. 
> Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: > > > _______________________________________________ > Import-SIG mailing list > Import-SIG at python.org > http://mail.python.org/mailman/listinfo/import-sig > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Thu Aug 29 03:57:40 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 28 Aug 2013 19:57:40 -0600 Subject: [Import-SIG] PEP 451 (ModuleSpec) round 3 In-Reply-To: References: Message-ID: On Wed, Aug 28, 2013 at 11:22 AM, Brett Cannon wrote: > On Wed, Aug 28, 2013 at 4:50 AM, Eric Snow wrote: > >> This PEP proposes to add a new class to ``importlib.machinery`` called >> ``ModuleSpec``. It will be authoritative for all the import-related >> information about a module, and will be available without needing to >> load the module first. Finders will provide a module's spec instead of >> a loader. >> > > Don't you mean finders will return a ModuleSpec? Since 'loader' is still > defined in the ModuleSpec to know what loader to use that statement that > finders don't provide a loader is misleading. > Yeah, finders will still create the loaders. I'll make that more clear. importlib.machinery.ModuleSpec (new) >> > ------------------------------------ >> > > For this entire section you need to provide the call signatures as you > start talking semantics later w/o making clear what is being passed and > returned before going into detail of the individual methods. Otherwise move > the detailed discussion of the methods up to before the semantics overview. > I'll try moving the detailed descriptions up. > > >> >> Attributes: >> >> * name - a string for the name of the module. >> * loader - the loader to use for loading and for module data. >> * origin - a string for the location from which the module is loaded. >> > > I would give an "e.g." 
here to help explain what you mean. As previous > comments have shown, the name alone is not enough to understand what value > should go here. =) > Good point. :) > > >> * submodule_search_locations - strings for where to find submodules, >> if a package. >> * loading_info - a container of data for use during loading (or None). >> * cached (property) - a string for where the compiled module will be >> stored. >> * is_location (RO-property) - the module's origin refers to a location. >> >> .. XXX Find a better name than loading_info? >> > > loading_data is all that I can think of > > >> .. XXX Add ``submodules`` (RO-property) - returns possible submodules >> relative to spec (or None)? >> > > Actual use-case or are you just guessing there will be a use? Don't add > any fields that we have not seen an actual need for. > I was thinking of what Nick said about downplaying the module/package distinction, since a package is just a module with possible submodules. So then I thought, what if there were an easy way to see what submodules a module has available? Non-packages would always have 0 and packages would have 0 or more. I'd use that if it existed. This was mostly a passing idea that would need more thought (the implementation might be tricky). I agree it doesn't need to be in the PEP. > >> .. XXX Add ``loaded`` (RO-property) - the module in sys.modules, if any? >> > > Too easy to figure out with ``name in sys.modules`` and can go stale > (unless you make this a property). > This was a light attempt at lowering the barrier to entry with regards to the import system. I was thinking of how you have to know to look in sys.modules to see if the module is loaded. Providing a property effectively hides sys.modules as an implementation detail. I was also thinking of this as "installed". Regardless, that I left this as a comment reflects my uncertainty of its utility. 
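For reference, the ``name in sys.modules`` check Brett mentions, and the ``loaded`` convenience floated above, could look roughly like this. ``loaded_module()`` is a hypothetical helper and not part of the PEP; ``importlib.util.find_spec()`` is the function that later shipped in Python 3.4, so its use here is an assumption relative to this draft.

```python
import importlib.machinery
import importlib.util
import json
import sys


def loaded_module(spec):
    """Return the module currently in sys.modules for this spec, or None."""
    return sys.modules.get(spec.name)


spec = importlib.util.find_spec("json")
print(spec.name in sys.modules)

# A spec for a name nobody has imported simply reports None.
unloaded = importlib.machinery.ModuleSpec("hypothetical_unloaded_mod", loader=None)
print(loaded_module(unloaded))
```

Computing the answer on demand avoids the staleness problem raised above, since nothing is cached on the spec.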
> Python users will be able to inspect a module's ``__spec__`` to get >> import-related information about the object. Generally, they will not >> be using the ``ModuleSpec`` factory methods nor the instance methods. >> > > As of right now no one is using the instance methods based on the wording > in this section. =) > Yeah, that would read better as something like "Generally, Python applications and interactive users will not...". > >> However, each spec has methods named ``create``, ``exec``, ``load``, and >> ``reload``. Since they are so easy to access (and misunderstand/abuse), >> their function and availability require explicit consideration in this >> proposal. >> >> >> What Will Existing Finders and Loaders Have to Do Differently? >> ============================================================== >> >> Immediately? Nothing. The status quo will be deprecated, but will >> continue working. However, here are the things that the authors of >> finders and loaders should change relative to this PEP: >> >> * Implement ``find_spec()`` on finders. >> * Implement ``exec_module()`` on loaders, if possible. >> >> The factory methods of ``ModuleSpec`` are intended to be helpful for >> converting existing finders. ``from_loader()`` and >> ``from_file_location()`` are both straight-forward utilities in this >> regard. >> > > If this holds to be true then they should go into importlib.util and kept > out of the general object since dir(module_spec) shouldn't need to show the > methods indefinitely. > I've actually been vacillating for days between classmethods and importlib.util, and at one point I even made a meta class and put the factories there (to keep them out of instances). At the moment importlib.util is making more sense. > > >> In the case where loaders already expose methods for creating >> and preparing modules, a finder may use ``ModuleSpec.from_module()`` on >> a throw-away module to create the appropriate spec. >> > > Why is the module a throw-away one? 
And why would loaders need to > construct a ModuleSpec? > This is something the Nick pointed out. Some loaders may already have the API to create a module and populate its attributes. In that case, the finder could use that API to get the module and then use ModuleSpec.from_module() to create the spec that find_spec() would return. This is very explicit and direct way to map the existing import-related info for the module onto a spec. > > >> >> As for loaders, >> > > You were just talking about loader, so this is a bad transition. > Yeah, that did get muddled. I was talking about how finders could use the existing capabilities of loaders to build a spec. I'll make that less awkward. > This is an outline of what happens in ``ModuleSpec.load()``. >> >> 1. A new module is created by calling ``spec.create()``. >> >> a. If the loader has a ``create_module()`` method, it gets called. >> Otherwise a new module gets created. >> b. The import-related module attributes are set. >> > > So it seems step (b) happens even if step (a) does. If that's the case > then are attributes overridden blindly, or conditionally set? If (b) > doesn't happen if (a) did then you need to make that clear. > Yeah, (b) always happens. I was planning on having them be overridden blindly to match the spec. Loader.create_module() would not be responsible for setting the import-related attributes. > > >> >> 2. The module is added to sys.modules. >> > > I would add a note that there is a separate method for handling reloads > and thus blindly setting sys.modules is acceptable. > Good point. Furthermore, if the module exists in sys.modules when load() gets called and it fails, the module will be removed from sys.modules. Do you think it would make sense to stick the original module back into sys.modules in that case? I don't because calling load() may have side-effects on the state of that original module. > > >> 3. ``spec.exec(module)`` gets called. >> >> a. 
If the loader has an ``exec_module()`` method, it gets called. >> Otherwise ``load_module()`` gets called for backward-compatibility >> and the resulting module is updated to match the spec. >> > > "resulting module found in sys.modules is". > > And I think you meant to make step (b) be the fallback to load_module(). > I was thinking of it as one step with two possible paths, rather than 2 steps. > > >> >> 4. If there were any errors the module is removed from sys.modules. >> 5. If the module was replaced in sys.modules during ``exec()``, the one >> in sys.modules is updated to match the spec. >> > > This doesn't make sense. You just said the module got updated to match the > spec in step 3.a. Are you saying you're going to overwrite values that > exec_module() set? And once again, blindly updating or conditionally? And > how are these attributes being set? > As with create_module(), Loader.exec_module() isn't in charge of setting the import-related attributes (as opposed to Loader.load_module(), which is). However, if the module was replaced in sys.modules during exec_module(), then we assume that the one in sys.modules has not been touched by the spec. So we set any of the import-related attributes on it that aren't already set (respecting the ones that are), with the exception of __spec__, which we always override. Thinking about it, it may make sense in that case to create a new spec based on the current one but deferring to any of the existing import-related attributes of the module. Then that spec can be set to __spec__. > Since exec_module() is going to need to set these anyway for proper exec() > use during loading then why are you setting them *again* later on? Should > you set these first and then let the methods reset them as they see fit? I > thought exec_module() took in a filled-in module anyway, so didn't you have > to set all the attributes prior to passing it in anyway in step 1.a? 
In > that case this is a reset which seems wrong if code explicitly chose to > change the values. > It's not that code chose to change the values. It's that code chose to stick some other object in sys.modules and we're going to return it and we want to be sure the spec and import-related attributes are all properly set. > > >> 6. The module in sys.modules is returned. >> > > Or you can just provide the pseudo-code and skip all of this explanation > and be easier to follow =) You can leave comments with step numbers if you > want to expound upon any specific step outside of the pseudo-code: > That is a really good idea. It will make more sense.
>
> import sys
> import types
>
> class ModuleSpec:
>
>     def load(self):
>         module = self.create()
>         sys.modules[self.name] = module
>
>         try:
>             self.exec(module)
>         except:
>             # On failure, drop the module from sys.modules and re-raise.
>             try:
>                 del sys.modules[self.name]
>             except KeyError:
>                 pass
>             raise
>         else:
>             # XXX different from proposal: didn't reset attributes
>             return sys.modules[self.name]
>
>     def create(self):
>         if hasattr(self.loader, 'create_module'):
>             module = self.loader.create_module(self)
>         else:
>             module = types.ModuleType(self.name)
>             # XXX different from proposal: didn't do it blindly after
>             # create_module()
>             self.init_module_attrs(module)
>         return module
>
>     def exec(self, module):
>         if hasattr(self.loader, 'exec_module'):
>             self.loader.exec_module(module)
>         elif hasattr(self.loader, 'load_module'):
>             self.loader.load_module(self.name)
>             module = sys.modules[self.name]
>         else:
>             # Parenthesized so format() applies to the whole message.
>             raise TypeError(('{!r} loader does not have an '
>                              'exec_module or load_module '
>                              'method').format(self.loader))
>         return module
>
>     ...
>
>> ========================== ===========
>> On ModuleSpec              On Modules
>> ========================== ===========
>> name                       __name__
>> loader                     __loader__
>> package                    __package__
>> origin                     __file__*
>> cached                     __cached__*
>
> This shouldn't be set on extension modules, so this is another asterisk of
> has_location *and* is not None (right?).
>
Correct. Good point.
> > >> submodule_search_locations __path__** >> loading_info \- >> has_location (RO-property) \- >> ========================== =========== >> >> \* Only if ``is_location`` is true. >> > > Should that be has_location? > Yep. :) > **has_location** >> >> .. container:: >> >> Some modules can be loaded by reference to a location, e.g. a >> filesystem >> path or a URL or something of the sort. Having the location lets you >> load the module, but in theory you could load that module under various >> names. >> >> In contrast, non-located modules can't be loaded in this fashion, e.g. >> builtin modules and modules dynamically created in code. For these, >> the >> name is the only way to access them, so they have an "origin" but not a >> "location". >> >> This attribute reflects whether or not the module is locatable. If it >> is, ``origin`` must be set to the module's location and ``__file__`` >> will be set on the module. Furthermore, a locatable module is also >> cacheable and so ``__cached__`` is tied to ``has_location``. >> > > That statement about __cached__ is not true for extension modules. You're > going to need to tweak how you define 'cached' based on this. Either that > or you can try to use this as a justification for loader.create_module() as > you can override these semantics there as a pure Python module is more > common than extension modules (although this doesn't help with the > ModuleSpec having the wrong information when returned from the finder > unless the finder itself resets it on the ModuleSpec before returning it). > Yeah, I'll need to rework that. > **create()** >> >> .. container:: >> >> A new module is created relative to the spec and its import-related >> attributes are set accordingly. If the spec's loader has a >> ``create_module()`` method, that gets called to create the module. >> This >> give the loader a chance to do any pre-loading initialization that >> can't >> otherwise be accomplished elsewhere. 
Otherwise a bare module object is >> created. In both cases ``init_module_attrs()`` is called on the module >> before it gets returned. >> > > As stated earlier, I don't like the idea of blindly resetting attributes > if set by create_module(). > Well, create_module() shouldn't be setting them. Are you suggesting that there is a use case for that? > > >> >> **exec(module)** >> >> .. container:: >> >> The spec's loader is used to execute the module. If the loader has >> ``exec_module()`` defined, the namespace of ``module`` is the target of >> execution. >> > > Wait, what? You suggest it's the module in the signature but > module.__dict__ in the explanation. > That's right. The module is passed in and then exec_module() does its thing with module.__dict__. Do you think exec_module() should directly take a dict instead? > >> Otherwise the loader's ``load_module()`` is called, which >> ignores ``module`` and returns the module that was the actual >> execution target. >> > > Are you pulling from sys.modules? Otherwise how are you getting the module > from load_module()? > load_module() returns the module. For loaders that don't follow that rule (and return None), we'll grab the module out of sys.modules. > And you don't mention that in one case the module is not put into > sys.modules while in the other case it is (exec_module vs. load_module). > That dichotomy is going to be messy. > That difference should definitely be clear. What is messy about it? > Does this need to be separate from load()? If you merge it in then the > sys.modules semantics are unified within load(). Otherwise you need to make > this set sys.modules in either case and return from sys.modules. > Both load() and reload() call exec(). In the case of load(), it wraps the exec() call with the requisite sys.modules handling. In the case of reload(), the module should already be in sys.modules. 
Regardless, Loader.load_module() is already required to do all the sys.modules handling so that base should be covered. If we deprecate that requirement, which we could, then we have a different story. If someone calls exec() directly before a module is ever loaded then the sys.modules handling shouldn't matter anyway; and if someone does that it means the spec did not come from module.__spec__ so they probably aren't a casual user. > >> In that case the import-related attributes of that >> module are updated to reflect the spec. >> > > Why? If you already set the attributes in the module and inserted it into > sys.modules previously then you already took care of this. Else you now are > setting the attributes potentially *three* times (twice in create() from > loader.create_module() + an explicit call to init_module_attr() and then > here). > Loader.load_module() is still responsible for setting those attributes. However, it may have missed one or more (including __spec__). We want to make sure all the appropriate import-related attributes get set. Furthermore, load_module() may have overridden the values we set previously. Given its authority, it may make sense to update module.__spec__ to reflect the attributes set by the loader. That way __spec__ indicates the values used during loading. On the other hand, by not updating the spec, the difference between the module attributes and the spec will reflect the ways in which the loader did not follow the spec. I've been following that former line of thinking, but now I'm wondering if the latter would be better. Regardless, the pathological case where the module attributes set by load_module() and the spec don't match should be pretty rare. As to setting it multiple times, in the worst case the attributes will be set twice. Loader.create_module() shouldn't be setting them. > >> In both cases the targeted >> module is the one that gets returned. >> > > Huh? What exactly are you returning? 
You say "actual execution target" > above for load_module() but "in both cases the target module" here. That > seems to contradictory. > In the load_module() case we return the result of calling load_module() (or the module in sys.modules if load_module() returns None). Otherwise we return the module that was passed in. In their respective cases both are the actual execution targets. I'll reword that. > > >> >> **load()** >> >> .. container:: >> >> This method captures the current functionality of and requirements on >> ``Loader.load_module()`` without any semantic changes. It is >> essentially a wrapper around ``create()`` and ``exec()`` with some >> extra functionality regarding ``sys.modules``. >> >> itself in ``sys.modules`` while executing. Consequently, the module in >> ``sys.modules`` is the one that gets returned by ``load()``. >> >> Right before ``exec()`` is called, the module is added to >> ``sys.modules``. In the case of error during loading the module is >> removed from ``sys.modules``. The module in ``sys.modules`` when >> ``load()`` finishes is the one that gets returned. Returning the >> module >> from ``sys.modules`` accommodates the ability of the module to replace >> itself there while it is executing (during load). >> >> As already noted, this is what already happens in the import system. >> ``load()`` is not meant to change any of this behavior. >> >> If ``loader`` is not set (``None``), ``load()`` raises a ValueError. >> > > Since the loader is required by the initializer for ModuleSpec I don't > know if this specific check is necessary: EAFP. > Yeah, the check will almost always pass. And if someone does something they shouldn't they'll get an AttributeError really quickly anyway. > Open Issues >> ============== >> >> \* The impact of this change on pkgutil (and setuptools) needs looking >> into. It has some generic function-based extensions to PEP 302. These >> may break if importlib starts wrapping loaders without the tools' >> knowledge. 
>> >> \* Other modules to look at: runpy (and pythonrun.c), pickle, pydoc, >> inspect. >> >> \* Add ``ModuleSpec.data`` as a descriptor that wraps the data API of the >> spec's loader? >> > > No. This starts to move this away from ModuleSpec modules being a data > storage object and more or a level of indirection around loaders. > Agreed. It may be nice to have the easier access to the loader data APIs, but doesn't quite fit. ModuleSpec.data as a wrapper was the best I could think of. > > >> >> \* How to limit possible end-user confusion/abuses relative to spec >> attributes (since __spec__ will make them really accessible)? >> >> >> References >> ========== >> >> [1] http://mail.python.org/pipermail/import-sig/2013-August/000658.html >> >> >> Copyright >> ========= >> >> This document has been placed in the public domain. >> >> .. >> Local Variables: >> mode: indented-text >> indent-tabs-mode: nil >> sentence-end-double-space: t >> fill-column: 70 >> coding: utf-8 >> End: >> >> >> _______________________________________________ >> Import-SIG mailing list >> Import-SIG at python.org >> http://mail.python.org/mailman/listinfo/import-sig >> >> > Thanks for that review. It helps a lot. I'll update the PEP when I get a chance. From your feedback I've gathered a few things: 1. The PEP needs to be more clear on what the Loader methods (both existing and new) are supposed to accomplish and their responsibilities regarding sys.modules and module attributes. 2. I need to slide more toward "do less" than I have been in the balance between keeping things simple and covering all the corner cases. 3. Whether or not the spec should be updated to reflect the attributes set by load_module() still needs some consideration. 4. The purpose of exec() vs. load() needs some clarification (pseudo-code should help). 5. I need more sleep. :) -eric -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ericsnowcurrently at gmail.com Thu Aug 29 04:16:22 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 28 Aug 2013 20:16:22 -0600 Subject: [Import-SIG] PEP 451 (ModuleSpec) round 3 In-Reply-To: References: Message-ID: On Wed, Aug 28, 2013 at 4:10 PM, Nick Coghlan wrote: > Proper review on the weekend, but quickish comments for now: > > - update looks really good, and solves several issues with the original > proposal. Big +1 for keeping it simple and adding a new finder method :) > > - for extension modules that don't define a creation hook (which I won't > be able to figure out before create_module is called), I'd like to be able > to return NotImplemented from create_module to say "please give me a normal > module, or re-use the existing one for reloading". > I was wondering about this. In fact, my current implementation has create_module() return None as the sentinel. Would you prefer NotImplemented? Either way I'll add that to the PEP. I was surprised it wasn't there already. > - I'd like to finally make "can reload or not" explicit in the loader API. > My current idea for this is to add a "reloading" parameter to > create_module, where we pass in the module to be reloaded. Loaders that > support reloading *must* either not define create_module, or, if they > define it, return NotImplemented or return the passed in module in that > case. If it returns a new module, the reload should fail. This shouldn't > break backwards compatibility, as init based extension modules are cached > internally, while modules that use the new hooks can decide for themselves > whether or not to support reloading > I'm unclear on how create_module() is involved during reload. Perhaps the name has thrown me off, because I understood it as something that happens only during ModuleSpec.create() and consequently during load(). Isn't the point to give the loader a chance to do some internal initialization before the module gets loaded? 
I sense that you have a different idea of it. It does make some sense for it to take a module rather than being responsible for creating a new one. That was probably my own misunderstanding. However, in that case perhaps create_module() isn't the right name. And I'm just not seeing how create_module() relates to reload(). Regarding the "can reload" point, I was thinking it would simply raise an ImportError if it can't reload. Passing in a flag to indicate that the current call is for a reload makes sense, but I was thinking it would go on exec_module() rather than create_module(). Of course, I was also thinking that create_module() wasn't called during reload(). > - I need to check the other proposed reload changes for backwards > compatibility issues (I'm not sure we can ignore changes made to > sys.modules in that case) > I'm glad you brought this up. I keep wondering if ModuleSpec.reload() should assume more of the boilerplate that importlib.reload() has, including returning whatever is in sys.modules. > - frozen modules should have a special origin string, too > Correct. It will be "frozen", which matches the current repr. > - my preferred bikeshed colours are "loader_state" or "loader_info" > > Cheers, > Nick. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Aug 29 11:43:19 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 29 Aug 2013 19:43:19 +1000 Subject: [Import-SIG] PEP 451 (ModuleSpec) round 3 In-Reply-To: References: Message-ID: Just a general comment: spec and module mismatches have to be considered *normal*, and the module has to take precedence. Otherwise you risk breaking too much code. The spec is *not* authoritative in general. The only exception is that when __name__ ends with "__main__", pickle should be updated to look at the name on the spec instead. Even the attribute setting after create_module should respect custom settings provided by the loader. Cheers, Nick. 
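To make the "module takes precedence" point concrete, here is a minimal sketch of conditional attribute setting. It is only an illustration under stated assumptions: `init_module_attrs` is written here as a free function, the attribute mapping is a reduced subset, and a `types.SimpleNamespace` stands in for a real spec. Always overriding `__spec__` follows the plan described elsewhere in this thread and is likewise an assumption, not settled API.

```python
import types

# Assumed (partial) mapping from spec attributes to module attributes.
_SPEC_TO_MODULE_ATTRS = {
    "name": "__name__",
    "loader": "__loader__",
    "origin": "__file__",
}


def init_module_attrs(spec, module):
    """Hypothetical helper: fill in only what the module has not set.

    Values already present on the module (for example, set by a loader's
    create_module()) are respected; the spec is only a fallback.
    """
    for spec_attr, module_attr in _SPEC_TO_MODULE_ATTRS.items():
        value = getattr(spec, spec_attr, None)
        if value is None:
            continue
        # Treat both "missing" and "None" as "not set yet".
        if getattr(module, module_attr, None) is None:
            setattr(module, module_attr, value)
    # Assumption: __spec__ is the one attribute that is always overridden.
    module.__spec__ = spec
    return module


# Stand-in spec and a module whose loader already customized __file__.
spec = types.SimpleNamespace(
    name="pkg.mod", loader="fake-loader", origin="/tmp/pkg/mod.py")
module = types.ModuleType("custom_name")
module.__file__ = "/elsewhere/mod.py"
init_module_attrs(spec, module)
```

With this approach a mismatch between `module.__name__` and `module.__spec__.name` is simply left in place, which matches the view that such mismatches are normal.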
-------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Thu Aug 29 15:21:42 2013 From: brett at python.org (Brett Cannon) Date: Thu, 29 Aug 2013 09:21:42 -0400 Subject: [Import-SIG] PEP 451 (ModuleSpec) round 3 In-Reply-To: References: Message-ID: On Thu, Aug 29, 2013 at 5:43 AM, Nick Coghlan wrote: > Just a general comment: spec and module mismatches have to be considered > *normal*, and the module has to take precedence. Otherwise you risk > breaking too much code. > > The spec is *not* authoritative in general. The only exception is that > when __name__ ends with "__main__", pickle should be updated to look at the > name on the spec instead. > > Even the attribute setting after create_module should respect custom > settings provided by the loader. > I agree with this. Modules that change values during their importation/execution will be setting __name__, __package__, etc., not __spec__.__name__. And pre-existing code is going to change those values, so __spec__ is definitely not definitive. As for mutating __spec__ to match other data after execution, I don't think that's needed. __spec__ should be viewed as a way to record what import worked with to make the import happen, but otherwise it's there because memory is cheap thanks to most systems not having many modules and it's better to log stuff unnecessarily than potentially toss out useful information. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Aug 29 16:00:35 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 30 Aug 2013 00:00:35 +1000 Subject: [Import-SIG] PEP 451 (ModuleSpec) round 3 In-Reply-To: References: Message-ID: On 29 Aug 2013 12:16, "Eric Snow" wrote: > > On Wed, Aug 28, 2013 at 4:10 PM, Nick Coghlan wrote: >> >> Proper review on the weekend, but quickish comments for now: >> >> - update looks really good, and solves several issues with the original proposal. 
Big +1 for keeping it simple and adding a new finder method :) >> >> - for extension modules that don't define a creation hook (which I won't be able to figure out before create_module is called), I'd like to be able to return NotImplemented from create_module to say "please give me a normal module, or re-use the existing one for reloading". > > I was wondering about this. In fact, my current implementation has create_module() return None as the sentinel. Would you prefer NotImplemented? Either way I'll add that to the PEP. I was surprised it wasn't there already. I agree None is a better sentinel for this use case. Just so long as there is one :) >> - I'd like to finally make "can reload or not" explicit in the loader API. My current idea for this is to add a "reloading" parameter to create_module, where we pass in the module to be reloaded. Loaders that support reloading *must* either not define create_module, or, if they define it, return NotImplemented or return the passed in module in that case. If it returns a new module, the reload should fail. This shouldn't break backwards compatibility, as init based extension modules are cached internally, while modules that use the new hooks can decide for themselves whether or not to support reloading > > I'm unclear on how create_module() is involved during reload. Perhaps the name has thrown me off, because I understood it as something that happens only during ModuleSpec.create() and consequently during load(). Isn't the point to give the loader a chance to do some internal initialization before the module gets loaded? If a loader defines create_module, then it is probably the case that we *shouldn't* allow reloading. Reloading is all about mutating the existing object in place so that existing references see the changes. 
If the loader wants to create a new module every time, that's not going to be possible, and attempting to reload it should trigger an exception rather than silently misbehaving (or, worse, crashing the interpreter if a C extension gets confused). However, if it returns the same module (as, say, the existing extension module API would do), then we can go ahead and rerun exec, knowing that people will see the change. I agree a "is_reload" flag to exec_module, or a separate can_reload hook would be clearer, though. Basically, I'd like to get us to a point where attempting to reload an extension module will either work as well as it does for a Python module, or will fail with a clear exception. >> >> - I need to check the other proposed reload changes for backwards compatibility issues (I'm not sure we can ignore changes made to sys.modules in that case) > > I'm glad you brought this up. I keep wondering if ModuleSpec.reload() should assume more of the boilerplate that importlib.reload() has, including returning whatever is in sys.modules. That's the main proposed change that bothered me :) Cheers, Nick. From ncoghlan at gmail.com Thu Aug 29 16:02:37 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 30 Aug 2013 00:02:37 +1000 Subject: [Import-SIG] PEP 451 (ModuleSpec) round 3 In-Reply-To: References: Message-ID: On 29 August 2013 11:57, Eric Snow wrote: > On Wed, Aug 28, 2013 at 11:22 AM, Brett Cannon wrote: >> Wait, what? You suggest it's the module in the signature but >> module.__dict__ in the explanation. > > That's right. The module is passed in and then exec_module() does its thing > with module.__dict__. Do you think exec_module() should directly take a > dict instead? Has to be the module - create_module may have returned a custom type. Cheers, Nick. 
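A small sketch of why exec_module() needs the module object rather than a bare dict: if create_module() returns a custom type, only the module carries that type along. The names `TracingModule` and `CustomTypeLoader` are hypothetical, and a `types.SimpleNamespace` stands in for the spec; this is not the PEP's reference implementation.

```python
import types


class TracingModule(types.ModuleType):
    """Hypothetical custom module type returned by create_module()."""

    def describe(self):
        public = sorted(k for k in vars(self) if not k.startswith("_"))
        return "{}: {}".format(self.__name__, ", ".join(public))


class CustomTypeLoader:
    """Hypothetical loader following the proposed two-step protocol."""

    def __init__(self, source):
        self.source = source

    def create_module(self, spec):
        # The loader decides what kind of object the module is.
        return TracingModule(spec.name)

    def exec_module(self, module):
        # Code runs in the module's namespace, but the caller still holds
        # the module object itself, custom type included.
        exec(self.source, module.__dict__)


spec = types.SimpleNamespace(name="demo")
loader = CustomTypeLoader("answer = 6 * 7")
module = loader.create_module(spec)
loader.exec_module(module)
```

Had exec_module() taken only `module.__dict__`, the caller would have had no way to hand the `TracingModule` instance back to the import machinery.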
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From brett at python.org Thu Aug 29 15:16:54 2013 From: brett at python.org (Brett Cannon) Date: Thu, 29 Aug 2013 09:16:54 -0400 Subject: [Import-SIG] PEP 451 (ModuleSpec) round 3 In-Reply-To: References: Message-ID: On Wed, Aug 28, 2013 at 9:57 PM, Eric Snow wrote: > On Wed, Aug 28, 2013 at 11:22 AM, Brett Cannon wrote: > >> On Wed, Aug 28, 2013 at 4:50 AM, Eric Snow wrote: >> >>> This PEP proposes to add a new class to ``importlib.machinery`` called >>> ``ModuleSpec``. It will be authoritative for all the import-related >>> information about a module, and will be available without needing to >>> load the module first. Finders will provide a module's spec instead of >>> a loader. >>> >> >> Don't you mean finders will return a ModuleSpec? Since 'loader' is still >> defined in the ModuleSpec to know what loader to use that statement that >> finders don't provide a loader is misleading. >> > > Yeah, finders will still create the loaders. I'll make that more clear. > > importlib.machinery.ModuleSpec (new) >>> >> ------------------------------------ >>> >> >> For this entire section you need to provide the call signatures as you >> start talking semantics later w/o making clear what is being passed and >> returned before going into detail of the individual methods. Otherwise move >> the detailed discussion of the methods up to before the semantics overview. >> > > I'll try moving the detailed descriptions up. > > >> >> >>> >>> Attributes: >>> >>> * name - a string for the name of the module. >>> * loader - the loader to use for loading and for module data. >>> * origin - a string for the location from which the module is loaded. >>> >> >> I would give an "e.g." here to help explain what you mean. As previous >> comments have shown, the name alone is not enough to understand what value >> should go here. =) >> > > Good point. 
:) > > >> >> >>> * submodule_search_locations - strings for where to find submodules, >>> if a package. >>> * loading_info - a container of data for use during loading (or None). >>> * cached (property) - a string for where the compiled module will be >>> stored. >>> * is_location (RO-property) - the module's origin refers to a location. >>> >>> .. XXX Find a better name than loading_info? >>> >> >> loading_data is all that I can think of >> >> >>> .. XXX Add ``submodules`` (RO-property) - returns possible submodules >>> relative to spec (or None)? >>> >> >> Actual use-case or are you just guessing there will be a use? Don't add >> any fields that we have not seen an actual need for. >> > > I was thinking of what Nick said about downplaying the module/package > distinction, since a package is just a module with possible submodules. So > then I thought, what if there were an easy way to see what submodules a > module has available? Non-packages would always have 0 and packages would > have 0 or more. I'd use that if it existed. > > This was mostly a passing idea that would need more thought (the > implementation might be tricky). I agree it doesn't need to be in the PEP. > > >> >>> .. XXX Add ``loaded`` (RO-property) - the module in sys.modules, if any? >>> >> >> Too easy to figure out with ``name in sys.modules`` and can go stale >> (unless you make this a property). >> > > This was a light attempt at lowering the barrier to entry with regards to > the import system. I was thinking of how you have to know to look in > sys.modules to see if the module is loaded. Providing a property > effectively hides sys.modules as an implementation detail. I was also > thinking of this as "installed". > > Regardless, that I left this as a comment reflects my uncertainty of its > utility. > There is lowering the barrier of entry and then there is adding a needless API. 
While I appreciate wanting to make the import system more accessible, you can't paint over sys.modules entirely, so trying to partially hide it here won't do anyone any good when they have to deal with it in other places (at least if they are working at the API level of needing to care if something is loaded). > > >> Python users will be able to inspect a module's ``__spec__`` to get >>> import-related information about the object. Generally, they will not >>> be using the ``ModuleSpec`` factory methods nor the instance methods. >>> >> >> As of right now no one is using the instance methods based on the wording >> in this section. =) >> > > Yeah, that would read better as something like "Generally, Python > applications and interactive users will not...". > > >> >>> However, each spec has methods named ``create``, ``exec``, ``load``, and >>> ``reload``. Since they are so easy to access (and misunderstand/abuse), >>> their function and availability require explicit consideration in this >>> proposal. >>> >>> >>> What Will Existing Finders and Loaders Have to Do Differently? >>> ============================================================== >>> >>> Immediately? Nothing. The status quo will be deprecated, but will >>> continue working. However, here are the things that the authors of >>> finders and loaders should change relative to this PEP: >>> >>> * Implement ``find_spec()`` on finders. >>> * Implement ``exec_module()`` on loaders, if possible. >>> >>> The factory methods of ``ModuleSpec`` are intended to be helpful for >>> converting existing finders. ``from_loader()`` and >>> ``from_file_location()`` are both straight-forward utilities in this >>> regard. >>> >> >> If this holds to be true then they should go into importlib.util and kept >> out of the general object since dir(module_spec) shouldn't need to show the >> methods indefinitely. 
>> > > I've actually been vacillating for days between classmethods and > importlib.util, and at one point I even made a meta class and put the > factories there (to keep them out of instances). At the moment > importlib.util is making more sense. > > >> >> >>> In the case where loaders already expose methods for creating >>> and preparing modules, a finder may use ``ModuleSpec.from_module()`` on >>> a throw-away module to create the appropriate spec. >>> >> >> Why is the module a throw-away one? And why would loaders need to >> construct a ModuleSpec? >> > > This is something the Nick pointed out. Some loaders may already have the > API to create a module and populate its attributes. > The key word here is "may". You simply cannot guess at needs of users without explicit evidence this is actually actively true in the wild. Even using importlib as an example is iffy since that's mostly my opinion of how to do things and won't necessarily hold (e.g. the namespace classes Eric wrote don't check their values at all while all the classes I wrote verify their name). Trust me, you don't want to end up supporting an API that only one person uses. Keep this initial API small and to the point and expand it as necessary or as requests come in for future releases. While we should aim to get the core concepts right the first time, we can expand the API later as necessary. There will be more Python releases after all. =) > In that case, the finder could use that API to get the module and then > use ModuleSpec.from_module() to create the spec that find_spec() would > return. This is very explicit and direct way to map the existing > import-related info for the module onto a spec. > > >> >> >>> >>> As for loaders, >>> >> >> You were just talking about loader, so this is a bad transition. >> > > Yeah, that did get muddled. I was talking about how finders could use the > existing capabilities of loaders to build a spec. I'll make that less > awkward. 
> > >> This is an outline of what happens in ``ModuleSpec.load()``. >>> >>> 1. A new module is created by calling ``spec.create()``. >>> >>> a. If the loader has a ``create_module()`` method, it gets called. >>> Otherwise a new module gets created. >>> b. The import-related module attributes are set. >>> >> >> So it seems step (b) happens even if step (a) does. If that's the case >> then are attributes overridden blindly, or conditionally set? If (b) >> doesn't happen if (a) did then you need to make that clear. >> > > Yeah, (b) always happens. I was planning on having them be overridden > blindly to match the spec. Loader.create_module() would not be responsible > for setting the import-related attributes. > > >> >> >>> >>> 2. The module is added to sys.modules. >>> >> >> I would add a note that there is a separate method for handling reloads >> and thus blindly setting sys.modules is acceptable. >> > > Good point. Furthermore, if the module exists in sys.modules when load() > gets called and it fails, the module will be removed from sys.modules. Do > you think it would make sense to stick the original module back into > sys.modules in that case? I don't because calling load() may have > side-effects on the state of that original module. > Fine by me. Does document pre-existing values in sys.modules are *not* taken into consideration; if you want that then use reload(). > > >> >> >>> 3. ``spec.exec(module)`` gets called. >>> >>> a. If the loader has an ``exec_module()`` method, it gets called. >>> Otherwise ``load_module()`` gets called for backward-compatibility >>> and the resulting module is updated to match the spec. >>> >> >> "resulting module found in sys.modules is". >> >> And I think you meant to make step (b) be the fallback to load_module(). >> > > I was thinking of it as one step with two possible paths, rather than 2 > steps. > Then it's just step 3 since there is no (b) step. > > >> >> >>> >>> 4. 
If there were any errors the module is removed from sys.modules. >>> 5. If the module was replaced in sys.modules during ``exec()``, the one >>> in sys.modules is updated to match the spec. >>> >> >> This doesn't make sense. You just said the module got updated to match >> the spec in step 3.a. Are you saying you're going to overwrite values that >> exec_module() set? And once again, blindly updating or conditionally? And >> how are these attributes being set? >> > > As with create_module(), Loader.exec_module() isn't in charge of setting > the import-related attributes (as opposed to Loader.load_module(), which > is). However, if the module was replaced in sys.modules during > exec_module(), then we assume that the one in sys.modules has not been > touched by the spec. So we set any of the import-related attributes on it > that aren't already set (respecting the ones that are), with the exception > of __spec__, which we always override. > > Thinking about it, it may make sense in that case to create a new spec > based on the current one but deferring to any of the existing > import-related attributes of the module. Then that spec can be set to > __spec__. > If you do that then you are starting to shift to either a loader method or a function somewhere in importlib, else you are creating a conditional factory for specs. > > >> Since exec_module() is going to need to set these anyway for proper >> exec() use during loading then why are you setting them *again* later on? >> Should you set these first and then let the methods reset them as they see >> fit? I thought exec_module() took in a filled-in module anyway, so didn't >> you have to set all the attributes prior to passing it in anyway in step >> 1.a? In that case this is a reset which seems wrong if code explicitly >> chose to change the values. >> > > It's not that code chose to change the values. 
It's that code chose to > stick some other object in sys.modules and we're going to return it and we > want to be sure the spec and import-related attributes are all properly set. > Do we though? We do that currently except for __package__ and __loader__, but that's for backwards-compatibility since those attributes are so new. This also destroys the possibility of lazy loading, btw (which I unfortunately realized after 3.3 was released, so I kind of regret it). Since the attributes have to be set before any exec() call I understand making sure they are set at that point, but if you're going to shove some other object in sys.modules then I view it as your problem to make sure you set those attributes as you are already deviating from normal import semantics. IOW if you are given a module you can and should expect it is as prepared as the import system can make it to be executed on, but if you choose to ignore it then you are on your own. > > >> >> >>> 6. The module in sys.modules is returned. >>> >> >> Or you can just provide the pseudo-code and skip all of this explanation >> and be easier to follow =) You can leave comments with step numbers if you >> want to expound upon any specific step outside of the pseudo-code: >> > > That is a really good idea. It will make more sense. > I know it will help me follow exactly what you are planning to do. It isn't like these methods are that complicated or long. 
>
>
>>
>> import sys
>> import types
>>
>> class ModuleSpec:
>>
>>     def load(self):
>>         module = self.create()
>>         sys.modules[self.name] = module
>>
>>         try:
>>             self.exec(module)
>>         except:
>>             try:
>>                 del sys.modules[self.name]
>>             except KeyError:
>>                 pass
>>             # Re-raise so the loading failure still propagates.
>>             raise
>>         else:
>>             # XXX different from proposal: didn't reset attributes
>>             return sys.modules[self.name]
>>
>>     def create(self):
>>         if hasattr(self.loader, 'create_module'):
>>             module = self.loader.create_module(self)
>>         else:
>>             module = types.ModuleType(self.name)
>>             # XXX different from proposal: didn't do it blindly after
>>             # create_module()
>>             self.init_module_attrs(module)
>>         return module
>>
>>     def exec(self, module):
>>         if hasattr(self.loader, 'exec_module'):
>>             self.loader.exec_module(module)
>>         elif hasattr(self.loader, 'load_module'):
>>             self.loader.load_module(self.name)
>>             module = sys.modules[self.name]
>>         else:
>>             raise TypeError(
>>                 '{!r} loader does not have an exec_module or '
>>                 'load_module method'.format(self.loader))
>>         return module
>>
>>     ...
>>
>>
>>> ==========================  ===========
>>> On ModuleSpec               On Modules
>>> ==========================  ===========
>>> name                        __name__
>>> loader                      __loader__
>>> package                     __package__
>>> origin                      __file__*
>>> cached                      __cached__*
>>>
>>
>> This shouldn't be set on extension modules, so this is another asterisk
>> of has_location *and* is not None (right?).
>>
>
> Correct.  Good point.
>
>
>>
>>
>>> submodule_search_locations  __path__**
>>> loading_info                \-
>>> has_location (RO-property)  \-
>>> ==========================  ===========
>>>
>>> \* Only if ``is_location`` is true.
>>>
>>
>> Should that be has_location?
>>
>
> Yep. :)
>
>
>>> **has_location**
>>>
>>> .. container::
>>>
>>>    Some modules can be loaded by reference to a location, e.g. a filesystem
>>>    path or a URL or something of the sort.  Having the location lets you
>>>    load the module, but in theory you could load that module under various
>>>    names.
>>> >>> In contrast, non-located modules can't be loaded in this fashion, e.g. >>> builtin modules and modules dynamically created in code. For these, >>> the >>> name is the only way to access them, so they have an "origin" but not >>> a >>> "location". >>> >>> This attribute reflects whether or not the module is locatable. If it >>> is, ``origin`` must be set to the module's location and ``__file__`` >>> will be set on the module. Furthermore, a locatable module is also >>> cacheable and so ``__cached__`` is tied to ``has_location``. >>> >> >> That statement about __cached__ is not true for extension modules. You're >> going to need to tweak how you define 'cached' based on this. Either that >> or you can try to use this as a justification for loader.create_module() as >> you can override these semantics there as a pure Python module is more >> common than extension modules (although this doesn't help with the >> ModuleSpec having the wrong information when returned from the finder >> unless the finder itself resets it on the ModuleSpec before returning it). >> > > Yeah, I'll need to rework that. > > >> **create()** >>> >>> .. container:: >>> >>> A new module is created relative to the spec and its import-related >>> attributes are set accordingly. If the spec's loader has a >>> ``create_module()`` method, that gets called to create the module. >>> This >>> give the loader a chance to do any pre-loading initialization that >>> can't >>> otherwise be accomplished elsewhere. Otherwise a bare module object >>> is >>> created. In both cases ``init_module_attrs()`` is called on the >>> module >>> before it gets returned. >>> >> >> As stated earlier, I don't like the idea of blindly resetting attributes >> if set by create_module(). >> > > Well, create_module() shouldn't be setting them. Are you suggesting that > there is a use case for that? 
> I'm saying you need to very clearly state whose job it is to set all of the attributes on the module and whether they are blindly set or conditionally and based on what test to know if they are already set (e.g. None vs. the attribute not existing). > > >> >> >>> >>> **exec(module)** >>> >>> .. container:: >>> >>> The spec's loader is used to execute the module. If the loader has >>> ``exec_module()`` defined, the namespace of ``module`` is the target >>> of >>> execution. >>> >> >> Wait, what? You suggest it's the module in the signature but >> module.__dict__ in the explanation. >> > > That's right. The module is passed in and then exec_module() does its > thing with module.__dict__. Do you think exec_module() should directly > take a dict instead? > No, not at all. The wording just wasn't clear to me as I took "target" to mean "what I passed in", not what the point of the functionality was. I would reword it to say "If the loader has ``exec_module()`` defined then it is expected to execute the module's code on the passed-in module object" or something. > > >> >>> Otherwise the loader's ``load_module()`` is called, which >>> ignores ``module`` and returns the module that was the actual >>> execution target. >>> >> >> Are you pulling from sys.modules? Otherwise how are you getting the >> module from load_module()? >> > > load_module() returns the module. For loaders that don't follow that rule > (and return None), we'll grab the module out of sys.modules. > > >> And you don't mention that in one case the module is not put into >> sys.modules while in the other case it is (exec_module vs. load_module). >> That dichotomy is going to be messy. >> > > That difference should definitely be clear. What is messy about it? > You've abstracted out what method is called, which should mean I don't have to care which method is used (in case both are defined by a very compatible loader). But in this case I do care, as one will cache its results and the other won't.
Since sys.modules is such a core data structure you should always know when it has been tampered with, but these semantics make it unclear based on the arguments used as to what the result will be in regards to implicit side-effects. I'm going to argue that you should have unified side-effects in regards to sys.modules so exec_module() should set sys.modules if it was not done so by the loader. If you want you can add a keyword-only argument to make that "optional" (i.e. explicitly remove from sys.modules or just don't explicitly set it but whatever the loader did it did) or vice-versa, but there should be some way for me to be able to say "I don't want any surprises in regards to sys.modules, so be consistent regardless of loader method called". > > >> Does this need to be separate from load()? If you merge it in then the >> sys.modules semantics are unified within load(). Otherwise you need to make >> this set sys.modules in either case and return from sys.modules. >> > > Both load() and reload() call exec(). In the case of load(), it wraps the > exec() call with the requisite sys.modules handling. In the case of > reload(), the module should already be in sys.modules. Regardless, > Loader.load_module() is already required to do all the sys.modules handling > so that base should be covered. If we deprecate that requirement, which we > could, then we have a different story. If someone calls exec() directly > before a module is ever loaded then the sys.modules handling shouldn't > matter anyway; > I disagree: it should and will matter as side-effects of setting sys.modules influence import. > and if someone does that it means the spec did not come from > module.__spec__ so they probably aren't a casual user. > > >> >>> In that case the import-related attributes of that >>> module are updated to reflect the spec. >>> >> >> Why? If you already set the attributes in the module and inserted it into >> sys.modules previously then you already took care of this.
Else you now are >> setting the attributes potentially *three* times (twice in create() from >> loader.create_module() + an explicit call to init_module_attr() and then >> here). >> > > Loader.load_module() is still responsible for setting those attributes. > However, it may have missed one or more (including __spec__). We want to > make sure all the appropriate import-related attributes get set. > Furthermore, load_module() may have overridden the values we set > previously. > > Given its authority, it may make sense to update module.__spec__ to > reflect the attributes set by the loader. That way __spec__ indicates the > values used during loading. On the other hand, by not updating the spec, > the difference between the module attributes and the spec will reflect the > ways in which the loader did not follow the spec. I've been following that > former line of thinking, but now I'm wondering if the latter would be > better. Regardless, the pathological case where the module attributes set > by load_module() and the spec don't match should be pretty rare. > > As to setting it multiple times, in the worst case the attributes will be > set twice. Loader.create_module() shouldn't be setting them. > > >> >>> In both cases the targeted >>> module is the one that gets returned. >>> >> >> Huh? What exactly are you returning? You say "actual execution target" >> above for load_module() but "in both cases the target module" here. That >> seems to contradictory. >> > > In the load_module() case we return the result of calling load_module() > (or the module in sys.modules if load_module() returns None). Otherwise we > return the module that was passed in. In their respective cases both are > the actual execution targets. > > I'll reword that. > > >> >> >>> >>> **load()** >>> >>> .. container:: >>> >>> This method captures the current functionality of and requirements on >>> ``Loader.load_module()`` without any semantic changes. 
It is >>> essentially a wrapper around ``create()`` and ``exec()`` with some >>> extra functionality regarding ``sys.modules``. >>> >>> itself in ``sys.modules`` while executing. Consequently, the module >>> in >>> ``sys.modules`` is the one that gets returned by ``load()``. >>> >>> Right before ``exec()`` is called, the module is added to >>> ``sys.modules``. In the case of error during loading the module is >>> removed from ``sys.modules``. The module in ``sys.modules`` when >>> ``load()`` finishes is the one that gets returned. Returning the >>> module >>> from ``sys.modules`` accommodates the ability of the module to replace >>> itself there while it is executing (during load). >>> >>> As already noted, this is what already happens in the import system. >>> ``load()`` is not meant to change any of this behavior. >>> >>> If ``loader`` is not set (``None``), ``load()`` raises a ValueError. >>> >> >> Since the loader is required by the initializer for ModuleSpec I don't >> know if this specific check is necessary: EAFP. >> > > Yeah, the check will almost always pass. And if someone does something > they shouldn't they'll get an AttributeError really quickly anyway. > > >> Open Issues >>> ============== >>> >>> \* The impact of this change on pkgutil (and setuptools) needs looking >>> into. It has some generic function-based extensions to PEP 302. These >>> may break if importlib starts wrapping loaders without the tools' >>> knowledge. >>> >>> \* Other modules to look at: runpy (and pythonrun.c), pickle, pydoc, >>> inspect. >>> >>> \* Add ``ModuleSpec.data`` as a descriptor that wraps the data API of the >>> spec's loader? >>> >> >> No. This starts to move this away from ModuleSpec modules being a data >> storage object and more or a level of indirection around loaders. >> > > Agreed. It may be nice to have the easier access to the loader data APIs, > but doesn't quite fit. ModuleSpec.data as a wrapper was the best I could > think of. 
> > >> >> >>> >>> \* How to limit possible end-user confusion/abuses relative to spec >>> attributes (since __spec__ will make them really accessible)? >>> >>> >>> References >>> ========== >>> >>> [1] http://mail.python.org/pipermail/import-sig/2013-August/000658.html >>> >>> >>> Copyright >>> ========= >>> >>> This document has been placed in the public domain. >>> >>> .. >>> Local Variables: >>> mode: indented-text >>> indent-tabs-mode: nil >>> sentence-end-double-space: t >>> fill-column: 70 >>> coding: utf-8 >>> End: >>> >>> >>> _______________________________________________ >>> Import-SIG mailing list >>> Import-SIG at python.org >>> http://mail.python.org/mailman/listinfo/import-sig >>> >>> >> > Thanks for that review. It helps a lot. I'll update the PEP when I get a > chance. From your feedback I've gathered a few things: > > 1. The PEP needs to be more clear on what the Loader methods (both > existing and new) are supposed to accomplish and their responsibilities > regarding sys.modules and module attributes. > 2. I need to slide more toward "do less" than I have been in the balance > between keeping things simple and covering all the corner cases. > Yeah, if you miss something people will file a bug or speak up. =) Remember, any API that goes in will theoretically need to be supported for a **long** time. Expanding an API is much easier than contracting it. > 3. Whether or not the spec should be updated to reflect the attributes set > by load_module() still needs some consideration. > 4. The purpose of exec() vs. load() needs some clarification (pseudo-code > should help). > 5. I need more sleep. :) > Ditto. =) If I at all came off as cranky it's because two days in a row I needed to get up at 04:50. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Fri Aug 30 16:41:03 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 31 Aug 2013 00:41:03 +1000 Subject: [Import-SIG] PEP 451 (ModuleSpec) round 3 In-Reply-To: References: Message-ID: On 29 August 2013 23:16, Brett Cannon wrote: >> Regardless, that I left this as a comment reflects my uncertainty of its >> utility. > > > There is lowering the barrier of entry and then there is adding a needless > API. While I appreciate wanting to make the import system more accessible, > you can't paint over sys.modules entirely, so trying to partially hide it > here won't do anyone any good when they have to deal with it in other places > (at least if they are working at the API level of needing to care if > something is loaded). Agreed on this one. >>>> In the case where loaders already expose methods for creating >>>> and preparing modules, a finder may use ``ModuleSpec.from_module()`` on >>>> a throw-away module to create the appropriate spec. >>> >>> >>> Why is the module a throw-away one? And why would loaders need to >>> construct a ModuleSpec? >> >> >> This is something the Nick pointed out. Some loaders may already have the >> API to create a module and populate its attributes. > > > The key word here is "may". You simply cannot guess at needs of users > without explicit evidence this is actually actively true in the wild. Even > using importlib as an example is iffy since that's mostly my opinion of how > to do things and won't necessarily hold (e.g. the namespace classes Eric > wrote don't check their values at all while all the classes I wrote verify > their name). > > Trust me, you don't want to end up supporting an API that only one person > uses. Keep this initial API small and to the point and expand it as > necessary or as requests come in for future releases. While we should aim to > get the core concepts right the first time, we can expand the API later as > necessary. There will be more Python releases after all. 
=) Yeah, I had my sequence of events wrong when I suggested this one (I was thinking about the create/exec split when loading a module). Existing finders deal in loaders, not already constructed modules, so this doesn't actually make sense. Let's make it go away :) >> As with create_module(), Loader.exec_module() isn't in charge of setting >> the import-related attributes (as opposed to Loader.load_module(), which >> is). However, if the module was replaced in sys.modules during >> exec_module(), then we assume that the one in sys.modules has not been >> touched by the spec. So we set any of the import-related attributes on it >> that aren't already set (respecting the ones that are), with the exception >> of __spec__, which we always override. >> >> Thinking about it, it may make sense in that case to create a new spec >> based on the current one but deferring to any of the existing >> import-related attributes of the module. Then that spec can be set to >> __spec__. > > > If you do that then you are starting to shift to either a loader method or a > function somewhere in importlib, else you are creating a conditional factory > for specs. I still like the idea of keeping __spec__ as "this is how we found it originally", with only the module attributes reflecting any dynamic updates. >>> Since exec_module() is going to need to set these anyway for proper >>> exec() use during loading then why are you setting them *again* later on? >>> Should you set these first and then let the methods reset them as they see >>> fit? I thought exec_module() took in a filled-in module anyway, so didn't >>> you have to set all the attributes prior to passing it in anyway in step >>> 1.a? In that case this is a reset which seems wrong if code explicitly chose >>> to change the values. >> >> >> It's not that code chose to change the values. 
It's that code chose to >> stick some other object in sys.modules and we're going to return it and we >> want to be sure the spec and import-related attributes are all properly set. > > > Do we though? We do that currently except for __package__ and __loader__, > but that's for backwards-compatibility since those attributes are so new. > This also destroys the possibility of lazy loading, btw (which I > unfortunately realized after 3.3 was released, so I kind of regret it). > > Since the attributes have to be set before any exec() call I understand > making sure they are set at that point, but if you're going to shove some > other object in sys.modules then I view it as your problem to make sure you > set those attributes as you are already deviating from normal import > semantics. IOW if you are given a module you can and should expect it is as > prepared as the import system can make it to be executed on, but if you > choose to ignore it then you are on your own. +1 for leaving the object in sys.modules alone after we call exec_module. > You've abstracted out what method is called, which should mean I don't have > to care which method is used (in case both are defined by a very compatible > loader). But in this case I do are as one will cache its results and the > other won't. Since sys.modules is such a core data structure you should > always know when it has been tampered with, but these semantics make it > unclear based on the the arguments used as to what the result will be in > regards to implicit side-effects. > > I'm going to argue that you should have unified side-effects in regards to > sys.modules so exec_module() should set sys.modules if it was not done so by > the loader. If you want you can add a keyword-only argument to make that > "optional" (i.e. 
explicitly remove from sys.modules or just don't explicitly
> set it but whatever the loader did it did) or vice-versa, but there should
> be some way for me to be able to say "I don't want any surprises in regards
> to sys.modules, so be consistent regardless of loader method called".

No, the new methods are *supposed* to be stateless with no global side
effects, leaving all import state manipulation to the import system.
We just can't *promise* no side effects for exec_module, since the
module code might itself mess with the process global state. The fact
load_module is *expected* to mess with sys.modules is a design flaw
that shouldn't be perpetuated.

What I would really like is to be able to call
"target_loader.exec_module(existing_main_module)" from runpy,
*without* calling create_module first.

And now I remember why I wanted to be able to pass an existing module
to a loader and say "hey, can you use this as an execution
namespace?": so I could pass them __main__, instead of having to
hardcode knowledge of different kinds of loader into runpy.

So perhaps a better name might be "prepare_module" (by analogy to PEP
3115), and have it accept a "reloading" parameter, which is an
existing module to be reused.

The signature would be something like:

    def prepare_module(reloading=None):
        """Create a module object for execution. Returning None will
        created a default module.

        If *reloading* is set, specifies an existing sys.modules entry
        that is being reloaded. Must return None or that
        specific object if reloading is supported. Returning a
        different module object or explicitly raising ImportError
        indicates that reloading is not supported. (Or perhaps define
        a "ReloadError" subclass of ImportError?)
        """

>>> Does this need to be separate from load()? If you merge it in then the
>>> sys.modules semantics are unified within load(). Otherwise you need to make
>>> this set sys.modules in either case and return from sys.modules.
>>
>>
>> Both load() and reload() call exec().
In the case of load(), it wraps the >> exec() call with the requisite sys.modules handling. In the case of >> reload(), the module should already be in sys.modules. Regardless, >> Loader.load_module() is already required to do all the sys.modules handling >> so that base should be covered. If we deprecate that requirement, which we >> could, then we have a different story. If someone calls exec() directly >> before a module is ever loaded then the sys.modules handling shouldn't >> matter anyway; > > > I disagree: it should and will matter as side-effects of setting sys.modules > influence import. load_module should remain responsible for manipulating import state directly, the new methods should explicitly discourage it (but we can't rule it out for exec_module due to the invocation of arbitrary user code). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From brett at python.org Fri Aug 30 16:57:39 2013 From: brett at python.org (Brett Cannon) Date: Fri, 30 Aug 2013 10:57:39 -0400 Subject: [Import-SIG] PEP 451 (ModuleSpec) round 3 In-Reply-To: References: Message-ID: On Fri, Aug 30, 2013 at 10:41 AM, Nick Coghlan wrote: > On 29 August 2013 23:16, Brett Cannon wrote: > >> Regardless, that I left this as a comment reflects my uncertainty of its > >> utility. > > > > > > There is lowering the barrier of entry and then there is adding a > needless > > API. While I appreciate wanting to make the import system more > accessible, > > you can't paint over sys.modules entirely, so trying to partially hide it > > here won't do anyone any good when they have to deal with it in other > places > > (at least if they are working at the API level of needing to care if > > something is loaded). > > Agreed on this one. > > >>>> In the case where loaders already expose methods for creating > >>>> and preparing modules, a finder may use ``ModuleSpec.from_module()`` > on > >>>> a throw-away module to create the appropriate spec. 
> >>> > >>> > >>> Why is the module a throw-away one? And why would loaders need to > >>> construct a ModuleSpec? > >> > >> > >> This is something the Nick pointed out. Some loaders may already have > the > >> API to create a module and populate its attributes. > > > > > > The key word here is "may". You simply cannot guess at needs of users > > without explicit evidence this is actually actively true in the wild. > Even > > using importlib as an example is iffy since that's mostly my opinion of > how > > to do things and won't necessarily hold (e.g. the namespace classes Eric > > wrote don't check their values at all while all the classes I wrote > verify > > their name). > > > > Trust me, you don't want to end up supporting an API that only one person > > uses. Keep this initial API small and to the point and expand it as > > necessary or as requests come in for future releases. While we should > aim to > > get the core concepts right the first time, we can expand the API later > as > > necessary. There will be more Python releases after all. =) > > Yeah, I had my sequence of events wrong when I suggested this one (I > was thinking about the create/exec split when loading a module). > Existing finders deal in loaders, not already constructed modules, so > this doesn't actually make sense. > > Let's make it go away :) > > >> As with create_module(), Loader.exec_module() isn't in charge of setting > >> the import-related attributes (as opposed to Loader.load_module(), which > >> is). However, if the module was replaced in sys.modules during > >> exec_module(), then we assume that the one in sys.modules has not been > >> touched by the spec. So we set any of the import-related attributes on > it > >> that aren't already set (respecting the ones that are), with the > exception > >> of __spec__, which we always override. 
> >> > >> Thinking about it, it may make sense in that case to create a new spec > >> based on the current one but deferring to any of the existing > >> import-related attributes of the module. Then that spec can be set to > >> __spec__. > > > > > > If you do that then you are starting to shift to either a loader method > or a > > function somewhere in importlib, else you are creating a conditional > factory > > for specs. > > I still like the idea of keeping __spec__ as "this is how we found it > originally", with only the module attributes reflecting any dynamic > updates. > > >>> Since exec_module() is going to need to set these anyway for proper > >>> exec() use during loading then why are you setting them *again* later > on? > >>> Should you set these first and then let the methods reset them as they > see > >>> fit? I thought exec_module() took in a filled-in module anyway, so > didn't > >>> you have to set all the attributes prior to passing it in anyway in > step > >>> 1.a? In that case this is a reset which seems wrong if code explicitly > chose > >>> to change the values. > >> > >> > >> It's not that code chose to change the values. It's that code chose to > >> stick some other object in sys.modules and we're going to return it and > we > >> want to be sure the spec and import-related attributes are all properly > set. > > > > > > Do we though? We do that currently except for __package__ and __loader__, > > but that's for backwards-compatibility since those attributes are so new. > > This also destroys the possibility of lazy loading, btw (which I > > unfortunately realized after 3.3 was released, so I kind of regret it). > > > > Since the attributes have to be set before any exec() call I understand > > making sure they are set at that point, but if you're going to shove some > > other object in sys.modules then I view it as your problem to make sure > you > > set those attributes as you are already deviating from normal import > > semantics. 
IOW if you are given a module you can and should expect it is as
> > prepared as the import system can make it to be executed on, but if you
> > choose to ignore it then you are on your own.
>
> +1 for leaving the object in sys.modules alone after we call exec_module.
>
> > You've abstracted out what method is called, which should mean I don't have
> > to care which method is used (in case both are defined by a very compatible
> > loader). But in this case I do care as one will cache its results and the
> > other won't. Since sys.modules is such a core data structure you should
> > always know when it has been tampered with, but these semantics make it
> > unclear based on the arguments used as to what the result will be in
> > regards to implicit side-effects.
> >
> > I'm going to argue that you should have unified side-effects in regards to
> > sys.modules so exec_module() should set sys.modules if it was not done so by
> > the loader. If you want you can add a keyword-only argument to make that
> > "optional" (i.e. explicitly remove from sys.modules or just don't explicitly
> > set it but whatever the loader did it did) or vice-versa, but there should
> > be some way for me to be able to say "I don't want any surprises in regards
> > to sys.modules, so be consistent regardless of loader method called".
>
> No, the new methods are *supposed* to be stateless with no global side
> effects, leaving all import state manipulation to the import system.
> We just can't *promise* no side effects for exec_module, since the
> module code might itself mess with the process global state. The fact
> load_module is *expected* to mess with sys.modules is a design flaw
> that shouldn't be perpetuated.
>

OK, then the PEP needs to state clearly that load_module() messing with sys.modules is a design flaw that will slowly go away as time goes on and people move to the new API.
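
For concreteness, here is a minimal sketch of that division of labour, assuming the behaviour the PEP draft describes for ``load()``: the caller (standing in for the import system) adds the module to sys.modules right before execution, removes it on failure, and returns whatever is in sys.modules afterwards. The helper name `load_via_exec_module` and the bare `exec_module` callable are hypothetical stand-ins for illustration, not importlib API.

```python
import sys
import types


def load_via_exec_module(name, exec_module):
    """Run a loader-style exec_module(module) callable.

    The caller owns all sys.modules handling; exec_module itself is
    expected to leave import state alone (hypothetical helper).
    """
    module = types.ModuleType(name)
    sys.modules[name] = module  # cached right before execution
    try:
        exec_module(module)
    except BaseException:
        # Undo the caching if loading fails.
        sys.modules.pop(name, None)
        raise
    # The executed code may have replaced its own sys.modules entry,
    # so return whatever is cached now.
    return sys.modules[name]
```

With this shape the sys.modules side effects are the same no matter what the loader does internally, which is the consistency being asked for above.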
>
> What I would really like is to be able to call
> "target_loader.exec_module(existing_main_module)" from runpy,
> *without* calling create_module first.
>
> And now I remember why I wanted to be able to pass an existing module
> to a loader and say "hey, can you use this as an execution
> namespace?": so I could pass them __main__, instead of having to
> hardcode knowledge of different kinds of loader into runpy.
>
> So perhaps a better name might be "prepare_module" (by analogy to PEP
> 3115), and have it accept a "reloading" parameter, which is an
> existing module to be reused.
>

Is this to replace create_module() or exec_module()?

>
> The signature would be something like:
>
>     def prepare_module(reloading=None):
>         """Create a module object for execution. Returning None will
>         created a default module.
>

I can't follow that sentence. =) What does returning None represent?

>
>         If *reloading* is set, specifies an existing sys.modules entry
>         that is being reloaded.

As in the key into sys.modules?

>         Must return None or that
>         specific object if reloading is supported.

What's "that" supposed to represent?

>         Returning a
>         different module object or explicitly raising ImportError
>         indicates that reloading is not supported. (Or perhaps define
>         a "ReloadError" subclass of ImportError?)
>         """
>

I'm really not following what this method is supposed to do. Is it simply mucking with sys.modules? Is it creating a module to use? If it's the latter then how does return None do anything? Are you saying returning None means "I didn't do anything special, do what you want"?

-Brett

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ncoghlan at gmail.com Fri Aug 30 17:12:26 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 31 Aug 2013 01:12:26 +1000
Subject: [Import-SIG] PEP 451 (ModuleSpec) round 3
In-Reply-To:
References:
Message-ID:

On 31 August 2013 00:57, Brett Cannon wrote:
>> So perhaps a better name might be "prepare_module" (by analogy to PEP
>> 3115), and have it accept a "reloading" parameter, which is an
>> existing module to be reused.
>
>
> Is this to replace create_module() or exec_module()?

It replaces create_module.

>> The signature would be something like:
>>
>>     def prepare_module(reloading=None):
>>         """Create a module object for execution. Returning None will
>>         created a default module.

Oops, stuffed up the signature. First arg should be the module spec:

    def prepare_module(spec, reloading=None):
        ...

> I can't follow that sentence. =) What does returning None represent?

Returning None indicates that the *loader* defines a module creation
API, but the particular module being loaded doesn't take advantage of
it.

It's a feature I need for the new extension module loader API, where
the creation hook allows the extension module to build a completely
custom object (perhaps with additional state). You can request an
ordinary module just by not defining the creation hook, and only
defining the execution hook (which accepts an already created module).

By switching to a *preparation* hook, rather than creation, I think we
can make this play more nicely with reloading. In the reloading case,
the preparation hook would be responsible for checking that the
existing object was a suitable execution target.

>>         If *reloading* is set, specifies an existing sys.modules entry
>>         that is being reloaded.
>
> As in the key into sys.modules?

No, as in the object itself. Technically it doesn't *have* to be in
sys.modules, and the loader really shouldn't care if it is or not.

>>         Must return None or that
>>         specific object if reloading is supported.
>
>
> What's "that" supposed to represent?

s/that specific object/the passed in object/

>>         Returning a
>>         different module object or explicitly raising ImportError
>>         indicates that reloading is not supported. (Or perhaps define
>>         a "ReloadError" subclass of ImportError?)
>>         """
>
>
> I'm really not following what this method is supposed to do. Is it simply
> mucking with sys.modules? Is it creating a module to use? If it's the latter
> then how does return None do anything? Are you saying returning None means
> "I didn't do anything special, do what you want"?

It replaces create_module with something that can also serve as the
pre-check for the reloading case.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From brett at python.org Fri Aug 30 17:31:17 2013
From: brett at python.org (Brett Cannon)
Date: Fri, 30 Aug 2013 11:31:17 -0400
Subject: [Import-SIG] PEP 451 (ModuleSpec) round 3
In-Reply-To:
References:
Message-ID:

On Fri, Aug 30, 2013 at 11:12 AM, Nick Coghlan wrote:

> On 31 August 2013 00:57, Brett Cannon wrote:
> >> So perhaps a better name might be "prepare_module" (by analogy to PEP
> >> 3115), and have it accept a "reloading" parameter, which is an
> >> existing module to be reused.
> >
> >
> > Is this to replace create_module() or exec_module()?
>
> It replaces create_module.
>
> >> The signature would be something like:
> >>
> >>     def prepare_module(reloading=None):
> >>         """Create a module object for execution. Returning None will
> >>         created a default module.
>
> Oops, stuffed up the signature. First arg should be the module spec:
>
>     def prepare_module(spec, reloading=None):
>         ...
>
> > I can't follow that sentence. =) What does returning None represent?
>
> Returning None indicates that the *loader* defines a module creation
> API, but the particular module being loaded doesn't take advantage of
> it.
>

IOW returning None means "I don't have anything special to say here, so do what you want"?
>
> It's a feature I need for the new extension module loader API, where
> the creation hook allows the extension module to build a completely
> custom object (perhaps with additional state). You can request an
> ordinary module just by not defining the creation hook, and only
> defining the execution hook (which accepts an already created module).
>

OK, so this is purely for special cases and not meant to always return something, just return something when needed.

>
> By switching to a *preparation* hook, rather than creation, I think we
> can make this play more nicely with reloading. In the reloading case,
> the preparation hook would be responsible for checking that the
> existing object was a suitable execution target.
>

Ah, OK. It's more of a pre-condition check in that case, otherwise it's a chance to say "use this rather than whatever you default to".

>
> >>         If *reloading* is set, specifies an existing sys.modules entry
> >>         that is being reloaded.
> >
> > As in the key into sys.modules?
>
> No, as in the object itself. Technically it doesn't *have* to be in
> sys.modules, and the loader really shouldn't care if it is or not.
>

That's what I figured.

>
> >>         Must return None or that
> >>         specific object if reloading is supported.
> >
> >
> > What's "that" supposed to represent?
>
> s/that specific object/the passed in object/
>
> >>         Returning a
> >>         different module object or explicitly raising ImportError
> >>         indicates that reloading is not supported. (Or perhaps define
> >>         a "ReloadError" subclass of ImportError?)
> >>         """
> >
> >
> > I'm really not following what this method is supposed to do. Is it simply
> > mucking with sys.modules? Is it creating a module to use? If it's the latter
> > then how does return None do anything? Are you saying returning None means
> > "I didn't do anything special, do what you want"?
>
> It replaces create_module with something that can also serve as the
> pre-check for the reloading case.
In ModuleSpec.load():

    module = self.loader.prepare_module(self)
    if module is None:
        module = types.ModuleType(self.name)

And in reload():

    module = self.loader.prepare_module(self, module_being_reloaded)

That way some custom object can be used, and in the reload case ImportError can just propagate up if it turns out the module can't be reloaded.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
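
The flow sketched in this exchange could look roughly like the following. This is only an illustration under stated assumptions: `prepare_module()` is the hook proposed in this thread, not an existing importlib API, and `SketchLoader`, `load()`, `reload()` and the `SimpleNamespace` used as a spec are hypothetical stand-ins. The sys.modules handling and the actual execution step are left out.

```python
import types


class SketchLoader:
    """Hypothetical loader implementing the proposed prepare_module() hook."""

    def prepare_module(self, spec, reloading=None):
        if reloading is None:
            # Nothing special to create: the caller builds a default module.
            return None
        if not isinstance(reloading, types.ModuleType):
            # Not a suitable execution target, so reloading is unsupported.
            raise ImportError(f"cannot reload {spec.name!r} into {reloading!r}")
        return reloading


def load(spec):
    """Follow the ModuleSpec.load() pseudo-code from the thread."""
    module = spec.loader.prepare_module(spec)
    if module is None:
        module = types.ModuleType(spec.name)
    return module


def reload(spec, module_being_reloaded):
    """Accept None or the passed-in object; anything else means no reload."""
    module = spec.loader.prepare_module(spec, module_being_reloaded)
    if module is None:
        return module_being_reloaded
    if module is not module_being_reloaded:
        raise ImportError(f"{spec.name!r} does not support reloading")
    return module


# Stand-in for a module spec: only the two attributes used above.
spec = types.SimpleNamespace(name="demo", loader=SketchLoader())
demo = load(spec)
```

Here `load(spec)` falls back to a plain `types.ModuleType` because the loader returns None, `reload(spec, demo)` hands back the same object, and an unsuitable target lets ImportError propagate to the caller.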