From ericsnowcurrently at gmail.com Tue Apr 1 00:26:32 2014
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Mon, 31 Mar 2014 16:26:32 -0600
Subject: [Import-SIG] making it feasible to rely on loaders for reading intra-package data files
In-Reply-To:
References:
Message-ID:

On Sat, Feb 1, 2014 at 11:44 AM, Brett Cannon wrote:
> Over on distutils-sig it came up that getting people to not simply assume
> that __file__ points to an actual file and thus avoid using open() directly
> to read intra-package files is an issue. In order to make using a loader's
> get_data reasonable (let alone set_data), there needs to be a clear
> specification of how things are expected to work and make sure that
> everything that people need is available.
>
> The docs for importlib.abc.ResourceLoader.get_data
> (http://docs.python.org/3.4/library/importlib.html#importlib.abc.ResourceLoader.get_data)
> say that things are expected to be based off of __file__, and with Python
> 3.4 using only absolute paths (except for __main__) that means all paths
> would be absolute by default. As long as people stick to pathlib/os.path and
> don't use non-standard path separators then this should just work.
>
> But what if people don't do that? I honestly say that it should either be
> explicitly undefined or that it's an IOError. IOW either we say "use
> absolute paths or else you're on your own" or "use absolute paths, period".
> That prevents having to make a decision as to whether a relative path is
> relative to the module the loader is attached to or relative to the package
> (e.g. easier for per-module loaders or per-package loaders, respectively).
> The former is more backwards-compatible so I say the docs get updated to say
> that relative paths are undefined behaviour.

It should definitely be up to the loader associated with the module. Some loaders, including relevant ones in importlib.machinery, are unique to individual modules and store __file__.
In that case I'd expect a relative path to mean relative to __file__. Some loaders could also track __path__, and a relative path would then be relative to the path entries there. However, the general API for get_/set_data() cannot rely on such loader state without that state being part of the relevant ABCs. Otherwise the path passed to get_/set_data() would have to be absolute.

Furthermore, for loaders that handle non-file locations, "path" may not be a filesystem path at all, as PJE pointed out, so a general requirement regarding absolute/relative paths wouldn't work. __file__ is an unfortunate name in those cases, and PEP 451 resolved this for specs by calling it "origin" (along with has_location and submodule_search_locations). It may be worth adding a resolve_location() method to loaders, to address any ambiguity.

>
> The second issue is whether get_data/set_data are enough or if something
> else is needed, e.g. a listdir-like method. Since this is meant for handling
> intra-package data my assumption is that it isn't really necessary as
> chances are you know what files you included in your distribution (or at
> least what the possible names are).

Sounds useful to me as long as the API isn't strictly focused on file-based resources. It sounds like there may be other resource-related methods worth adding at the same time.

> I know some have asked for a
> listdir-like API to help discover what modules are available so as to
> provide a plugin API, but I view that as a separate thing and potentially
> more appropriate on finders. Remember, the smaller the API surface for the
> common case the better for the stdlib.

I think this would be nice, but only worth it if we anticipate a good use case.
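
For illustration only, here is a minimal sketch of the "absolute paths only" convention discussed above: build the path from the package's __file__ and hand it to the loader's get_data(). It assumes a regular file-based loader (one that sets __file__ and implements get_data()). The names read_package_data, demo_pkg and greeting.txt are hypothetical and exist only for this example; none of this is code from the thread.

```python
import importlib
import os
import sys
import tempfile


def read_package_data(package, relative_name):
    """Read a data file stored next to the package's __init__ module.

    Assumes a file-based loader, so that package.__file__ is a real
    filesystem path and the loader implements get_data().
    """
    base = os.path.dirname(package.__file__)
    # Always pass an absolute path; relative paths are left undefined.
    path = os.path.abspath(os.path.join(base, relative_name))
    return package.__loader__.get_data(path)


# Build a throwaway package on disk so the example is self-contained.
with tempfile.TemporaryDirectory() as root:
    pkg_dir = os.path.join(root, "demo_pkg")  # hypothetical package name
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("")
    with open(os.path.join(pkg_dir, "greeting.txt"), "w") as f:
        f.write("hello")

    sys.path.insert(0, root)
    try:
        demo_pkg = importlib.import_module("demo_pkg")
        data = read_package_data(demo_pkg, "greeting.txt")  # bytes
    finally:
        # Undo the changes to global import state.
        sys.path.remove(root)
        sys.modules.pop("demo_pkg", None)
```

get_data() returns bytes, so the caller decides how to decode them. A loader for non-file locations would need its own way to resolve "greeting.txt", which is the ambiguity a resolve_location()-style method would address.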
-eric

From ericsnowcurrently at gmail.com Tue Apr 1 00:28:28 2014
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Mon, 31 Mar 2014 16:28:28 -0600
Subject: [Import-SIG] A ModuleData API (was: Re: making it feasible to rely on loaders for reading intra-package data files)
Message-ID:

On Sat, Feb 1, 2014 at 11:44 AM, Brett Cannon wrote:
> Over on distutils-sig it came up that getting people to not simply assume
> that __file__ points to an actual file and thus avoid using open() directly
> to read intra-package files is an issue. In order to make using a loader's
> get_data reasonable (let alone set_data), there needs to be a clear
> specification of how things are expected to work and make sure that
> everything that people need is available.
>
> The docs for importlib.abc.ResourceLoader.get_data
> (http://docs.python.org/3.4/library/importlib.html#importlib.abc.ResourceLoader.get_data)
> say that things are expected to be based off of __file__, and with Python
> 3.4 using only absolute paths (except for __main__) that means all paths
> would be absolute by default. As long as people stick to pathlib/os.path and
> don't use non-standard path separators then this should just work.
>
> But what if people don't do that? I honestly say that it should either be
> explicitly undefined or that it's an IOError. IOW either we say "use
> absolute paths or else you're on your own" or "use absolute paths, period".
> That prevents having to make a decision as to whether a relative path is
> relative to the module the loader is attached to or relative to the package
> (e.g. easier for per-module loaders or per-package loaders, respectively).
> The former is more backwards-compatible so I say the docs get updated to say
> that relative paths are undefined behaviour.
>
> The second issue is whether get_data/set_data are enough or if something
> else is needed, e.g. a listdir-like method.
> Since this is meant for handling
> intra-package data my assumption is that it isn't really necessary as
> chances are you know what files you included in your distribution (or at
> least what the possible names are). I know some have asked for a
> listdir-like API to help discover what modules are available so as to
> provide a plugin API, but I view that as a separate thing and potentially
> more appropriate on finders. Remember, the smaller the API surface for the
> common case the better for the stdlib.

Here's a rough idea that helps consolidate behavior and move the focus of the data APIs away from loaders and toward modules.

In my mind loader methods are low-level and meant particularly for consumption by the import system. It would be nice to have higher-level APIs for everyone else to use. It would make sense to wrap that up in a class.

From what I can tell, use cases for the data-related loader API are module-centric, so it would make sense to have the high-level API focus on modules, rather than loaders:

    class ModuleData:

        def __init__(self, module):
            self.module = module
            self.loader = module.__loader__

        def get_data(self, location):
            return self.loader.get_data(location)

        def set_data(self, location, data):
            return self.loader.set_data(location, data)

        ...

This gives us the ability to generalize standard data-related behavior across all loaders (kind of like PEP 451 did for loading). It would also make customization simpler.

File-based modules/loaders are the common case. It would be nice to provide default implementations for them. I see two approaches:

* Subclass ModuleData (e.g. FileModuleData).
* Add a boolean "filebased" attr to loaders that ModuleData could use to trigger customized behavior.

In either case, it would make sense to add a loader method (e.g. get_data_api(module)) that returns a ModuleData instance, thus allowing each loader to pick the type returned.

ModuleData.__init__ implies having the module (already imported).
The current loader API does not require any module, so that low-level API would still be useful if someone wanted to avoid loading the module first. Alternately there could be a mechanism for building a ModuleData object from a loader without needing to load the module first. (I had brought up something similar with PEP 451, but it was too out of scope to pursue there.)

-eric

From bcannon at gmail.com Fri Apr 4 20:57:52 2014
From: bcannon at gmail.com (Brett Cannon)
Date: Fri, 4 Apr 2014 14:57:52 -0400
Subject: [Import-SIG] How best to replace imp.load_module()?
Message-ID:

I've been thinking about what it takes to replace imp and I realized that imp.load_module() is the hardest to replace for two reasons. One issue is that importlib.abc.Loader.create_module() can -- and does -- return None. This means that if someone wanted to replicate the imp.find_module()/imp.load_module() dance in a PEP 451 world it takes::

    spec = importlib.util.find_spec(name)
    try:
        module = spec.loader.create_module(spec)
    except AttributeError:
        module = None
    if module is None:
        module = types.ModuleType(spec.name)
    # No clear way to set import-related attributes.
    spec.loader.exec_module(module)

It took 6 lines to get a module. That seems a bit excessive and ripe to either have an importlib.util function that handles this all correctly or simply make create_module() a required method on a loader and have importlib.abc.Loader.create_module() return types.ModuleType(spec.name).

The second annoyance is that we have not exposed _SpecMethods.init_module_attrs() in any way. That can either come through in importlib.util or we can make types.ModuleType grow a method that takes a spec and then sets all the appropriate attributes.
If we go with the former we should make sure that in importlib we always prefer __spec__ over any module-level values and that one can pass in a spec to types.ModuleType to set __spec__ so that in the distant Python 4 future we can deprecate all module-level attributes and just work off of __spec__.

I think if we can get these two bits cleaned up we can tell people who use imp.load_module() directly that they can::

    # Assume proper loader chosen.
    spec = importlib.util.spec_from_loader(name, loader)
    module = some_newfangled_way_of_doing_this()
    somehow_init_module_attrs(module)
    loader.exec_module(module)

which isn't that bad for something they probably should be avoiding as much as possible to begin with.

Or we take the easiest option and simply ignore all of these issues and just say that working outside of import is not something we want to worry about in the stdlib and let PyPI come up with some utility code that does all of this for you if you really want it. The code maintainer in me is liking this idea + making it easier to set __spec__ on a module through its constructor.

From bcannon at gmail.com Fri Apr 4 21:07:02 2014
From: bcannon at gmail.com (Brett Cannon)
Date: Fri, 4 Apr 2014 15:07:02 -0400
Subject: [Import-SIG] How best to replace imp.load_module()?
In-Reply-To:
References:
Message-ID:

After writing this, I realized the reason imp existed was to work around the limitation of import being in C. Now that import is in Python and very well exposed, there's really nothing to say that we need importlib to grow to some fat package that tries to solve all the issues that imp did.
If we keep importlib lean in terms of only providing things that make import work, make providing custom importers easy, or things that are truly tough to do right, then stuff like imp.load_module() is really outside of its purview as that can be done without a terrible amount of effort by a project on PyPI (beyond what I put in below all they would need to do is copy _SpecMethods.init_module_attrs()).

I'm really starting to like the idea of not trying to contort ourselves into replacing imp.find_module/load_module directly. I would still like to fix types.ModuleType to do the right thing for __spec__ in its constructor and make sure importlib does as well, but otherwise I'm happy with relying more on the community to pick up some of the higher-level API stuff for us.

On Fri, Apr 4, 2014 at 2:57 PM, Brett Cannon wrote:

> I've been thinking about what it takes to replace imp and I realized that
> imp.load_module() is the hardest to replace for two reasons. One issue is
> that importlib.abc.Loader.create_module() can -- and does -- return None. This
> means that if someone wanted to replicate the
> imp.find_module()/imp.load_module() dance in a PEP 451 world it takes::
>
>     spec = importlib.util.find_spec(name)
>     try:
>         module = spec.loader.create_module(spec)
>     except AttributeError:
>         module = None
>     if module is None:
>         module = types.ModuleType(spec.name)
>     # No clear way to set import-related attributes.
>     spec.loader.exec_module(module)
>
> It took 6 lines to get a module. That seems a bit excessive and ripe to
> either have an importlib.util function that handles this all correctly or
> simply make create_module() a required method on a loader and have
> importlib.abc.Loader.create_module() return types.ModuleType(spec.name).
>
> The second annoyance is that we have not exposed
> _SpecMethods.init_module_attrs() in any way. That can either come through in
> importlib.util or we can make types.ModuleType grow a method that takes a
> spec and then sets all the appropriate attributes.
> If we go with the former
> we should make sure that in importlib we always prefer __spec__ over any
> module-level values and that one can pass in a spec to types.ModuleType to
> set __spec__ so that in the distant Python 4 future we can deprecate all
> module-level attributes and just work off of __spec__.
>
> I think if we can get these two bits cleaned up we can tell people who use
> imp.load_module() directly that they can::
>
>     # Assume proper loader chosen.
>     spec = importlib.util.spec_from_loader(name, loader)
>     module = some_newfangled_way_of_doing_this()
>     somehow_init_module_attrs(module)
>     loader.exec_module(module)
>
> which isn't that bad for something they probably should be avoiding as
> much as possible to begin with.
>
> Or we take the easiest option and simply ignore all of these issues and
> just say that working outside of import is not something we want to worry
> about in the stdlib and let PyPI come up with some utility code that does
> all of this for you if you really want it. The code maintainer in me is
> liking this idea + making it easier to set __spec__ on a module through its
> constructor.

From ncoghlan at gmail.com Sat Apr 5 02:07:50 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 5 Apr 2014 10:07:50 +1000
Subject: [Import-SIG] How best to replace imp.load_module()?
In-Reply-To:
References:
Message-ID:

Keep in mind I already need a fair bit of this kind of thing to make runpy work properly.

Moving that infrastructure to importlib.util is worthwhile, because it makes it easier to evolve the core import implementation with confidence that we're not breaking even obscure use cases.

The migration of extension modules to PEP 451 should take place on the road to 3.5, and we should take a close look at migrating pdb and friends to runpy with a view to adding -m support (which may require new features in runpy itself).
Moving zipimport to a frozen Python module may also be desirable.

I think that's a better use case driven path to follow, and we can hold off on finalising the imp.load_module deprecation for the time being.

Cheers,
Nick.

From bcannon at gmail.com Sat Apr 5 15:58:23 2014
From: bcannon at gmail.com (Brett Cannon)
Date: Sat, 5 Apr 2014 09:58:23 -0400
Subject: [Import-SIG] How best to replace imp.load_module()?
In-Reply-To:
References:
Message-ID:

On Fri, Apr 4, 2014 at 8:07 PM, Nick Coghlan wrote:

> Keep in mind I already need a fair bit of this kind of thing to make runpy
> work properly.

OK, that's good to know.

> Moving that infrastructure to importlib.util is worthwhile, because it
> makes it easier to evolve the core import implementation with confidence
> that we're not breaking even obscure use cases.
>
> The migration of extension modules to PEP 451 should take place on the
> road to 3.5, and we should take a close look at migrating pdb and friends
> to runpy with a view to adding -m support (which may require new features
> in runpy itself).

SGTM

> Moving zipimport to a frozen Python module may also be desirable.

I think the number of dependencies might make this more of a pain than it's worth. I asked Greg and Thomas if they thought it might be worth it after all the headaches they went through for zipimport and they didn't think it was necessarily worth it. While I would be quite happy if someone actually tried to figure out the feasibility (maybe running zipfile through modulefinder is enough to get an idea?), I just don't know if the level of dependency will be so high that it will just get annoying short of freezing the entire stdlib (which in and of itself might be an interesting exercise, although I would see some flipping out over the increased binary size).
> I think that's a better use case driven path to follow, and we can hold
> off on finalising the imp.load_module deprecation for the time being.

Well, it's been explicitly deprecated since Python 3.3 (3.3 had a DeprecationWarning in the function, 3.4 has it implicitly through the module-level deprecation). But actual removal won't happen until we do a deprecation spring cleaning in the stdlib (e.g. Python 4 kind of thing).

Anyway, I'll wait until you're ready to work on runpy stuff to worry about what exactly we want to support so as to not go stabbing in the dark, as average use cases have been hard to come by (GitHub actually suggests very few people use load_module() w/o find_module() which makes this easier to deal with).

-Brett

> Cheers,
> Nick.

From ericsnowcurrently at gmail.com Sat Apr 5 23:45:02 2014
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Sat, 5 Apr 2014 15:45:02 -0600
Subject: [Import-SIG] How best to replace imp.load_module()?
In-Reply-To:
References:
Message-ID:

On Apr 4, 2014 12:58 PM, "Brett Cannon" wrote:
>
> I've been thinking about what it takes to replace imp and I realized that imp.load_module() is the hardest to replace for two reasons. One issue is that importlib.abc.Loader.create_module() can -- and does -- return None. This means that if someone wanted to replicate the imp.find_module()/imp.load_module() dance in a PEP 451 world it takes::
>
>     spec = importlib.util.find_spec(name)
>     try:
>         module = spec.loader.create_module(spec)
>     except AttributeError:
>         module = None
>     if module is None:
>         module = types.ModuleType(spec.name)
>     # No clear way to set import-related attributes.
>     spec.loader.exec_module(module)
>
> It took 6 lines to get a module.
> That seems a bit excessive and ripe to either have an importlib.util function that handles this all correctly or simply make create_module() a required method on a loader and have importlib.abc.Loader.create_module() return types.ModuleType(spec.name).

I agree with your later email about not needing to add a ton of API unnecessarily. I'm not sure imp.load_module() needs to live on in importlib. The key one I'd like to see is a replacement for direct calls to loader.load_module(). We do this a bunch in the stdlib and each of those places currently has to do a dance around _SpecMethods. Doing so outside the stdlib isn't really correct. In both cases it would be nice to wrap that in a util function (like "import_from_loader()") or a classmethod on Loader.

>
> The second annoyance is that we have not exposed _SpecMethods.init_module_attrs() in any way.

If we had an import_from_loader(), I'm not sure we'd need to worry about it. Would there be other use cases for setting those attrs?

Also, once I've wrapped up (either way) the OrderedDict-related stuff I'm working on, my main goal is to propose a successor to PEP 406 (ImportEngine). That would include exposing most of the _SpecMethods API in some form. FWIW, I'm still uncomfortable with exposing that API directly on ModuleSpec, but would like to see it exposed publicly in some indirect way.

> That can either come through in importlib.util or we can make types.ModuleType grow a method that takes a spec and then sets all the appropriate attributes.

Maybe if it were a class-only method. I'd hate to see something like that exposed on module objects.

> If we go with the former we should make sure that in importlib we always prefer __spec__ over any module-level values and that one can pass in a spec to types.ModuleType to set __spec__ so that in the distant Python 4 future we can deprecate all module-level attributes and just work off of __spec__.

This is tricky.
It depends on how useful it is to people to have module attrs that vary from the spec (for which the module was originally loaded).

That aside, I've found it's still useful to have __name__ and __file__ rather than having to look them up on __spec__. Maybe that's just because I'm not used to it. :) That would be a different matter if the common use cases for the two were satisfied by other means. It would be nice if we could restrict __file__ to just modules that are FS-based (and drop it or set it to None for all other modules). The other attrs are probably used uncommonly enough that they could get dropped from module objects.

>
> I think if we can get these two bits cleaned up we can tell people who use imp.load_module() directly that they can::
>
>     # Assume proper loader chosen.
>     spec = importlib.util.spec_from_loader(name, loader)
>     module = some_newfangled_way_of_doing_this()
>     somehow_init_module_attrs(module)
>     loader.exec_module(module)

That's basically what import_from_loader() would do.

>
> which isn't that bad for something they probably should be avoiding as much as possible to begin with.

Maybe it would be worth getting a clear idea of why people use imp.load_module() and loader.load_module() (directly). My guess is that it is to accomplish slight deviations from normal import behavior. For example, the last time I looked at the source, Salt used imp.load_module() to do some trickery. However, I expect that most cases would be satisfied by use of proper importers. Either way, it would be nice to have a better picture of what unusual things people are doing like this. I'd benefit from that at least. :)

>
> Or we take the easiest option and simply ignore all of these issues and just say that working outside of import is not something we want to worry about in the stdlib and let PyPI come up with some utility code that does all of this for you if you really want it.
> The code maintainer in me is liking this idea + making it easier to set __spec__ on a module through its constructor.

:)

-eric
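
For illustration only, here is a rough sketch of what an import_from_loader()-style helper could look like. The helper name comes from the suggestion in the thread, but its signature and body are assumptions, not an agreed design. It leans on importlib.util.module_from_spec(), which was added in Python 3.5 (after this discussion) and covers both the "create the module" and "set the import-related attributes" steps. StringLoader and demo_mod are hypothetical names used only to exercise the helper.

```python
import importlib.abc
import importlib.util
import sys


def import_from_loader(name, loader):
    """Hypothetical helper: load module `name` using `loader` directly."""
    spec = importlib.util.spec_from_loader(name, loader)
    # Calls loader.create_module(spec), falls back to a plain module when
    # that returns None, then sets __spec__, __loader__, __name__, etc.
    module = importlib.util.module_from_spec(spec)
    # Register before executing, as the import system itself does.
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # Do not leave a half-initialized module behind.
        sys.modules.pop(name, None)
        raise
    return module


class StringLoader(importlib.abc.Loader):
    """Toy loader (hypothetical) that executes source held in a string."""

    def __init__(self, source):
        self.source = source

    def create_module(self, spec):
        return None  # ask for the default module creation

    def exec_module(self, module):
        exec(self.source, module.__dict__)


mod = import_from_loader("demo_mod", StringLoader("answer = 42"))
```

Registering the module in sys.modules is a design choice in this sketch; a variant that leaves sys.modules untouched would be closer to "loading without importing".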