[Import-SIG] Loading Resources From a Python Module/Package

Sat Jan 31 17:43:52 CET 2015

> On Jan 31, 2015, at 11:31 AM, Brett Cannon <brett at python.org> wrote:
> 
> 
> 
> On Sat Jan 31 2015 at 10:54:22 AM Paul Moore <p.f.moore at gmail.com> wrote:
> On 31 January 2015 at 15:47, Donald Stufft <donald at stufft.io> wrote:
> >> It's certainly possible to add a new API that loads resources based on
> >> a relative name, but you'd have to specify relative to *what*.
> >> get_data explicitly ducks out of making that decision.
> >
> > data = __loader__.get_bytes(__name__, "logo.gif")
> 
> Quite possibly. It needs a bit of fleshing out to make sure it doesn't
> prohibit sharing of loaders, etc, in the way Brett mentions.
> 
> By specifying the package anchor point I don't think it does.
>  
> Also, the
> fact that it needs __name__ in there feels wrong - a bit like the old
> version of super() needing to be told which class it was being called
> from.
> 
> You can't avoid that. This is the entire reason why loader reuse is a pain; you **have** to specify what to work off of, else it's ambiguous and a specific feature of a specific loader.
> 
> But this is only an issue when you are trying to access a file relative to the package/module you're in. Otherwise you're going to be specifying a string constant like 'foo.bar'.
>  
> But in principle I don't object to finding a suitable form of
> this.
> 
> And I like the name get_bytes - much more explicit in these Python 3
> days of explicit str/bytes distinctions :-)
> 
> One unfortunate side-effect from having a new method to return bytes from a data file is that it makes get_data() somewhat redundant. If we make it get_data_filename(package_name, path) then it can return an absolute path which can then be passed to get_data() to read the actual bytes. If we create importlib.resources as Donald has suggested then all of this can be hidden behind a function and users don't have to care about any of this, e.g. importlib.resources.read_data(module_anchor, path).

I think we actually have to go the other way, because only some Loaders will be able to actually return a filename (returning a filename is basically an optimization to prevent needing to call get_data and write that out to a temporary directory) but pretty much any loader should theoretically be able to support get_data.
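As a rough sketch of that fallback, assuming a loader that can only return bytes: the helper below uses a filename shortcut when the loader offers one and otherwise writes the bytes to a temporary file. The names BytesOnlyLoader, get_resource_filename and resource_filename are made up for illustration and are not an existing API; get_bytes is the method name proposed earlier in this thread.

```python
import os
import tempfile


class BytesOnlyLoader:
    """Hypothetical loader that can only hand back bytes (think zip or network store)."""

    def __init__(self, blobs):
        # Maps (package name, resource path) -> bytes.
        self._blobs = blobs

    def get_bytes(self, package, resource_path):
        return self._blobs[(package, resource_path)]


def resource_filename(loader, package, resource_path):
    """Return a real filesystem path for a resource.

    Uses the loader's (hypothetical) get_resource_filename shortcut when it
    exists; otherwise falls back to get_bytes plus a temporary file.
    """
    shortcut = getattr(loader, "get_resource_filename", None)
    if shortcut is not None:
        return shortcut(package, resource_path)

    data = loader.get_bytes(package, resource_path)
    # Keep the extension so tools that sniff file types still work.
    suffix = os.path.splitext(resource_path)[1]
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


loader = BytesOnlyLoader({("warehouse", "logo.gif"): b"GIF89a"})
extracted = resource_filename(loader, "warehouse", "logo.gif")
```

The caller is left responsible for deleting the temporary file in this sketch; a real design would need to decide who owns that cleanup.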

I think it is redundant but given that it’s a new API (passing module and a “resource path”) I think it makes sense. The old get_data API can be deprecated but left in for compatibility reasons if we want (sort of like Loader().load_module() -> Loader().exec_module()).

> 
> One thing to consider is do we want to allow anything other than filenames for the path part? Thanks to namespace packages every directory is essentially a package, so we could say that the package anchor has to encapsulate the directory and the path bit can only be a filename. That gets us even farther away from having the concept of file paths being manipulated in relation to import-related APIs.

I think we do want to allow directories, it’s not unusual to have something like:

warehouse
├── __init__.py
├── templates
│   ├── accounts
│   │   └── profile.html
│   └── hello.html
├── utils
│   └── mapper.py
└── wsgi.py

Conceptually templates isn’t a package (even though with namespace packages it kinda is) and I’d want to load profile.html by doing something like:

importlib.resources.get_bytes("warehouse", "templates/accounts/profile.html")

In pkg_resources the second argument to that function is a “resource path”, which is defined as relative to the given module/package and must use / as the separator between its segments. It explicitly says it’s not a file system path but a resource path. It may translate to a file system path (as is the case with the FileLoader) but it also may not (as is the case with a theoretical S3Loader or PostgreSQLLoader). How you turn a warehouse + a resource path into some data (or whatever other function we support) is an implementation detail of the Loader.
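For the file system case, the translation could look roughly like the sketch below. The function name resource_path_to_fs_path is hypothetical, and the validation rules (no leading /, no backslashes, no empty, "." or ".." segments) are my assumption about what a loader might want to enforce, not something pkg_resources is being quoted on.

```python
import os


def resource_path_to_fs_path(package_dir, resource_path):
    """Translate a '/'-separated resource path into a path under package_dir.

    Assumed rules for this sketch: the resource path is always relative,
    always uses '/', and may not climb out of the package directory.
    """
    if resource_path.startswith("/") or "\\" in resource_path:
        raise ValueError("not a relative '/'-separated resource path: %r" % resource_path)
    parts = resource_path.split("/")
    if any(part in ("", ".", "..") for part in parts):
        raise ValueError("invalid segment in resource path: %r" % resource_path)
    # os.path.join applies the platform's own separator.
    return os.path.join(package_dir, *parts)


fs_path = resource_path_to_fs_path(
    os.path.join("srv", "warehouse"), "templates/accounts/profile.html"
)
```

A loader backed by something other than a file system would take the same resource path and use it as a key, object name or query parameter instead.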

> 
> And just so I don't forget it, I keep wanting to pass an actual module in so the code can extract the name that way, but that prevents the __name__ trick as you would have to import yourself or grab the module from sys.modules.

Is an actual module what gets passed into Loader().exec_module()? If so I think it’s fine to pass that into the new Loader() functions and a new top level API in importlib.resources can do the things needed to turn a string into a module object. So instead of doing __loader__.get_bytes(__name__, "logo.gif") you’d do importlib.resources.get_bytes(__name__, "logo.gif").
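A minimal sketch of what that top level function might do, assuming loaders grow a get_bytes(module, resource_path) method as discussed above. Both that method and this helper are proposals from this thread, not existing APIs, and DemoLoader and demo_pkg exist only so the sketch can run on its own.

```python
import importlib
import importlib.machinery
import sys
import types


def get_bytes(package, resource_path):
    """Hypothetical top level helper: accepts a module object or a dotted name."""
    if isinstance(package, types.ModuleType):
        module = package
    else:
        # Reuse an already-imported module, otherwise import it by name.
        module = sys.modules.get(package)
        if module is None:
            module = importlib.import_module(package)
    # Assumes the loader provides the proposed get_bytes(module, resource_path).
    return module.__spec__.loader.get_bytes(module, resource_path)


class DemoLoader:
    """Stand-in loader so the sketch runs without touching the file system."""

    def get_bytes(self, module, resource_path):
        return "{}:{}".format(module.__name__, resource_path).encode("ascii")


# Register a fake package whose spec points at the stand-in loader.
demo = types.ModuleType("demo_pkg")
demo.__spec__ = importlib.machinery.ModuleSpec("demo_pkg", DemoLoader())
sys.modules["demo_pkg"] = demo

by_name = get_bytes("demo_pkg", "logo.gif")
by_module = get_bytes(demo, "logo.gif")
```

Inside a package the call would simply be get_bytes(__name__, "logo.gif"), so the __name__ trick keeps working while the loader itself only ever sees a module object.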

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
