[Import-SIG] Loading Resources From a Python Module/Package

Barry Warsaw barry at python.org
Sat Jan 31 18:40:04 CET 2015


On Jan 30, 2015, at 07:52 PM, Donald Stufft wrote:

>resource_exists(package_or_requirement, resource_name)
>    Does the named resource exist? Return True or False accordingly.

+1

>resource_stream(package_or_requirement, resource_name)
>    Return a readable file-like object for the specified resource; it may be
>    an actual file, a StringIO, or some similar object. The stream is in
>    “binary mode”, in the sense that whatever bytes are in the resource will
>    be read as-is.

See my previous follow-up.  I'd much rather have an open()-like API so I don't
have to do the subsequent decoding myself.
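This is a minimal sketch of what an open()-like text API could look like. It uses a hypothetical helper named open_resource(); the name and signature are illustrative and not from any existing library. It assumes only the stdlib pkgutil.get_data(), so it works with any loader that implements get_data(), including zipimport.

```python
import io
import pkgutil


def open_resource(package, resource, encoding="utf-8", errors="strict"):
    """Hypothetical helper: open a package resource as a decoded text stream.

    Built on pkgutil.get_data(), so it works with any loader that supports
    get_data(), not just resources sitting on the filesystem.
    """
    data = pkgutil.get_data(package, resource)
    if data is None:
        # pkgutil.get_data() returns None when the loader can't serve data.
        raise FileNotFoundError(f"cannot load {resource!r} from {package!r}")
    # newline=None gives the same universal-newline translation as open(..., "r").
    return io.TextIOWrapper(io.BytesIO(data), encoding=encoding,
                            errors=errors, newline=None)
```

Call sites then read like ordinary file handling, e.g. `with open_resource("mypkg", "data/config.ini") as f: ...`, with no separate decode step.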

>resource_string(package_or_requirement, resource_name)
>    Return the specified resource as a string. The resource is read in binary
>    fashion, such that the returned string contains exactly the bytes that are
>    stored in the resource.

Right, so resource_string() is the wrong name <wink>.  In my Python 3 code I
always do:

from pkg_resources import resource_string as resource_bytes

so at least the call sites more accurately reflect reality. :)

>resource_isdir(package_or_requirement, resource_name)
>    Is the named resource a directory? Return True or False accordingly.
>
>resource_listdir(package_or_requirement, resource_name)
>    List the contents of the named resource directory, just like os.listdir
>    except that it works even if the resource is in a zipfile.

I've used these, but rarely, so I don't care too much.
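For comparison, here is a sketch of the two directory helpers on top of importlib.resources.files(). That function was added to the stdlib later, in Python 3.9, so it did not exist when this was written. The helper names mirror pkg_resources but are hypothetical. Because files() returns a Traversable rather than a plain path, the same code also covers packages inside zip files.

```python
from importlib.resources import files  # Python 3.9+


def resource_isdir(package, name):
    """Hypothetical helper: is the named resource a directory?"""
    return files(package).joinpath(name).is_dir()


def resource_listdir(package, name):
    """Hypothetical helper: list a resource directory, like os.listdir().

    Unlike os.listdir(), the names are sorted so the result is deterministic.
    """
    return sorted(entry.name for entry in files(package).joinpath(name).iterdir())
```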

>resource_filename(package_or_requirement, resource_name)
[...]
>Obviously the similar functions here are:
>
>* pkgutil.get_data is pkg_resources.resource_string
>* pkgutil.get_data_filename is pkg_resources.resource_filename
>
>The major difference being that pkg_resources.resource_filename will extract
>to a cache directory (controllable with an environment variable or
>programmatically) and won't clean up the extracted files. This means that they
>are (by default) extracted once per user and reused between extractions. I
>felt like it made more sense to just extract to a temporary location (even
>though this is less performant) in the stdlib.

Extracting to a temporary location is fine.  These generally aren't
performance-critical sections (e.g. I use them predominantly in tests) and if
they are then I'd rather let the user define the caching policy.
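Here is a rough sketch of the "extract to a temporary location" behaviour. The context manager name temporary_resource_filename() is hypothetical. It copies the resource into a fresh temporary directory on every call and deletes it afterwards, with no persistent cache, so any caching policy is left to the caller. Python 3.9 later added importlib.resources.as_file(), which serves a similar purpose.

```python
import contextlib
import os
import pkgutil
import tempfile


@contextlib.contextmanager
def temporary_resource_filename(package, resource):
    """Hypothetical helper: yield a real filesystem path for a resource.

    The bytes are always written to a new temporary directory, which is
    removed (along with the file) when the with-block exits.
    """
    data = pkgutil.get_data(package, resource)
    if data is None:
        raise FileNotFoundError(f"cannot load {resource!r} from {package!r}")
    with tempfile.TemporaryDirectory(prefix="resource-") as tmpdir:
        # Keep the original basename so code that checks extensions still works.
        path = os.path.join(tmpdir, os.path.basename(resource))
        with open(path, "wb") as f:
            f.write(data)
        yield path
```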

>That leaves:
>
>* resource_exists
>* resource_stream
>* resource_isdir
>* resource_listdir
>
>These can be done via pkg_resources but not via the standard library. I don't
>have a major opinion on whether or not the standard library should do all of
>them, but I don't think it would hurt if it did.

resource_stream() is useful, but see my previous response on that.

>Another interesting question if we're going to add more methods is where they
>should all live. As far as I know pkgutil.get_data predates the importlib
>module. Perhaps deprecating pkgutil.get_data and adding an importlib.resources
>module which supports functions like:
>
>* get_bytes(package, resource)
>* get_stream(package, resource)
>* get_filename(package, resource)
>* exists(package, resource)
>* isdir(package, resource)
>* listdir(package, resource)

Modulo bikeshedding on the names of the functions, importlib.resources seems
like a nice place for it.
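To make the proposed module shape concrete, here is a sketch of three of the listed functions. The names get_bytes(), get_stream() and exists() come from the list above; the bodies are an assumption built on importlib.resources.files() (Python 3.9+), not an existing API. exists() is written as is_file() or is_dir() because the Traversable protocol that files() returns does not define an exists() method.

```python
from importlib.resources import files  # Python 3.9+


def get_bytes(package, resource):
    """Return the resource contents as bytes (the resource_string() case)."""
    return files(package).joinpath(resource).read_bytes()


def get_stream(package, resource):
    """Return a binary file-like object for the resource."""
    return files(package).joinpath(resource).open("rb")


def exists(package, resource):
    """Does the named resource (file or directory) exist?"""
    target = files(package).joinpath(resource)
    # Traversable has no exists(); is_file()/is_dir() are part of the protocol.
    return target.is_file() or target.is_dir()
```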

>Changing the names (particularly get_data -> get_bytes) could also provide the
>mechanism for allowing relative files and deprecating the "you must pass in
>a full file path to the Loader()" behavior since the get_data method could be
>left alone and a new get_bytes method could be added.
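The following sketch shows how a new function could accept a relative name while Loader.get_data() keeps its "full path" contract. The name get_bytes_via_loader() is hypothetical. It resolves a '/'-separated name against the package's directory and only then calls the loader. pkgutil.get_data() does much the same thing internally, using the module's __file__.

```python
import importlib
import os


def get_bytes_via_loader(package, resource):
    """Hypothetical helper: read a resource named relative to its package.

    Loader.get_data() still receives a full path; only this wrapper knows
    about relative names, so existing loaders need no changes.
    """
    module = importlib.import_module(package)
    spec = module.__spec__
    loader = getattr(spec, "loader", None)
    if spec is None or spec.origin is None or not hasattr(loader, "get_data"):
        raise ValueError(f"{package!r} has no loader that supports get_data()")
    # For a regular package, spec.origin is its __init__ file.
    base = os.path.dirname(spec.origin)
    full_path = os.path.join(base, *resource.split("/"))
    return loader.get_data(full_path)
```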

+1, but see also my previous suggestion about path restrictions.

Cheers,
-Barry
