[Import-SIG] Rough PEP: A ModuleSpec Type for the Import System

Fri Aug 9 20:03:32 CEST 2013

On Fri, Aug 9, 2013 at 8:40 AM, Brett Cannon <brett at python.org> wrote:

> On Fri, Aug 9, 2013 at 2:34 AM, Eric Snow <ericsnowcurrently at gmail.com>wrote:
>
>> Finally, when the import system calls a finder's ``find_module()``, the
>>
> finder makes use of a variety of information about the module that is
>> useful outside the context of the method.  Currently the options are
>> limited for persisting that per-module information past the method call,
>> since it only returns the loader.  Either store it in a module-to-info
>> mapping somewhere like on the finder itself, or store it on the loader.
>>
>
> The two previous sentences are hard to read; I think you were after
> something like,
> "Popular options for this limitation are to store the information is in a
> module-to-info
> mapping somewhere on the finder itself, or store it on the loader.
>

Sounds good.

>
>
>> (The idea grew feet during discussions related to another PEP.[1])
>>
>
> "(This PEP grew out of discussions related to another PEP [1])"
>

Yeah, this was one of the last things I added to the PEP and my brain was
starting to get a little fuzzy. :)

> * ``is_package`` - whether or not the module is a package.
>>
>
> I think is_package() is redundant in the face of 'name'/'package' or
> 'path' as you can introspect the same information. I honestly have always
> found it a weakness of InspectLoader.is_package() that it didn't return the
> value for __path__.
>

I see what you mean, but I also think it's nice to be able to explicitly
see if a spec is for a package without having to know about underlying
rules.  However, I'll just make it a property instead of something set on
the spec (and remove it from __init__).

>
>
>> * ``origin`` - the location from which the module originates.
>>
>
> Don't quite follow what this is meant to represent? Like the path to the
> zipfile if loaded that way, otherwise it's the file path?
>

Yeah, Antoine had the same question.  I'll make sure the PEP is clearer.
 Basically filename maps to the module's __file__ and origin is used for
the module's repr if filename isn't set.

>
>
>> * ``filename`` - like origin, but limited to a path-based location
>>   (compare to ``__file__``).
>> * ``cached`` - the location where the compiled module should be stored
>>   (compare to ``__cached__``).
>> * ``path`` - the list of path entries in which to search for submodules
>>   or ``None``.  (compare to ``__path__``).  It should be in sync with
>>   ``is_package``.
>>
>
> Why is 'path' the only attribute with a default value? Should probably say
> everything has a default value of None if not set/known.
>

Good point.

>
>
>>
>> Those are also the parameters to ``ModuleSpec.__init__()``, in that
>> order.
>>
>
> I would consider arguing all arguments should be keyword-only past 'name'
> since there is no way most people will remember that order correctly.
>

Makes sense, though I'll make everything but name and loader keyword-only.

> * ``from_loader(cls, ...)`` - returns a new ``ModuleSpec`` derived from the
>>   arguments.  The parameters are the same as with ``__init__``, except
>>   ``package`` is excluded and only ``name`` and ``loader`` are required.
>>
>
> Why the switch in requirements compared to __init__()?
>

Because package is always calculated and only name and loader are necessary
to calculate the remaining attributes.  Perhaps from_loader() is the wrong
name (I'm open to alternatives).  Perhaps __init__() should take over some
of the calculating.  My intention is to provide one API for
what-you-pass-in-is-what-you-get (__init__) and another for calculating
attributes.  Of course, one could simply modify the spec after creating it,
but I like idea of explicitly opting in to calculated values.  I'll add
this point to the PEP.  Also I'll probably also drop package as a parameter
of __init__ and make the attribute a property.

I've also toyed with the idea of making all the attributes properties (aka
read-only) since changing a module's spec later on could lead to headache,
but I'm not convinced that is a easy problem to cause.  It's better to not
get in the way of those who have needs I haven't anticipated (consenting
adults, etc.).  What do you think?

>
>
>> * ``module_repr()`` - returns a repr for the module.
>> * ``init_module_attrs(module)`` - sets the module's import-related
>>   attributes.
>>
>
> Specify what those attributes are and how they are set.
>

Will do.

>
>
>> * ``load(module=None, *, is_reload=False)`` - calls the loader's
>>   ``exec_module()``, falling back to ``load_module()`` if necessary.
>>   This method performs the former responsibilities of loaders for
>>   managing modules before actually loading and for cleaning up.  The
>>   reload case is facilitated by the ``module`` and ``is_reload``
>>   parameters.
>>
>
> If a module is provided and there is already a matching key in
> sys.modules, what happens?
>
 What if is_reload is True but there is no module provided or in
> sys.modules; KeyError, ValueError, ImportError? Do you follow having None
> in sys.modules and raise ImportError, or do you overwrite (same question if
> a module is explicitly provided)?
>

That's a good point.  I thought I had addressed this in the PEP, but
apparently not.  For Loader.load_module(), as you know, the existence of
the key in sys.modules indicates a reload should happen.  The is_reload
parameter is meant to provide an explicit indicator.  The module you pass
in is simply the one to use.  If a module is not passed in and is_reload is
true, the module in sys.modules will be used.  If that module is None or
not there, ImportError would be raised.  If a module is passed in and
is_reload is false, I was planning on just ignoring that module.  However
raising ValueError in that case would be more useful, indicating that the
method was called incorrectly.

Having just the module parameter and letting it indicate a reload is
doable, but that would mean losing the option of having load() look up the
module (and it's less explicit).  Another option is to have a separate
reload() method.  Antoine mentioned it and I'd considered it early on.  I'm
considering it again since it makes the API less complicated.  Do you have
a preference between the current proposal (load() does it all) and a
separate reload() method?

 ``is_package`` is derived from ``path``, if passed.  Otherwise the
>> loader's ``is_package()`` is tried.  Finally, it defaults to False.
>>
>
> It can also be calculated based on whether ``name`` == ``package``: ``True
> if path is not None else name == package``.
>

Good point, though at this point I don't think package will be something
you set.

Always need to watch out for [] for path as that is valid and signals the
> module is a package.
>

Yeah, I've got that covered in from_loader().

This is where defining exactly what details need to be passed in and which
> ones are optional are going to be critical in determining what represents
> ambiguity/unknown details vs. what is flat-out known to be true/false.
>

Agreed.  I'll be sure to spell it out.

> ``cached`` is derived from ``filename`` if it's available.
>>
>
> Derived how?
>

cache_from_source()

>  methods would now return a module spec
>> instead of loader, specs must act like the loader that would have been
>> returned instead.  This is relatively simple to solve since the loader
>> is available as an attribute of the spec.
>>
>
> Are you going to define a __getattr__ to delegate to the loader? Or are
> you going to specifically define equivalent methods, e.g. get_filename() is
> obviously solvable by getting the attribute from the spec (as long as
> filename is a required value)?
>

__getattr__().  I don't want to guess what methods a loader might have.
 And if someone wants to call get_filename() on what they think is the
loader, I think it's better to just call the loader's get_filename().  I'd
left this stuff out as an implementation detail.  Do you think it should be
in the PEP?  I could simply elaborate on "specs must act like the loader".

>
>
>>
>> However, ``ModuleSpec.is_package`` (an attribute) conflicts with
>> ``InspectLoader.is_package()`` (a method).  Working around this requires
>> a more complicated solution but is not a large obstacle.
>>
>> Unfortunately, the ability to proxy does not extend to ``id()``
>> comparisons and ``isinstance()`` tests.  In the case of the return value
>> of ``find_module()``, we accept that break in backward compatibility.
>>
>
> Mention that ModuleSpec can be added to the proper ABCs in importlib.abc
> to help alleviate this issue.
>

Good point.

>
>
>>
>> Subclassing
>> -----------
>>
>> .. XXX Allowed but discouraged?
>>
>
> Why should it matter if they are subclassed?
>

My goal was for ModuleSpec to be the container for module definition state
with some common attributes as a baseline and a minimal number of methods
for the import system to use.  Loaders would be where you would do extra
stuff or customize functionality, which is basically what happens now.

It seemed correct before but now it's feeling like a very artificial and
unnecessary objective.

Finders
>> -------
>>
>> Finders will now return ModuleSpec objects when ``find_module()`` is
>> called rather than loaders.  For backward compatility, ``Modulespec``
>> objects proxy the attributes of their ``loader`` attribute.
>>
>> Adding another similar method to avoid backward-compatibility issues
>> is undersireable if avoidable.  The import APIs have suffered enough.
>>
>
> in lieu of the fact that find_loader() was just introduced in Python 3.3.
>

Are you suggesting additional wording or making a comment?

>
>> Loaders
>> -------
>>
>> Loaders will have a new method, ``exec_module(module)``.  Its only job
>> is to "exec" the module and consequently populate the module's
>> namespace.  It is not responsible for creating or preparing the module
>> object, nor for any cleanup afterward.  It has no return value.
>>
>> The ``load_module()`` of loaders will still work and be an active part
>> of the loader API.  It is still useful for cases where the default
>> module creation/prepartion/cleanup is not appropriate for the loader.
>>
>
> But will it still be required? Obviously importlib.abc.Loader can grow a
> default load_module() defined around exec_module(), but it should be clear
> if we expect the method to always be manually defined or if it will
> eventually go away.
>

load_module() will no longer be required.  However, it still serves a real
purpose: the loader may still need to control more of the loading process.
 By implementing load_module() but not exec_module(), a loader gets that.
 I'm make sure that's clear.

>
>
>>
>> A loader must have ``exec_module()`` or ``load_module()`` defined.  If
>> both exist on the loader, ``exec_module()`` is used and
>> ``load_module()`` is ignored.
>>
>
> Ignored by whom? Should specify that the import system is the one doing
> the ignoring.
>

Got it.

> * Deprecations in ``importlib.util``: ``set_package()``,
>>
>  ``set_loader()``, and ``module_for_loader()``.  ``module_to_load()``
>> (introduced in 3.4) can be removed.
>>
>
> "(introduced prior to Python 3.4's release)"; remember, PEPs are timeless
> and will outlive 3.4 so specifying it never went public is important.
>

Good catch.  You should be a PEP editor. <wink>

-eric
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/import-sig/attachments/20130809/9efa6c6d/attachment-0001.html>