[Import-SIG] PEP 451 (ModuleSpec) round 3

Thu Aug 29 15:16:54 CEST 2013

On Wed, Aug 28, 2013 at 9:57 PM, Eric Snow <ericsnowcurrently at gmail.com>wrote:

> On Wed, Aug 28, 2013 at 11:22 AM, Brett Cannon <brett at python.org> wrote:
>
>> On Wed, Aug 28, 2013 at 4:50 AM, Eric Snow <ericsnowcurrently at gmail.com>wrote:
>>
>>> This PEP proposes to add a new class to ``importlib.machinery`` called
>>> ``ModuleSpec``.  It will be authoritative for all the import-related
>>> information about a module, and will be available without needing to
>>> load the module first.  Finders will provide a module's spec instead of
>>> a loader.
>>>
>>
>> Don't you mean finders will return a ModuleSpec? Since 'loader' is still
>> defined in the ModuleSpec to know what loader to use that statement that
>> finders don't provide a loader is misleading.
>>
>
> Yeah, finders will still create the loaders.  I'll make that more clear.
>
>  importlib.machinery.ModuleSpec (new)
>>>
>> ------------------------------------
>>>
>>
>> For this entire section you need to provide the call signatures as you
>> start talking semantics later w/o making clear what is being passed and
>> returned before going into detail of the individual methods. Otherwise move
>> the detailed discussion of the methods up to before the semantics overview.
>>
>
> I'll try moving the detailed descriptions up.
>
>
>>
>>
>>>
>>> Attributes:
>>>
>>> * name - a string for the name of the module.
>>> * loader - the loader to use for loading and for module data.
>>> * origin - a string for the location from which the module is loaded.
>>>
>>
>> I would give an "e.g." here to help explain what you mean. As previous
>> comments have shown, the name alone is not enough to understand what value
>> should go here. =)
>>
>
> Good point. :)
>
>
>>
>>
>>> * submodule_search_locations - strings for where to find submodules,
>>>   if a package.
>>> * loading_info - a container of data for use during loading (or None).
>>> * cached (property) - a string for where the compiled module will be
>>>   stored.
>>> * is_location (RO-property) - the module's origin refers to a location.
>>>
>>> .. XXX Find a better name than loading_info?
>>>
>>
>>  loading_data is all that I can think of
>>
>>
>>>  .. XXX Add ``submodules`` (RO-property) - returns possible submodules
>>>    relative to spec (or None)?
>>>
>>
>> Actual use-case or are you just guessing there will be a use? Don't add
>> any fields that we have not seen an actual need for.
>>
>
> I was thinking of what Nick said about downplaying the module/package
> distinction, since a package is just a module with possible submodules.  So
> then I thought, what if there were an easy way to see what submodules a
> module has available?  Non-packages would always have 0 and packages would
> have 0 or more.  I'd use that if it existed.
>
> This was mostly a passing idea that would need more thought (the
> implementation might be tricky).  I agree it doesn't need to be in the PEP.
>
>
>>
>>> .. XXX Add ``loaded`` (RO-property) - the module in sys.modules, if any?
>>>
>>
>> Too easy to figure out with ``name in sys.modules`` and can go stale
>> (unless you make this a property).
>>
>
> This was a light attempt at lowering the barrier to entry with regards to
> the import system.  I was thinking of how you have to know to look in
> sys.modules to see if the module is loaded.  Providing a property
> effectively hides sys.modules as an implementation detail.  I was also
> thinking of this as "installed".
>
> Regardless, that I left this as a comment reflects my uncertainty of its
> utility.
>

There is lowering the barrier of entry and then there is adding a needless
API. While I appreciate wanting to make the import system more accessible,
you can't paint over sys.modules entirely, so trying to partially hide it
here won't do anyone any good when they have to deal with it in other
places (at least if they are working at the API level of needing to care if
something is loaded).

>
>
>>  Python users will be able to inspect a module's ``__spec__`` to get
>>> import-related information about the object.  Generally, they will not
>>> be using the ``ModuleSpec`` factory methods nor the instance methods.
>>>
>>
>> As of right now no one is using the instance methods based on the wording
>> in this section. =)
>>
>
> Yeah, that would read better as something like "Generally, Python
> applications and interactive users will not...".
>
>
>>
>>> However, each spec has methods named ``create``, ``exec``, ``load``, and
>>> ``reload``.  Since they are so easy to access (and misunderstand/abuse),
>>> their function and availability require explicit consideration in this
>>> proposal.
>>>
>>>
>>> What Will Existing Finders and Loaders Have to Do Differently?
>>> ==============================================================
>>>
>>> Immediately?  Nothing.  The status quo will be deprecated, but will
>>> continue working.  However, here are the things that the authors of
>>> finders and loaders should change relative to this PEP:
>>>
>>> * Implement ``find_spec()`` on finders.
>>> * Implement ``exec_module()`` on loaders, if possible.
>>>
>>> The factory methods of ``ModuleSpec`` are intended to be helpful for
>>> converting existing finders.  ``from_loader()`` and
>>> ``from_file_location()`` are both straight-forward utilities in this
>>> regard.
>>>
>>
>> If this holds to be true then they should go into importlib.util and kept
>> out of the general object since dir(module_spec) shouldn't need to show the
>> methods indefinitely.
>>
>
> I've actually been vacillating for days between classmethods and
> importlib.util, and at one point I even made a meta class and put the
> factories there (to keep them out of instances).  At the moment
> importlib.util is making more sense.
>
>
>>
>>
>>>  In the case where loaders already expose methods for creating
>>> and preparing modules, a finder may use ``ModuleSpec.from_module()`` on
>>> a throw-away module to create the appropriate spec.
>>>
>>
>> Why is the module a throw-away one? And why would loaders need to
>> construct a ModuleSpec?
>>
>
> This is something the Nick pointed out.  Some loaders may already have the
> API to create a module and populate its attributes.
>

The key word here is "may". You simply cannot guess at needs of users
without explicit evidence this is actually actively true in the wild. Even
using importlib as an example is iffy since that's mostly my opinion of how
to do things and won't necessarily hold (e.g. the namespace classes Eric
wrote don't check their values at all while all the classes I wrote verify
their name).

Trust me, you don't want to end up supporting an API that only one person
uses. Keep this initial API small and to the point and expand it as
necessary or as requests come in for future releases. While we should aim
to get the core concepts right the first time, we can expand the API later
as necessary. There will be more Python releases after all. =)

>  In that case, the finder could use that API to get the module and then
> use ModuleSpec.from_module() to create the spec that find_spec() would
> return.  This is very explicit and direct way to map the existing
> import-related info for the module onto a spec.
>
>
>>
>>
>>>
>>> As for loaders,
>>>
>>
>> You were just talking about loader, so this is a bad transition.
>>
>
> Yeah, that did get muddled.  I was talking about how finders could use the
> existing capabilities of loaders to build a spec.  I'll make that less
> awkward.
>
>
>> This is an outline of what happens in ``ModuleSpec.load()``.
>>>
>>> 1. A new module is created by calling ``spec.create()``.
>>>
>>>    a. If the loader has a ``create_module()`` method, it gets called.
>>>       Otherwise a new module gets created.
>>>    b. The import-related module attributes are set.
>>>
>>
>> So it seems step (b) happens even if step (a) does. If that's the case
>> then are attributes overridden blindly, or conditionally set? If (b)
>> doesn't happen if (a) did then you need to make that clear.
>>
>
> Yeah, (b) always happens.  I was planning on having them be overridden
> blindly to match the spec.  Loader.create_module() would not be responsible
> for setting the import-related attributes.
>
>
>>
>>
>>>
>>> 2. The module is added to sys.modules.
>>>
>>
>> I would add a note that there is a separate method for handling reloads
>> and thus blindly setting sys.modules is acceptable.
>>
>
> Good point.  Furthermore, if the module exists in sys.modules when load()
> gets called and it fails, the module will be removed from sys.modules.  Do
> you think it would make sense to stick the original module back into
> sys.modules in that case?  I don't because calling load() may have
> side-effects on the state of that original module.
>

Fine by me. Does document pre-existing values in sys.modules are *not*
taken into consideration; if you want that then use reload().

>
>
>>
>>
>>> 3. ``spec.exec(module)`` gets called.
>>>
>>>    a. If the loader has an ``exec_module()`` method, it gets called.
>>>       Otherwise ``load_module()`` gets called for backward-compatibility
>>>       and the resulting module is updated to match the spec.
>>>
>>
>> "resulting module found in sys.modules is".
>>
>> And I think you meant to make step (b) be the fallback to load_module().
>>
>
> I was thinking of it as one step with two possible paths, rather than 2
> steps.
>

Then it's just step 3 since there is no (b) step.

>
>
>>
>>
>>>
>>> 4. If there were any errors the module is removed from sys.modules.
>>> 5. If the module was replaced in sys.modules during ``exec()``, the one
>>>    in sys.modules is updated to match the spec.
>>>
>>
>> This doesn't make sense. You just said the module got updated to match
>> the spec in step 3.a. Are you saying you're going to overwrite values that
>> exec_module() set? And once again, blindly updating or conditionally? And
>> how are these attributes being set?
>>
>
> As with create_module(), Loader.exec_module() isn't in charge of setting
> the import-related attributes (as opposed to Loader.load_module(), which
> is).  However, if the module was replaced in sys.modules during
> exec_module(), then we assume that the one in sys.modules has not been
> touched by the spec.  So we set any of the import-related attributes on it
> that aren't already set (respecting the ones that are), with the exception
> of __spec__, which we always override.
>
> Thinking about it, it may make sense in that case to create a new spec
> based on the current one but deferring to any of the existing
>  import-related attributes of the module.  Then that spec can be set to
> __spec__.
>

If you do that then you are starting to shift to either a loader method or
a function somewhere in importlib, else you are creating a conditional
factory for specs.

>
>
>>  Since exec_module() is going to need to set these anyway for proper
>> exec() use during loading then why are you setting them *again* later on?
>> Should you set these first and then let the methods reset them as they see
>> fit? I thought exec_module() took in a filled-in module anyway, so didn't
>> you have to set all the attributes prior to passing it in anyway in step
>> 1.a? In that case this is a reset which seems wrong if code explicitly
>> chose to change the values.
>>
>
> It's not that code chose to change the values.  It's that code chose to
> stick some other object in sys.modules and we're going to return it and we
> want to be sure the spec and import-related attributes are all properly set.
>

Do we though? We do that currently except for __package__ and __loader__,
but that's for backwards-compatibility since those attributes are so new.
This also destroys the possibility of lazy loading, btw (which I
unfortunately realized after 3.3 was released, so I kind of regret it).

Since the attributes have to be set before any exec() call I understand
making sure they are set at that point, but if you're going to shove some
other object in sys.modules then I view it as your problem to make sure you
set those attributes as you are already deviating from normal import
semantics. IOW if you are given a module you can and should expect it is as
prepared as the import system can make it to be executed on, but if you
choose to ignore it then you are on your own.

>
>
>>
>>
>>> 6. The module in sys.modules is returned.
>>>
>>
>> Or you can just provide the pseudo-code and skip all of this explanation
>> and be easier to follow =) You can leave comments with step numbers if you
>> want to expound upon any specific step outside of the pseudo-code:
>>
>
> That is a really good idea.  It will make more sense.
>

I know it will help me follow exactly what you are planning to do. It isn't
like these methods are that complicated or long.

>
>
>>
>> class ModuleSpec:
>>
>>   def load(self):
>>     module = self.create()
>>     sys.modules[self.name] = module
>>
>>      try:
>>       self.exec(module)
>>     except:
>>       try:
>>         del sys.modules[self.name]
>>       except KeyError:
>>         pass
>>     else:
>>       # XXX different from proposal: didn't reset attributes
>>       return sys.modules[self.name]
>>
>>   def create(self):
>>      if hasattr(self.loader, 'create_module'):
>>       module = self.loader.create_module(self)
>>     else:
>>       module = types.ModuleType(self.name)
>>        # XXX different from proposal: didn't do it blindly after
>> create_module()
>>       self.init_module_attrs(module)
>>     return module
>>
>>   def exec(self, module):
>>     if hasattr(self.loader, 'exec_module'):
>>       self.loader.exec_module(module)
>>     elif hasattr(self.loader, 'load_module'):
>>       self.loader.load_module(self.name)
>>       module = sys.modules[self.name]
>>     else:
>>       raise TypeError('{!r} loader does not have an ' +
>>                       'exec_module or load_module
>> method'.format(self.loader))
>>     return module
>>
>> ...
>>
>>
>>>  ========================== ===========
>>>
>> On ModuleSpec              On Modules
>>> ========================== ===========
>>> name                       __name__
>>> loader                     __loader__
>>> package                    __package__
>>> origin                     __file__*
>>> cached                     __cached__*
>>>
>>
>> This shouldn't be set on extension modules, so this is another asterisk
>> of has_location *and* is not None (right?).
>>
>
> Correct.  Good point.
>
>
>>
>>
>>> submodule_search_locations __path__**
>>> loading_info                \-
>>> has_location (RO-property)  \-
>>> ========================== ===========
>>>
>>> \* Only if ``is_location`` is true.
>>>
>>
>> Should that be has_location?
>>
>
> Yep. :)
>
>
>>  **has_location**
>>>
>>> .. container::
>>>
>>>    Some modules can be loaded by reference to a location, e.g. a
>>> filesystem
>>>    path or a URL or something of the sort.  Having the location lets you
>>>    load the module, but in theory you could load that module under
>>> various
>>>    names.
>>>
>>>    In contrast, non-located modules can't be loaded in this fashion, e.g.
>>>    builtin modules and modules dynamically created in code.  For these,
>>> the
>>>    name is the only way to access them, so they have an "origin" but not
>>> a
>>>    "location".
>>>
>>>    This attribute reflects whether or not the module is locatable.  If it
>>>    is, ``origin`` must be set to the module's location and ``__file__``
>>>    will be set on the module.  Furthermore, a locatable module is also
>>>    cacheable and so ``__cached__`` is tied to ``has_location``.
>>>
>>
>> That statement about __cached__ is not true for extension modules. You're
>> going to need to tweak how you define 'cached' based on this. Either that
>> or you can try to use this as a justification for loader.create_module() as
>> you can override these semantics there as a pure Python module is more
>> common than extension modules (although this doesn't help with the
>> ModuleSpec having the wrong information when returned from the finder
>> unless the finder itself resets it on the ModuleSpec before returning it).
>>
>
> Yeah, I'll need to rework that.
>
>
>>  **create()**
>>>
>>> .. container::
>>>
>>>    A new module is created relative to the spec and its import-related
>>>    attributes are set accordingly.  If the spec's loader has a
>>>    ``create_module()`` method, that gets called to create the module.
>>>  This
>>>    give the loader a chance to do any pre-loading initialization that
>>> can't
>>>    otherwise be accomplished elsewhere.  Otherwise a bare module object
>>> is
>>>    created.  In both cases ``init_module_attrs()`` is called on the
>>> module
>>>    before it gets returned.
>>>
>>
>> As stated earlier, I don't like the idea of blindly resetting attributes
>> if set by create_module().
>>
>
> Well, create_module() shouldn't be setting them.  Are you suggesting that
> there is a use case for that?
>

I'm saying you need to very clearly state whose job it is to set all of the
attributes on the module and whether they are blindly set or conditionally
and based on what test to know if they are already set (e.g. None vs. the
attribute not existing).

>
>
>>
>>
>>>
>>> **exec(module)**
>>>
>>> .. container::
>>>
>>>    The spec's loader is used to execute the module.  If the loader has
>>>    ``exec_module()`` defined, the namespace of ``module`` is the target
>>> of
>>>    execution.
>>>
>>
>> Wait, what? You suggest it's the module in the signature but
>> module.__dict__ in the explanation.
>>
>
> That's right.  The module is passed in and then exec_module() does its
> thing with module.__dict__.  Do you think exec_module() should directly
> take a dict instead?
>

No, not at all. The wording just wasn't clear to me as I took "target" to
mean "what I passed in", not what the point of the functionality was. I
would reword it to say "If the loader has ``exec_module()`` defined then it
is expected to execute the module's code on the passed-in module object" or
something.

>
>
>>
>>>  Otherwise the loader's ``load_module()`` is called, which
>>>    ignores ``module`` and returns the module that was the actual
>>>    execution target.
>>>
>>
>> Are you pulling from sys.modules? Otherwise how are you getting the
>> module  from load_module()?
>>
>
> load_module() returns the module.  For loaders that don't follow that rule
> (and return None), we'll grab the module out of sys.modules.
>
>
>>  And you don't mention that in one case the module is not put into
>> sys.modules while in the other case it is (exec_module vs. load_module).
>> That dichotomy is going to be messy.
>>
>
> That difference should definitely be clear.  What is messy about it?
>

You've abstracted out what method is called, which should mean I don't have
to care which method is used (in case both are defined by a very compatible
loader). But in this case I do are as one will cache its results and the
other won't. Since sys.modules is such a core data structure you should
always know when it has been tampered with, but these semantics make it
unclear based on the the arguments used as to what the result will be in
regards to implicit side-effects.

I'm going to argue that you should have unified side-effects in regards to
sys.modules so exec_module() should set sys.modules if it was not done so
by the loader. If you want you can add a keyword-only argument to make that
"optional" (i.e. explicitly remove from sys.modules or just don't explicit
set it but whatever the loader did it did) or vice-versa, but there should
be some way for me to be able to say "I don't want any surprises in regards
to sys.modules, so be consistent regardless of loader method called".

>
>
>>  Does this need to be separate from load()? If you merge it in then the
>> sys.modules semantics are unified within load(). Otherwise you need to make
>> this set sys.modules in either case and return from sys.modules.
>>
>
> Both load() and reload() call exec().  In the case of load(), it wraps the
> exec() call with the requisite sys.modules handling.  In the case of
> reload(), the module should already be in sys.modules.  Regardless,
> Loader.load_module() is already required to do all the sys.modules handling
> so that base should be covered.  If we deprecate that requirement, which we
> could, then we have a different story.  If someone calls exec() directly
> before a module is ever loaded then the sys.modules handling shouldn't
> matter anyway;
>

I disagree: it should and will matter as side-effects of setting
sys.modules influence import.

>  and if someone does that it means the spec did not come from
> module.__spec__ so they probably aren't a casual user.
>
>
>>
>>>  In that case the import-related attributes of that
>>>    module are updated to reflect the spec.
>>>
>>
>> Why? If you already set the attributes in the module and inserted it into
>> sys.modules previously then you already took care of this. Else you now are
>> setting the attributes potentially *three* times (twice in create() from
>> loader.create_module() + an explicit call to init_module_attr() and then
>> here).
>>
>
> Loader.load_module() is still responsible for setting those attributes.
>  However, it may have missed one or more (including __spec__).  We want to
> make sure all the appropriate import-related attributes get set.
>  Furthermore, load_module() may have overridden the values we set
> previously.
>
> Given its authority, it may make sense to update module.__spec__ to
> reflect the attributes set by the loader.  That way __spec__ indicates the
> values used during loading.  On the other hand, by not updating the spec,
> the difference between the module attributes and the spec will reflect the
> ways in which the loader did not follow the spec.  I've been following that
> former line of thinking, but now I'm wondering if the latter would be
> better.  Regardless, the pathological case where the module attributes set
> by load_module() and the spec don't match should be pretty rare.
>
> As to setting it multiple times, in the worst case the attributes will be
> set twice.  Loader.create_module() shouldn't be setting them.
>
>
>>
>>>  In both cases the targeted
>>>    module is the one that gets returned.
>>>
>>
>> Huh? What exactly are you returning? You say "actual execution target"
>> above for load_module() but "in both cases the target module" here. That
>> seems to contradictory.
>>
>
> In the load_module() case we return the result of calling load_module()
> (or the module in sys.modules if load_module() returns None).  Otherwise we
> return the module that was passed in.  In their respective cases both are
> the actual execution targets.
>
> I'll reword that.
>
>
>>
>>
>>>
>>> **load()**
>>>
>>> .. container::
>>>
>>>    This method captures the current functionality of and requirements on
>>>    ``Loader.load_module()`` without any semantic changes.  It is
>>>    essentially a wrapper around ``create()`` and ``exec()`` with some
>>>    extra functionality regarding ``sys.modules``.
>>>
>>>    itself in ``sys.modules`` while executing.  Consequently, the module
>>> in
>>>    ``sys.modules`` is the one that gets returned by ``load()``.
>>>
>>>    Right before ``exec()`` is called, the module is added to
>>>    ``sys.modules``.  In the case of error during loading the module is
>>>    removed from ``sys.modules``.  The module in ``sys.modules`` when
>>>    ``load()`` finishes is the one that gets returned.  Returning the
>>> module
>>>    from ``sys.modules`` accommodates the ability of the module to replace
>>>    itself there while it is executing (during load).
>>>
>>>    As already noted, this is what already happens in the import system.
>>>    ``load()`` is not meant to change any of this behavior.
>>>
>>>    If ``loader`` is not set (``None``), ``load()`` raises a ValueError.
>>>
>>
>> Since the loader is required by the initializer for ModuleSpec I don't
>> know if this specific check is necessary: EAFP.
>>
>
> Yeah, the check will almost always pass.  And if someone does something
> they shouldn't they'll get an AttributeError really quickly anyway.
>
>
>>  Open Issues
>>> ==============
>>>
>>> \* The impact of this change on pkgutil (and setuptools) needs looking
>>> into.  It has some generic function-based extensions to PEP 302.  These
>>> may break if importlib starts wrapping loaders without the tools'
>>> knowledge.
>>>
>>> \* Other modules to look at: runpy (and pythonrun.c), pickle, pydoc,
>>> inspect.
>>>
>>> \* Add ``ModuleSpec.data`` as a descriptor that wraps the data API of the
>>> spec's loader?
>>>
>>
>> No. This starts to move this away from ModuleSpec modules being a data
>> storage object and more or a level of indirection around loaders.
>>
>
> Agreed.  It may be nice to have the easier access to the loader data APIs,
> but doesn't quite fit.  ModuleSpec.data as a wrapper was the best I could
> think of.
>
>
>>
>>
>>>
>>> \* How to limit possible end-user confusion/abuses relative to spec
>>> attributes (since __spec__ will make them really accessible)?
>>>
>>>
>>> References
>>> ==========
>>>
>>> [1] http://mail.python.org/pipermail/import-sig/2013-August/000658.html
>>>
>>>
>>> Copyright
>>> =========
>>>
>>> This document has been placed in the public domain.
>>>
>>> ..
>>>    Local Variables:
>>>    mode: indented-text
>>>    indent-tabs-mode: nil
>>>    sentence-end-double-space: t
>>>    fill-column: 70
>>>    coding: utf-8
>>>    End:
>>>
>>>
>>> _______________________________________________
>>> Import-SIG mailing list
>>> Import-SIG at python.org
>>> http://mail.python.org/mailman/listinfo/import-sig
>>>
>>>
>>
> Thanks for that review.  It helps a lot.  I'll update the PEP when I get a
> chance.  From your feedback I've gathered a few things:
>
> 1. The PEP needs to be more clear on what the Loader methods (both
> existing and new) are supposed to accomplish and their responsibilities
> regarding sys.modules and module attributes.
> 2. I need to slide more toward "do less" than I have been in the balance
> between keeping things simple and covering all the corner cases.
>

Yeah, if you miss something people will file a bug or speak up. =)
Remember, any API that goes in will theoretically need to be supported for
a **long** time. Expanding an API is much easier than contracting it.

> 3. Whether or not the spec should be updated to reflect the attributes set
> by load_module() still needs some consideration.
> 4. The purpose of exec() vs. load() needs some clarification (pseudo-code
> should help).
> 5. I need more sleep. :)
>

Ditto. =) If I at all came off as cranky it's because two days in a row I
needed to get up at 04:50.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/import-sig/attachments/20130829/431bcb7f/attachment-0001.html>