[Import-SIG] PEP 451 (ModuleSpec) round 3

Nick Coghlan ncoghlan at gmail.com
Fri Aug 30 16:41:03 CEST 2013


On 29 August 2013 23:16, Brett Cannon <brett at python.org> wrote:
>> Regardless, that I left this as a comment reflects my uncertainty of its
>> utility.
>
>
> There is lowering the barrier of entry and then there is adding a needless
> API. While I appreciate wanting to make the import system more accessible,
> you can't paint over sys.modules entirely, so trying to partially hide it
> here won't do anyone any good when they have to deal with it in other places
> (at least if they are working at the API level of needing to care if
> something is loaded).

Agreed on this one.

>>>>  In the case where loaders already expose methods for creating
>>>> and preparing modules, a finder may use ``ModuleSpec.from_module()`` on
>>>> a throw-away module to create the appropriate spec.
>>>
>>>
>>> Why is the module a throw-away one? And why would loaders need to
>>> construct a ModuleSpec?
>>
>>
>> This is something the Nick pointed out.  Some loaders may already have the
>> API to create a module and populate its attributes.
>
>
> The key word here is "may". You simply cannot guess at needs of users
> without explicit evidence this is actually actively true in the wild. Even
> using importlib as an example is iffy since that's mostly my opinion of how
> to do things and won't necessarily hold (e.g. the namespace classes Eric
> wrote don't check their values at all while all the classes I wrote verify
> their name).
>
> Trust me, you don't want to end up supporting an API that only one person
> uses. Keep this initial API small and to the point and expand it as
> necessary or as requests come in for future releases. While we should aim to
> get the core concepts right the first time, we can expand the API later as
> necessary. There will be more Python releases after all. =)

Yeah, I had my sequence of events wrong when I suggested this one (I
was thinking about the create/exec split when loading a module).
Existing finders deal in loaders, not already constructed modules, so
this doesn't actually make sense.

Let's make it go away :)

>> As with create_module(), Loader.exec_module() isn't in charge of setting
>> the import-related attributes (as opposed to Loader.load_module(), which
>> is).  However, if the module was replaced in sys.modules during
>> exec_module(), then we assume that the one in sys.modules has not been
>> touched by the spec.  So we set any of the import-related attributes on it
>> that aren't already set (respecting the ones that are), with the exception
>> of __spec__, which we always override.
>>
>> Thinking about it, it may make sense in that case to create a new spec
>> based on the current one but deferring to any of the existing
>> import-related attributes of the module.  Then that spec can be set to
>> __spec__.
>
>
> If you do that then you are starting to shift to either a loader method or a
> function somewhere in importlib, else you are creating a conditional factory
> for specs.

I still like the idea of keeping __spec__ as "this is how we found it
originally", with only the module attributes reflecting any dynamic
updates.

>>> Since exec_module() is going to need to set these anyway for proper
>>> exec() use during loading then why are you setting them *again* later on?
>>> Should you set these first and then let the methods reset them as they see
>>> fit? I thought exec_module() took in a filled-in module anyway, so didn't
>>> you have to set all the attributes prior to passing it in anyway in step
>>> 1.a? In that case this is a reset which seems wrong if code explicitly chose
>>> to change the values.
>>
>>
>> It's not that code chose to change the values.  It's that code chose to
>> stick some other object in sys.modules and we're going to return it and we
>> want to be sure the spec and import-related attributes are all properly set.
>
>
> Do we though? We do that currently except for __package__ and __loader__,
> but that's for backwards-compatibility since those attributes are so new.
> This also destroys the possibility of lazy loading, btw (which I
> unfortunately realized after 3.3 was released, so I kind of regret it).
>
> Since the attributes have to be set before any exec() call I understand
> making sure they are set at that point, but if you're going to shove some
> other object in sys.modules then I view it as your problem to make sure you
> set those attributes as you are already deviating from normal import
> semantics. IOW if you are given a module you can and should expect it is as
> prepared as the import system can make it to be executed on, but if you
> choose to ignore it then you are on your own.

+1 for leaving the object in sys.modules alone after we call exec_module.

> You've abstracted out what method is called, which should mean I don't have
> to care which method is used (in case both are defined by a very compatible
> loader). But in this case I do are as one will cache its results and the
> other won't. Since sys.modules is such a core data structure you should
> always know when it has been tampered with, but these semantics make it
> unclear based on the the arguments used as to what the result will be in
> regards to implicit side-effects.
>
> I'm going to argue that you should have unified side-effects in regards to
> sys.modules so exec_module() should set sys.modules if it was not done so by
> the loader. If you want you can add a keyword-only argument to make that
> "optional" (i.e. explicitly remove from sys.modules or just don't explicit
> set it but whatever the loader did it did) or vice-versa, but there should
> be some way for me to be able to say "I don't want any surprises in regards
> to sys.modules, so be consistent regardless of loader method called".

No, the new methods are *supposed* to be stateless with no global side
effects, leaving all import state manipulation to the import system.
We just can't *promise* no side effects for exec_module, since the
module code might itself mess with the process global state. The fact
load_module is *expected* to mess with sys.modules is a design flaw
that shouldn't be perpetuated.

What I would really like is to be able to call
"target_loader.exec_module(existing_main_module)" from runpy,
*without* calling create_module first.

And now I remember why I wanted to be able to pass an existing module
to a loader and say "hey, can you use this as an execution
namespace?": so I could pass them __main__, instead of having to
hardcode knowledge of different kinds of loader into runpy.

So perhaps a better name might be "prepare_module" (by analogy to PEP
3115), and have it accept a "reloading" parameter, which is an
existing module to be reused.

The signature would be something like:

    def prepare_module(reloading=None):
        """Create a module object for execution. Returning None will
created a default module.

        If *reloading* is set, specifies an existing sys.modules entry
that is being reloaded. Must return None or that
        specific object if reloading is supported. Returning a
different module object or explicitly raising ImportError
        indicates that reloading is not supported. (Or perhaps define
a "ReloadError" subclass of ImportError?)
        """

>>> Does this need to be separate from load()? If you merge it in then the
>>> sys.modules semantics are unified within load(). Otherwise you need to make
>>> this set sys.modules in either case and return from sys.modules.
>>
>>
>> Both load() and reload() call exec().  In the case of load(), it wraps the
>> exec() call with the requisite sys.modules handling.  In the case of
>> reload(), the module should already be in sys.modules.  Regardless,
>> Loader.load_module() is already required to do all the sys.modules handling
>> so that base should be covered.  If we deprecate that requirement, which we
>> could, then we have a different story.  If someone calls exec() directly
>> before a module is ever loaded then the sys.modules handling shouldn't
>> matter anyway;
>
>
> I disagree: it should and will matter as side-effects of setting sys.modules
> influence import.

load_module should remain responsible for manipulating import state
directly, the new methods should explicitly discourage it (but we
can't rule it out for exec_module due to the invocation of arbitrary
user code).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


More information about the Import-SIG mailing list