[Import-SIG] Round 2 for "A ModuleSpec Type for the Import System"

Thu Aug 15 17:59:43 CEST 2013

On 12 Aug 2013 23:35, "Eric Snow" <ericsnowcurrently at gmail.com> wrote:
>
> On Sun, Aug 11, 2013 at 7:03 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>
>> I think this is solid enough to be worth adding to the PEPs repo now.
>
>
> Sounds good.
>
>>
>>
>> On 9 August 2013 18:58, Eric Snow <ericsnowcurrently at gmail.com> wrote:
>> > Here's an updated version of the PEP for ModuleSpec which addresses the
>> > feedback I've gotten.  Thanks for the help.  The big open question, to me,
>> > is whether or not to have a separate reload() method.  I'll be looking into
>> > that when I get a chance.  There's also the question of a path-based
>> > subclass, but I'm currently not convinced it's worth it.
>>
>> One piece of feedback from me (triggered by the C extension modules
>> discussion on python-dev): we should consider proposing a new "exec"
>> hook for C extension modules that could be defined instead of or in
>> addition to the existing PEP 3121 init hook.
>
>
> Sounds good.  I expect you mean as a separate proposal...

I actually meant in this proposal, but strictly speaking, I just need
the "create" part of the API at this level to tie into my ideas for C
extensions :)

Now that this has a PEP number I can reference, I'll try to get
something more fleshed out posted later this week.

>> > ModuleSpec
>> > ----------
>> >
>> > A new class which defines the import-related values to use when loading
>> > the module.  It closely corresponds to the import-related attributes of
>> > module objects.  ``ModuleSpec`` objects may also be used by finders and
>> > loaders and other import-related APIs to hold extra import-related
>> > state about the module.  This greatly reduces the need to add any new
>> > new import-related attributes to module objects, and loader ``__init__``
>> > methods won't need to accommodate such per-module state.
>>
>> To avoid conflicts as the spec attributes evolve in the future, would
>> it be worth having a "custom" field which is just an arbitrary object
>> reference used to pass info from the finder to the loader without
>> troubling the rest of the import system?
>
>
> I see what you're saying, but am conflicted.  For some reason providing a sub-namespace for that doesn't seem quite right.  However, the alternative runs the risk of collisions later on.  Maybe we could recommend the use of a preceding "_" for custom attributes?  I'll see if I can come up with something.

It wouldn't be a custom namespace, just a single attribute to pass
data to the loader. It could be a dict, namespace, string, custom
object, anything. By default, it would be None.

For example, zipimporter could use it to pass the zip archive name to
the loader directly, rather than needing to derive it from origin or
create a custom loader for each find operation.

>> > While ``package`` and ``is_package`` are read-only properties, the
>> > remaining attributes can be replaced after the module spec is created
>> > and after import is complete.  This allows for unusual cases where
>> > modifying the spec is the best option.  However, typical use should not
>> > involve changing the state of a module's spec.
>>
>> I'm with Brett that "is_package" should go, to be replaced by
>> "spec.path is not None" wherever it matters. is_package() would then
>> fall through to the PEP 302 loader API via __getattr__.
>
>
> I'm considering the recommendation, but I still feel like `is_package` as an attribute is worth having.  I see module.__spec__ as useful to more than the import system and its hackers, and `is_package` as a value to the broader audience that may not have learned about what __path__ means.  It's certainly not obvious that __path__ implies a package.  Then again, a person would have to be looking at __spec__ to see `is_package`, so maybe it loses enough utility to be worth keeping.

I think we need to emphasise the fact that a package is just a module
with a search path attribute *more* rather than less. Don't try to
hide it, shout it from the rooftops :)

Say, something like "spec.submodule_search_path is not None" :)

>> How about we *just* have origin, with a separate "set_fileattr"
>> attribute to indicate "this is a discrete file, you should set
>> __file__"?
>
>
> I like that.  I'll see how it works.  There doesn't seem to be any reason why you would have two distinct strings for origin and filename.  In fact, that's kind of smelly.
>
> However, I wonder if this is where a PathModuleSpec subclass would be meaningful.  Then no flag would be necessary.

I realised we may not need a separate flag at all: how about we key
this off "hasattr(self.loader, 'get_data')"? And expose that as a
"is_location" read-only property? (I like PJE's suggestion of
"location" as a name for modules which may be used with a loader that
supports the get_data API)

(Tangent: at some point in the future, we could define an "open"
method on spec objects. This would do the path munging relative to
origin automatically, using the opener argument to the builtin open to
back it with BytesIO and the get_data API on the loader. If loaders
defined an "opener" method, then the spec could use that instead)

>> > ModuleSpec Methods
>> > ------------------
>> >
>> > ``from_loader(name, loader, *, is_package=None, origin=None, filename=None,
>> > cached=None, path=None)``
>> >
>> > .. XXX use a different name?
>>
>> I'd disallow customisation on this one - if people want to customise,
>> they should just query the PEP 302 APIs themselves and call the
>> ModuleSpec constructor directly. The use case for this one should be
>> to make it trivial to switch from "return loader" to "return
>> ModuleSpec.from_loader(loader)" in a find_module implementation.
>
>
> What do you mean by disallow customization?  Make it "private"?  `from_loader()` is intended for exactly the use that you described.

The keyword arguments. If from_loader stays, it shouldn't allow you to
override the values derived from the loader - if you want to do that,
just read the values you want to keep from the loader and pass them in
explicitly.

>> A separate "from_module(m)" constructor would probably make sense, though.
>
> I have this for internal use in the implementation, but did not expose it since all modules should already have a spec.

It's more for the benefit of adapting existing loaders - since they
already have the code to initialise the module, we should make it easy
for them to just initialise a throwaway module and convert it to a
spec object, rather than having to completely rewrite their
initialisation code to be spec based.

>> > ``module_repr()``
>> >
>> > Returns a repr string for the module if ``origin`` is set and
>> > ``filename`` is not set.  The string refers to the value of ``origin``.
>> > Otherwise ``module_repr()`` returns None.  This indicates to the module
>> > type's ``__repr__()`` that it should fall back to the default repr.
>> >
>> > We could also have ``module_repr()`` produce the repr for the case where
>> > ``filename`` is set or where ``origin`` is not set, mirroring the repr
>> > that the module type produces directly.  However, the repr string is
>> > derived from the import-related module attributes, which might be out of
>> > sync with the spec.
>> >
>> > .. XXX Is using the spec close enough?  Probably not.
>>
>> I think it makes sense to always return the expected repr based on the
>> spec attributes, but allow a custom origin to be passed in to handle
>> the case where the module __file__ attribute differs from
>> __spec__.origin (keeping in mind I think __spec__.filename should be
>> replaced with __spec__.set_fileattr)
>
>
> That's the approach that I took at first, but the module that is passed in is not guaranteed to be a spec.  Furthermore, having the spec take precedence over the module's attrs for the repr seems like too big a backward-compatibility risk.

I don't understand your response. Simplifying the API a bit to allow a
module to be passed in directly, ModuleType.__repr__ would just call
it like this:

    self.__spec__.module_repr(self)

All the logic would be in one place (ModuleSpec), but modules could
still override the original values with the actual settings in the
module namespace.

>> > The implementation of the module type's ``__repr__()`` will change to
>> > accommodate this PEP.  However, the current functionality will remain to
>> > handle the case where a module does not have a ``__spec__`` attribute.
>>
>> Experience tells us that the import system should ensure the __spec__
>> attribute always exists (even if it has to be filled in from the
>> module attributes after calling load_module)
>
> That's a good point.  The only possible problem is for someone that creates their own module object and expects repr to work the same as it does currently.

Hmm, true - however, we can handle that by creating and throwing away
a dummy spec object rather than duplicating the logic.

>> We could also expose a "create" method that just creates and returns
>> the new module object, and replace importlib.util.module_to_load with
>> a context manager that accepted the module as a parameter. Say
>> "add_to_sys", which fails if the module is already present in
>> sys.modules.
>
>
> One of the points of ModuleSpec is to remove the need for `module_to_load()`.  I'm not convinced of the utility of a create method like you've described other than possibly as something internal to ModuleSpec.

Splitting create and exec should eventually let me delete a bunch of
code from runpy :)

Cheers,
Nick.