[Python-Dev] Implementing PEP 382, Namespace Packages

Sun May 30 00:44:28 CEST 2010

On Sat, May 29, 2010 at 12:29, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> On 29.05.2010 21:06, P.J. Eby wrote:
>>
>> At 08:45 PM 5/29/2010 +0200, Martin v. Löwis wrote:
>>>>
>>>> In it he says that PEP 382 is being deferred until it can address PEP
>>>> 302 loaders. I can't find any follow-up to this. I don't see any
>>>> discussion in PEP 382 about PEP 302 loaders, so I assume this issue was
>>>> never resolved. Does it need to be before PEP 382 is implemented? Are we
>>>> wasting our time by designing and (eventually) coding before this issue
>>>> is resolved?
>>>
>>> Yes, and yes.
>>
>> Is there anything we can do to help regarding that?
>
> You could comment on the proposal I made back then, or propose a different
> solution.

[sorry for the fundamental PEP questions, but I think PEP 382 came
about while I was on my python-dev sabbatical last year]

I have some questions about the PEP which might help clarify how to
handle the API changes.

For finders, their search algorithm is changed in a couple of ways.
One is that modules are given priority over packages (is that
intentional, Martin, or just an oversight?). Two, the package search
requires checking for a .pth file on top of an __init__.py. This will
change finders that could previously do a simple existence check on an
__init__ "file" (or whatever the storage back-end happened to be) into
a list-and-search. One would hope that isn't costly, but in some cases
it might be if the paths to files are not stored in a hierarchical
fashion (e.g. zip files list entire file paths in their TOC, and a
sqlite3 DB which uses a path for keys will have to list **all** keys,
filter them down to just the relevant directory, and then look for
.pth, or some such approach). Are we worried about possible
performance implications of this search? I say no, but I just want to
make sure that we are not, and that people are aware of the design
shift required in finders. This entire worry would be alleviated if
only .pth files named after the package were supported, much like
*.pkg files in pkgutil.
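
To make the cost difference concrete, here is a minimal sketch, assuming a
storage back-end whose table of contents is a flat collection of full paths
(as in a zip file). The names ``has_init`` and ``find_pth_files`` are
hypothetical and not part of the PEP or the stdlib.

```python
import posixpath

def has_init(toc, package_dir):
    """Old-style check: one membership test against the flat TOC."""
    return posixpath.join(package_dir, "__init__.py") in toc

def find_pth_files(toc, package_dir):
    """New-style check: every TOC entry has to be examined to find the
    *.pth files that sit directly inside package_dir."""
    return sorted(
        entry for entry in toc
        if posixpath.dirname(entry) == package_dir and entry.endswith(".pth")
    )

# Hypothetical flat TOC, keyed by full path.
toc = {"pkg/__init__.py", "pkg/extra.pth", "pkg/sub/other.pth", "mod.py"}
init_found = has_init(toc, "pkg")
pth_found = find_pth_files(toc, "pkg")
```

If only a .pth file named after the package were allowed, the second function
would collapse back into a single membership test like the first.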

And then the search for the __init__.py begins on the newly modified
__path__, which I assume ends with the first __init__ found on
__path__, but if no file is found it's okay and essentially an empty
module with just module-specific attributes is used? In other words,
can a .pth file replace an __init__ file in delineating a package? Or
is it purely additive? I assume the latter for compatibility reasons,
but the PEP says "a directory is considered a package if it **either**
contains a file named __init__.py, **or** a file whose name ends with
".pth"" (emphasis mine). Otherwise I assume that the search will be
done simply with ``os.path.isdir(os.path.join(sys_path_entry,
top_level_package_name))`` and all existing paths will be added to
__path__. Will they come before or after the directory where the *.pth
was found? And will any subsequent *.pth files found in other
directories also be executed?
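
My reading of the rules above can be sketched roughly as follows. This is
only an illustration of the questions, not the PEP's algorithm: the helper
names are hypothetical, and collecting directories in search-path order is
an assumption, since the ordering is exactly what I am asking about.

```python
import os
import tempfile

def is_package_dir(directory):
    """True if the directory holds __init__.py or any file ending in .pth."""
    try:
        names = os.listdir(directory)
    except OSError:
        return False
    return any(name == "__init__.py" or name.endswith(".pth") for name in names)

def namespace_portions(search_path, top_level_package_name):
    """Collect every existing <entry>/<package> directory, in search order."""
    portions = []
    for sys_path_entry in search_path:
        candidate = os.path.join(sys_path_entry, top_level_package_name)
        if os.path.isdir(candidate):
            portions.append(candidate)
    return portions

# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    first = os.path.join(root, "first")
    second = os.path.join(root, "second")
    third = os.path.join(root, "third")
    os.makedirs(os.path.join(first, "pkg"))
    os.makedirs(os.path.join(second, "pkg"))
    os.makedirs(third)  # has no pkg directory at all
    open(os.path.join(first, "pkg", "pkg.pth"), "w").close()

    pth_only_is_package = is_package_dir(os.path.join(first, "pkg"))
    empty_dir_is_package = is_package_dir(os.path.join(second, "pkg"))
    found = namespace_portions([first, third, second], "pkg")
    expected = [os.path.join(first, "pkg"), os.path.join(second, "pkg")]
```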

As for how "*" works, is this limited to top-level packages, or will
sub-packages participate as well? I assume the former, but it is not
directly stated in the PEP. If the latter, is a dotted package name
changed to ``os.path.join(sys_path_entry, package_name.replace(".",
os.sep))``?

For sys.path_hooks, I am assuming import will simply skip over passing
"*" to the hooks, as it is a marker that __path__ represents a
namespace package and is not in any way functional. Although with
sys.namespace_packages, is leaving the "*" in __path__ truly necessary?

For the search of paths to use to extend, are we limiting ourselves to
actual file system entries on sys.path (as pkgutil does), or do we
want to support other storage back-ends? To do the latter I would
suggest having a successful path discovery be when a finder can be
created for the hypothetical directory from sys.path_hooks.
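
The suggestion above could look something like this minimal sketch. It
relies only on the PEP 302 convention that a path hook raises ImportError
for a path it cannot handle; ``finder_for``, ``DBFinder`` and ``db_hook``
are hypothetical names made up for the illustration.

```python
import sys

def finder_for(path, hooks=None):
    """Return the first finder a path hook produces for path, else None."""
    if hooks is None:
        hooks = sys.path_hooks
    for hook in hooks:
        try:
            return hook(path)
        except ImportError:
            continue  # this hook cannot handle the path; try the next one
    return None

class DBFinder:
    """Hypothetical finder for a non-filesystem storage back-end."""
    def __init__(self, path):
        self.path = path

def db_hook(path):
    """Hypothetical path hook that only accepts 'db://' locations."""
    if not path.startswith("db://"):
        raise ImportError("not a db:// path")
    return DBFinder(path)

# A candidate directory counts as "discovered" if some hook accepts it.
accepted = finder_for("db://packages/foo", hooks=[db_hook])
rejected = finder_for("/no/such/dir/foo", hooks=[db_hook])
```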

OK, I *think* that's all of my clarification questions when it comes
to the PEP. =) Now, on to API discussion.

The PEP (seems to) ask finders to look for .pth file(s), calculate
__path__, and then get a loader for the __init__. You could have
finders grow a find_namespace method which returns the contents of the
requisite .pth file(s). Import could then take that, calculate
__path__, and then use that new search path to find a loader for the
__init__ (I am assuming there is an __init__ file somewhere). That's
straight-forward and makes supporting .pth files additive for finders.
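
A rough sketch of that division of labour, under stated assumptions:
``find_namespace`` is the hypothetical method proposed above, ``DictFinder``
is an in-memory stand-in for a real finder, and treating each non-blank
line of a .pth file as one __path__ entry is a simplification.

```python
class DictFinder:
    """Hypothetical finder over an in-memory {path: text} mapping."""

    def __init__(self, files, root):
        self.files = files
        self.root = root

    def find_namespace(self, fullname):
        """Return the contents of each *.pth file directly in the package."""
        prefix = self.root + "/" + fullname.replace(".", "/") + "/"
        return [
            text for path, text in sorted(self.files.items())
            if path.startswith(prefix)
            and path.endswith(".pth")
            and "/" not in path[len(prefix):]
        ]

def compute_path(finders, fullname):
    """What import would do: gather .pth contents and build __path__."""
    path = []
    for finder in finders:
        find_namespace = getattr(finder, "find_namespace", None)
        if find_namespace is None:
            continue  # additive: finders without the method are skipped
        for contents in find_namespace(fullname):
            path.extend(
                line.strip() for line in contents.splitlines() if line.strip()
            )
    return path

files = {
    "/a/pkg/__init__.py": "",
    "/a/pkg/one.pth": "/x/pkg\n*\n",
    "/a/pkg/sub/two.pth": "/y/pkg\n",
}
finder = DictFinder(files, "/a")
legacy_finder = object()  # stands in for a finder that predates the method
new_path = compute_path([legacy_finder, finder], "pkg")
```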

The trick then becomes how the heck you get the new __path__ value
into the module through the loader as up to this point it has
calculated __path__ on its own. You could slightly abuse load_module's
semantics for reloading and stick the namespace module into
sys.modules before calling the loader for __init__ and change the
semantics definition such that if __path__ is already defined you
don't change it. Unfortunately that seems rather messy in the face of
reloads that want a fresh __path__.

Another possibility is to have the loader add the new paths, but to
have the calculated value of __path__ stored in
sys.namespace_packages. That way the loader can simply calculate its
own version and extend it with what the dictionary provides. This
allows loaders to get what import thinks __path__ should be while
still having a chance to tweak things. If you want even more abstraction
I would change is_package to return what __path__ should be when it is
a package and provide an ABC that does the proper calculation of the
extended __path__ value for is_package() so they can do ``return
[extras] + super().is_package()`` for packages.
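
As a sketch of the ABC idea, assuming sys.namespace_packages is a dict
mapping package names to the extra __path__ entries import calculated: a
module-level ``namespace_packages`` dict stands in for it here, the class
names are hypothetical, and is_package is given the usual ``fullname``
argument.

```python
import abc

# Stand-in for the proposed sys.namespace_packages (assumed to be a dict).
namespace_packages = {}

class NamespaceLoader(abc.ABC):
    """Hypothetical ABC that supplies the extended part of __path__."""

    def is_package(self, fullname):
        # Return a fresh list so subclasses can safely build on it.
        return list(namespace_packages.get(fullname, []))

class DirectoryLoader(NamespaceLoader):
    """Hypothetical loader that knows its own package directory."""

    def __init__(self, own_dir):
        self.own_dir = own_dir

    def is_package(self, fullname):
        # The loader's own entry first, then whatever import calculated.
        return [self.own_dir] + super().is_package(fullname)

namespace_packages["pkg"] = ["/other/site/pkg"]
loader = DirectoryLoader("/here/pkg")
pkg_path = loader.is_package("pkg")
plain_path = loader.is_package("unlisted")
```

List concatenation is used instead of ``list.extend`` because ``extend``
returns None rather than the combined list.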

But unfortunately, because load_module is overloaded with
responsibilities, there is no way to dynamically add support for any
of this to existing loaders like there is with finders (unless we
factor out the responsibilities of load_module so it isn't so
overworked and is entirely optional to implement, but that goes beyond
this PEP's scope). There is also the issue of reloading with this
delineation of work since finders are not necessarily called by
imp.reload. Otherwise the loaders will have to recalculate everything
import calculates in order to find the __init__ module to begin with.

The only other option I can think of is to tweak find_module to always
take a path argument, not just meta path finders. Then the calculated
__path__ value can be passed in through find_module and thus passed on
to the loader through a constructor or some such. That neither
duplicates the work of calculating the extended __path__ value in both
the finder and loader, nor requires caching it somewhere outside of the
importer's reach where it might go stale. The finder simply passes the
__path__ value on to the loader however it wants (most likely through
a constructor call or internal caching). This also acts as a
performance perk when searching for the __init__ module as having the
'path' argument set can act as a flag to not look for a module but
only a package. This would make it no longer an additive feature to
finders, but wouldn't require anything to change in loaders directly.
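
A minimal sketch of this last option, with hypothetical class names. The
ordinary module search and the sys.modules bookkeeping that a real
load_module must do are left out to keep the hand-off visible.

```python
import types

class PathAwareLoader:
    """Hypothetical loader that is handed __path__ by its finder."""

    def __init__(self, path):
        self.path = list(path)

    def load_module(self, fullname):
        # Sketch only: a real loader would also register the module in
        # sys.modules and execute the package's __init__ code.
        module = types.ModuleType(fullname)
        module.__path__ = self.path
        module.__loader__ = self
        return module

class PathAwareFinder:
    """Hypothetical finder whose find_module always accepts a path."""

    def find_module(self, fullname, path=None):
        if path is None:
            return None  # ordinary module search omitted from this sketch
        # A supplied path doubles as the "look only for a package" flag.
        return PathAwareLoader(path)

calculated_path = ["/a/pkg", "/b/pkg"]
finder = PathAwareFinder()
loader = finder.find_module("pkg", calculated_path)
module = loader.load_module("pkg")
```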

I'll shut up now and stop causing trouble. =)