[Python-Dev] Implementing PEP 382, Namespace Packages

Sun May 30 00:56:10 CEST 2010

At 09:29 PM 5/29/2010 +0200, Martin v. Löwis wrote:
>Am 29.05.2010 21:06, schrieb P.J. Eby:
>>At 08:45 PM 5/29/2010 +0200, Martin v. Löwis wrote:
>>>>In it he says that PEP 382 is being deferred until it can address PEP
>>>>302 loaders. I can't find any follow-up to this. I don't see any
>>>>discussion in PEP 382 about PEP 302 loaders, so I assume this issue was
>>>>never resolved. Does it need to be before PEP 382 is implemented? Are we
>>>>wasting our time by designing and (eventually) coding before this issue
>>>>is resolved?
>>>
>>>Yes, and yes.
>>
>>Is there anything we can do to help regarding that?
>
>You could comment on the proposal I made back then, or propose a 
>different solution.

Looking at that proposal, I don't follow how changing *loaders* (vs. 
importers) would help.  If an importer's find_module doesn't natively 
support PEP 382, then there's no way to get a loader for the package 
in the first place.  Today, namespace packages work fine with PEP 302 
loaders, because the namespace-ness is really only about setting up 
the __path__, and detecting that you need to do this in the first place.

In the PEP 302 scheme, then, it's either importers that have to 
change, or the process that invokes them.  Being able to ask an 
importer the equivalents of os.path.join, listdir, and get_data would 
suffice to make an import process that could do the trick.

Essentially, you'd ask each importer to first attempt to find the 
module, then ask it (or the loader, if the find worked) 
whether packagename/*.pth exists, and then process the contents.

I don't think there's a need to have a special method for executing a 
package __init__, since what you'd do in the case where there are 
.pth files but no __init__ is simply to continue the search to the end 
of sys.path (or the parent package __path__), and *then* create the 
module with an appropriate __path__.

If at any point the find_module() call succeeds, then subsequent 
importers will just be asked for .pth files, which can then be 
processed into the __path__ of the now-loaded module.

IOW, something like this (very rough draft):

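     import os
     import pkgutil
     import types

     # (function name is illustrative; get_pth_contents() and
     # modify__path__() are placeholders discussed below)
     def find_module_or_namespace(fullname, syspath_or_parent__path__):
         # last dotted component, used to build the subdirectory path
         modulebasename = fullname.rpartition('.')[2]
         pth_contents = []
         module = None

         for pathitem in syspath_or_parent__path__:

             importer = pkgutil.get_importer(pathitem)
             if importer is None:
                 continue

             if module is None:
                 try:
                     loader = importer.find_module(fullname)
                 except ImportError:
                     loader = None
                 # find_module() returns None when nothing is found
                 if loader is not None:
                     # errors here should propagate
                     module = loader.load_module(fullname)
                     if not hasattr(module, '__path__'):
                         # found, but not a package
                         return module

             pc = get_pth_contents(importer)
             if pc is not None:
                 subpath = os.path.join(pathitem, modulebasename)
                 pth_contents.append(subpath)
                 pth_contents.extend(pc)
                 if '*' not in pth_contents:
                     # got a package, but not a namespace
                     break

         if pth_contents:
             if module is None:
                 # No __init__, but we have paths, so make an empty package
                 module = types.ModuleType(fullname)
                 module.__path__ = []
             modify__path__(module, pth_contents)

         return module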

Obviously, the details are all in the 'get_pth_contents()' and 
'modify__path__()' functions, and the above process would do extra 
work in the case where an individual importer implements PEP 382 on 
its own (although why would it?).
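
As a rough illustration of the second helper, modify__path__() might look 
something like the sketch below. It rests on assumptions that are not 
spelled out above: that entries are plain path strings, that '*' is only a 
namespace marker rather than a path, and that duplicates should be dropped 
while keeping first-seen order. The package name 'zope' and the directory 
names are purely hypothetical.

```python
import types


def modify__path__(module, pth_contents):
    """Extend module.__path__ with entries gathered from .pth files.

    Assumptions (not from the draft above): '*' is only a namespace
    marker, blank lines are ignored, and duplicate entries are dropped
    while keeping first-seen order.
    """
    path = list(getattr(module, '__path__', []))
    for entry in pth_contents:
        entry = entry.strip()
        if not entry or entry == '*':
            continue
        if entry not in path:
            path.append(entry)
    module.__path__ = path
    return module


# Hypothetical usage: an empty package whose __path__ is filled in later
pkg = types.ModuleType('zope')
pkg.__path__ = []
modify__path__(pkg, ['/site-a/zope', '*', '/site-b/zope', '/site-a/zope'])
print(pkg.__path__)
```

Whether '*' should be filtered out here or handled earlier in the search 
loop is a design choice the draft leaves open.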

It's also the case that this algorithm will be slow to fail imports 
when implemented as a meta_path hook, since it will be doing an extra 
pass over sys.path or the parent __path__, in addition to the one 
that's done by the normal __import__ machinery.  (Though that's not 
an issue for Python 3.x, since this can be built into the core __import__).

(Technically, the 3.x version should probably ask meta_path hooks for 
their .pth files as well, but I'm not entirely sure that that's a 
meaningful thing to ask.)

The PEP 302 questions all boil down to how get_pth_contents() is 
implemented, and whether 'subpath' really should be created with 
os.path.join.  It would probably be enough to simply add a 
get_pth_contents() method to the importer protocol (that returns None 
or a list of lines), and maybe a get_subpath(modulename) method that 
returns the path string that should be used for a subdirectory 
importer (i.e. __path__ entry), or None if no such subpath exists.
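
To make the two proposed methods concrete, here is a minimal sketch of what 
a plain filesystem importer could do. The DirectoryImporter class is 
hypothetical, and passing the module name to get_pth_contents() is my own 
assumption (the draft above calls it without one); only the return 
conventions (None or a list of lines, None or a path string) come from the 
description above.

```python
import os
import tempfile


class DirectoryImporter(object):
    """Hypothetical filesystem importer exposing the two proposed methods."""

    def __init__(self, path):
        self.path = path

    def get_subpath(self, modulename):
        """Return the __path__ entry for a package subdirectory, or None."""
        subpath = os.path.join(self.path, modulename)
        return subpath if os.path.isdir(subpath) else None

    def get_pth_contents(self, modulename):
        """Return the lines of modulename/*.pth, or None if there are none."""
        subpath = self.get_subpath(modulename)
        if subpath is None:
            return None
        names = sorted(n for n in os.listdir(subpath) if n.endswith('.pth'))
        if not names:
            return None
        lines = []
        for name in names:
            with open(os.path.join(subpath, name)) as f:
                # keep non-blank lines, stripped of surrounding whitespace
                lines.extend(line.strip() for line in f if line.strip())
        return lines


# Usage: build a throwaway directory containing pkg/extra.pth
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, 'pkg'))
with open(os.path.join(root, 'pkg', 'extra.pth'), 'w') as f:
    f.write('*\n')

importer = DirectoryImporter(root)
print(importer.get_subpath('pkg'))
print(importer.get_pth_contents('pkg'))
print(importer.get_pth_contents('missing'))
```

A zipfile-based importer would answer the same two questions from its 
archive index instead of os.listdir, which is the reason for asking the 
importer rather than calling os.path.join directly.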

Adding finer-grained methods is probably a waste of time, as there 
aren't likely to be many use cases for asking an *importer* to fetch 
files (vs. a loader).

(In my case, of course, I'd use the pkgutil-style approach of 
augmenting importers or loaders that don't natively implement a 
needed method, which still allows third parties to register their own 
support for a fourth party's loader or importer type.)
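
A minimal sketch of that registration idea follows. It uses 
functools.singledispatch as a stand-in for a pkgutil-style generic 
function; FourthPartyImporter and its pth_lines attribute are hypothetical, 
and the fallback to a native get_pth_contents() method is an assumption 
about how the default case might behave.

```python
from functools import singledispatch


@singledispatch
def get_pth_contents(importer):
    """Default: defer to a native method if present, else report nothing."""
    method = getattr(importer, 'get_pth_contents', None)
    if method is not None:
        return method()
    return None


class FourthPartyImporter(object):
    """Hypothetical importer type that lacks native .pth support."""

    def __init__(self, pth_lines):
        self.pth_lines = pth_lines


@get_pth_contents.register(FourthPartyImporter)
def _fourth_party_pth_contents(importer):
    # Adapter a third party registers on behalf of FourthPartyImporter.
    return list(importer.pth_lines)


print(get_pth_contents(FourthPartyImporter(['*'])))
print(get_pth_contents(object()))
```

The point of the generic function is that neither the import machinery nor 
the importer's author has to change anything for the adapter to take 
effect.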