[Python-Dev] Draft PEP: "Simplified Package Layout and Partitioning"

P.J. Eby pje at telecommunity.com
Wed Jul 20 19:04:05 CEST 2011


At 08:56 AM 7/20/2011 -0700, Jeff Hardy wrote:
>On Tue, Jul 19, 2011 at 8:58 PM, P.J. Eby <pje at telecommunity.com> wrote:
> > The biggest likely exception to the above would be when a piece of
> > code tries to check whether some package is installed by importing
> > it.  If this is done *only* by importing a top-level module (i.e., not
> > checking for a ``__version__`` or some other attribute), *and* there
> > is a directory of the same name as the sought-for package on
> > ``sys.path`` somewhere, *and* the package is not actually installed,
> > then such code could *perhaps* be fooled into thinking a package is
> > installed that really isn't.
>
>This part worries me slightly. Imagine a program as such:
>
>datagen.py
>json/foo.js
>json/bar.js
>
>datagen.py uses the files in json/ to generate sample data for a
>database. In datagen.py is the following code:
>
>try:
>     import json
>except ImportError:
>     import simplejson as json
>
>Currently, this works just fine, but it will break (as I understand
>it) under the PEP because the json directory will become a virtual
>package and no ImportError will be raised.

Well, it won't fail as long as there actually *is* a json module or 
package on the path.  ;-)  But I do see your point.
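
For illustration only (this is not part of the PEP): the quoted PEP text 
notes that the false positive arises only when code checks *nothing but* 
the import itself.  A caller-side guard could look roughly like the 
sketch below.  The helper name import_checked() is hypothetical, and 
'loads' is simply an attribute that both json and simplejson provide.

```python
import importlib

def import_checked(name, required_attr):
    """Import `name`, but treat it as missing unless it has `required_attr`."""
    module = importlib.import_module(name)
    if not hasattr(module, required_attr):
        # An empty or unrelated package of the same name ends up here.
        raise ImportError("%r lacks %r; probably not the real package"
                          % (name, required_attr))
    return module

try:
    json = import_checked("json", "loads")
except ImportError:
    json = import_checked("simplejson", "loads")
```

This only protects code that opts in; it does nothing for existing 
programs that rely on a bare try/except around the import.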


>Is there a mitigation for this in the PEP that I've missed?

A possible mitigation would be to require that get_subpath() only 
return a directory name if that directory in fact contains importable 
modules somewhere.  This is actually discussed a bit later as an open 
issue under "Implementation Notes", indicating that iter_modules() 
has this issue as well.

The main open questions in doing this kind of checking have to do 
with recursion: it's perfectly valid to have, say, a 'zc/' directory 
whose only content is a 'buildout/' subdirectory.

Of course, it still wouldn't help if the 'json/' subdirectory in your 
example did contain .py files.

There is another possibility, though:

What if we change the logic for pure-virtual package creation so that 
the parent module is created *if and only if* a child module is found?

In that case, trying to import a pure virtual 'zc' package would 
fail, but importing 'zc.buildout' would succeed as long as there was 
a zc/buildout.py or a zc/buildout/__init__.py somewhere.

And in your example, 'import json' would fail -- which is to say, succeed.  ;-)
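
The "if and only if a child module is found" rule can be sketched as a 
toy resolver that only inspects the filesystem.  This is a simplified 
model under stated assumptions, not the PEP's algorithm: would_import() 
is a made-up name, only .py files and __init__.py packages count as 
real modules, and a regular package is not restricted to its own 
directory the way a real __path__ would be.

```python
import os
import tempfile

def would_import(sys_path, dotted_name):
    """Return True if the import would succeed under the proposed rule."""
    search = list(sys_path)
    found_real = False
    for part in dotted_name.split("."):
        # Directories named `part` are candidates for a virtual package path.
        next_search = [os.path.join(d, part) for d in search
                       if os.path.isdir(os.path.join(d, part))]
        # A real module is part.py or part/__init__.py in any searched directory.
        found_real = any(
            os.path.isfile(os.path.join(d, part + ".py"))
            or os.path.isfile(os.path.join(d, part, "__init__.py"))
            for d in search)
        if not found_real and not next_search:
            return False  # nothing at all by this name
        search = next_search
    # The last component must be a real module; a bare directory is not enough.
    return found_real

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "json"))
    open(os.path.join(root, "json", "foo.js"), "w").close()
    os.makedirs(os.path.join(root, "zc"))
    open(os.path.join(root, "zc", "buildout.py"), "w").close()
    results = {name: would_import([root], name)
               for name in ("json", "zc", "zc.buildout")}

print(results)
```

With that layout, 'json' and 'zc' are rejected on their own, while 
'zc.buildout' is accepted.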

This is a minor change to the spec, though perhaps a bit hairier to 
implement in practice.

The current import.c loop over the module name parts (iterating over, 
say, 'zc', then 'buildout', and importing them in turn) would need to 
be reworked so that it could either roll back the virtual package 
creation in the event of sub-import failure or, conversely, delay 
creation of the parent package(s) until a sub-import finds a module.

I certainly think it's *doable*, mind you, but I'd hate to have to do 
it in C.  ;-)

Hm.  Here's another variant that might be easier to implement (even 
in C), and could offer some other advantages as well.

Suppose we replace the sys.virtual_packages set() with a 
sys.virtual_paths dict(): a dictionary that maps from module names to 
__path__ lists, and that's populated by the __path__ creation 
algorithm described in the PEP.  (An empty list would mean that 
__path__ creation failed for that module/package name.)

Now, if a module doesn't have a __path__ (or doesn't exist), we look 
in sys.virtual_paths for the module name.  If the retrieved list is 
empty, we fail the import.  If it's not, we proceed...  but *don't* 
create a module or set the existing module's __path__.

Then, at the point where an import succeeds, and we're going to set 
an attribute on the parent module, we recursively construct parent 
modules and set their __path__ attributes from sys.virtual_paths, if 
a module doesn't exist in sys.modules, or its __path__ isn't set.

Voila.  Now there are fewer introspection problems as well: trying to 
'import json.foo' when there's no 'foo.py' in any json/ directory 
will *not* create an empty 'json' package in sys.modules as a 
side-effect.  And it won't add a __path__ to the 'json' module if 
there were a json.py found, either.

What's more, since importing a pure virtual package now fails unless 
you've successfully imported something from it before, it makes more 
sense for it to not have a __file__, or a __file__ of None.
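
A rough model of the sys.virtual_paths variant, in pure Python, might 
look like the following.  Everything here is an assumption made for 
illustration: the two lookup tables stand in for the filesystem, 
`virtual_paths` and `modules` stand in for sys.virtual_paths and 
sys.modules, and toy_import() is a made-up name.  Real packages with 
their own __init__.py are not modelled.

```python
import types

# Toy "disk": directories matching each package name, plus loadable modules.
DIRS_ON_DISK = {"zc": ["/site-a/zc", "/site-b/zc"], "json": ["/app/json"]}
REAL_MODULES = {"zc.buildout"}

virtual_paths = {}  # stand-in for the proposed sys.virtual_paths
modules = {}        # stand-in for sys.modules

def get_virtual_path(name):
    """Cache and return the computed path list; [] records a failed lookup."""
    if name not in virtual_paths:
        virtual_paths[name] = list(DIRS_ON_DISK.get(name, []))
    return virtual_paths[name]

def toy_import(dotted_name):
    parts = dotted_name.split(".")
    names = [".".join(parts[:i + 1]) for i in range(len(parts))]
    # Phase 1: each parent needs a usable path, but nothing is created yet.
    for parent in names[:-1]:
        has_path = hasattr(modules.get(parent), "__path__")
        if not has_path and not get_virtual_path(parent):
            raise ImportError("no path for %r" % parent)
    # The target must be a real module, or something imported earlier.
    if dotted_name not in modules:
        if dotted_name not in REAL_MODULES:
            raise ImportError("no module named %r" % dotted_name)
        modules[dotted_name] = types.ModuleType(dotted_name)
    # Phase 2: the import succeeded, so build the parents, innermost first.
    for parent, child in reversed(list(zip(names[:-1], names[1:]))):
        mod = modules.setdefault(parent, types.ModuleType(parent))
        if not hasattr(mod, "__path__"):
            mod.__path__ = list(virtual_paths[parent])
        setattr(mod, child.rpartition(".")[2], modules[child])
    return modules[names[0]]

for name in ("json", "json.foo", "zc"):
    try:
        toy_import(name)
    except ImportError as exc:
        print("failed:", exc)

zc = toy_import("zc.buildout")  # the first successful child import creates 'zc'
print(sorted(modules), zc.__path__)
```

In this model the failed 'json.foo' import leaves no 'json' entry 
behind, and 'zc' becomes importable by name only after 'zc.buildout' 
has been imported once.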

Actually, it's too bad that we have to have parent packages in 
sys.modules, or I'd suggest we just make pure virtual packages 
unimportable, period.

Technically, we *could* always create dummy parent modules for 
virtual packages and *not* put them in sys.modules, but I'm not sure 
if that's a good idea.  It would be more consistent in some ways with 
the idea that virtual packages are not directly importable, but an 
interesting side effect would be that if module A does:

   import foo.bar

and module B does:

   import foo.baz

Then module A's version of 'foo' has *only* a 'bar' attribute and B's 
version has *only* a 'baz' attribute.  This could be considered a 
good thing, a bad thing, or a weird thing, depending on how you look 
at it.  ;-)
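
To make that side effect concrete, here is a small simulation.  It is 
only a sketch: import_with_private_parent() is a hypothetical helper, 
and `shared_modules` stands in for sys.modules.

```python
import types

shared_modules = {}  # stand-in for sys.modules; only real modules are registered

def import_with_private_parent(dotted_name):
    """Simulate 'import foo.bar' where the virtual parent is never registered."""
    parent_name, _, child_name = dotted_name.partition(".")
    child = shared_modules.setdefault(dotted_name,
                                      types.ModuleType(dotted_name))
    parent = types.ModuleType(parent_name)  # fresh dummy for each importer
    setattr(parent, child_name, child)
    return parent

foo_in_a = import_with_private_parent("foo.bar")  # what module A would see
foo_in_b = import_with_private_parent("foo.baz")  # what module B would see

print(hasattr(foo_in_a, "bar"), hasattr(foo_in_a, "baz"))
print(hasattr(foo_in_b, "bar"), hasattr(foo_in_b, "baz"))
```

Each importer ends up with its own 'foo' object, and 'foo' itself never 
appears in the shared table.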

Probably, we should stick with the current shared 'foo' instance, 
even for pure virtual packages.  It's just that 'foo' should not 
exist in sys.modules until one of the above imports succeeds.

Anyway, thanks for bringing this issue up, because now we can fix the 
hole *entirely*.  If pure virtual packages can never be imported 
directly, then they can *never* create false positive imports -- and 
the "Backward Compatibility" part of the PEP gets shorter.  ;-)

Hurray!  (I'm tempted to run off and tweak the PEP for this right 
now, but I want to see if any of the folks who'd be doing the actual 
3.x implementation of this want to weigh in on the details first.)


