[py-dev] py.path import hook for getpymodule()

Tue Nov 9 00:17:43 CET 2004

Hi Armin! 

first of all, thanks for having the courage to tackle
improvements to the import system. It's a complicated
thing with a large "no-fun" risk. 

[Armin Rigo Mon, Nov 08, 2004 at 06:26:41PM +0000]
>
> ... description of remote imports from e.g. subversion ... 
>
> This is done by putting in the __file__ attribute of the 'test_x' module a
> subclass of str, as suggested by Holger, which remembers the py.path.  A
> custom import hook detects the "import" statements coming from modules with
> such a __file__, and tries to resolve the import locally, relative to the
> py.path attached to the __file__.  If it succeeds, it calls getpymodule()
> again to import the new module.  Otherwise, it falls back to the standard
> import hook.

My worry is that "mixing" the standard and the custom import
hook may lead to duplicate imports.  The underlying reason
seems to be that the standard hook uses "dotted module paths" as keys
for the sys.modules cache whereas we use py.path objects as keys.
This may result in importing a certain file twice, and hilarity
ensues.
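
To make the worry concrete, here is a minimal sketch of one file ending
up as two module objects because it was loaded under two different keys.
It uses today's importlib rather than the 2004 hooks or py.path, and the
helper load_from_path() and the name "x_loaded_by_path" are made up for
illustration only.

```python
import importlib.util
import os
import tempfile


def load_from_path(name, path):
    """Execute the file at `path` as a fresh module called `name`.

    Hypothetical helper: it bypasses the sys.modules cache on purpose,
    which is roughly what a path-keyed loader does from the point of
    view of the standard, name-keyed cache.
    """
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "x.py")
    with open(path, "w") as f:
        f.write("registry = []\n")

    # Same file, two keys: once under a dotted name, once "by path".
    first = load_from_path("stuff.x", path)
    second = load_from_path("x_loaded_by_path", path)

first.registry.append("seen")

same_file = first.__file__ == second.__file__
same_module = first is second
# State changed through one module object is invisible through the other.
second_sees_change = "seen" in second.registry
```

Here same_file is true while same_module and second_sees_change are
both false, which is exactly the kind of split module-level state that
a double import produces.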

So how could we avoid double imports? I am not quite sure.  
In the end, and IIRC you suggested something like this, we
might be able to reimplement enough of Python's import
logic, which we need to do anyway for PyPy (and have done to a
certain degree already).  At the extreme end we would not 
invoke the original import hook at all.  But this may be 
too disruptive as we would break compatibility with other
custom import hooks. 

Alternatively, we may want to build a rather complete
mapping of __file__->module for all successfully imported
modules, no matter if they were imported by our hook, by the
standard hook or even by some other import hook. We intercept
all import statements anyway and could add successfully
imported modules to the mapping.

Of course __import__('a.b.c', ...) could implicitly import a
chain of modules, but after the standard import hook succeeds
our custom one could look at 'a', 'a.b', 'a.b.c' and put their
__file__s into the __file__->module mapping so that we don't import 
the thing again when called with its direct filesystem location. 
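
The bookkeeping described above could look roughly like this. It is only
a sketch under assumptions: file_to_module, record_chain(),
import_and_record() and lookup_by_file() are hypothetical names, the
standard import is done via importlib.import_module() instead of a real
import hook, and a stdlib package stands in for 'a.b.c'.

```python
import importlib
import os
import sys

# Hypothetical cache: normalized __file__ -> module object.
file_to_module = {}


def _key(path):
    """Normalize a filename so that different spellings share one key."""
    return os.path.normcase(os.path.realpath(path))


def record_chain(dotted_name):
    """Record 'a', 'a.b', 'a.b.c' ... for an already imported dotted name."""
    parts = dotted_name.split(".")
    for i in range(1, len(parts) + 1):
        prefix = ".".join(parts[:i])
        module = sys.modules.get(prefix)
        filename = getattr(module, "__file__", None)
        if filename:
            # setdefault: the first module seen for a file stays the owner.
            file_to_module.setdefault(_key(filename), module)


def import_and_record(dotted_name):
    """Let the standard machinery import, then remember every link."""
    module = importlib.import_module(dotted_name)
    record_chain(dotted_name)
    return module


def lookup_by_file(path):
    """Return the module already imported from `path`, or None."""
    return file_to_module.get(_key(path))


minidom = import_and_record("xml.dom.minidom")
found = lookup_by_file(minidom.__file__)
```

A path-based loader would call lookup_by_file() first and only execute
the file when that returns None. Note that setdefault() silently picks
one owner per file, so the case of several modules pointing to the same
file is not handled here either.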

Like i said, i am not completely sure about all this but i think
a general __file__->module mapping might work well.  At least it
does from a theoretical viewpoint.  Btw, note how 
"inspect.getmodule(obj)" resorts to building such 
a mapping, too. And it doesn't even consider that there
may be multiple modules pointing to the same file! 

> The patch is attached for review (not checked in right now; depends if we want
> this kind of hack or not).  Note that it doesn't work if the modules being
> imported hack around sys.path themselves.  It works, though, if the hack is
> being performed by py.magic.autopath():

Yes, and manipulating sys.path directly often leads to incompatibility
with e.g. zip-imports, anyway.  So apart from sys.argv[0] hacks
i only see interesting use cases when you want to insert
paths relative to the package directory or your current location.
For this, we might want to allow something like 

    extrapath = __file__.pkgdir.join('..', '..', 'something') 
    __file__.searchpaths.append(extrapath) 

Btw, i think it's fine to just put our extra attributes like
'pkgdir' and 'searchpaths' directly on the string sub-instance
instead of using magic __names__.  
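
For illustration, such a string sub-instance could be sketched as
follows. Everything here is an assumption, not the patch under review:
the class name PathAwareFile is invented, pathlib stands in for py.path
(so joinpath() replaces join()), and the /repo/... paths are made up.

```python
import os
import pathlib


class PathAwareFile(str):
    """Hypothetical str subclass usable as a module's __file__.

    It compares and prints like a plain filename but carries plain
    attributes: `path`, `pkgdir` and `searchpaths`.
    """

    def __new__(cls, path, pkgdir=None):
        self = super().__new__(cls, os.fspath(path))
        self.path = pathlib.Path(path)
        # Assumption: default the package dir to the file's directory.
        if pkgdir is not None:
            self.pkgdir = pathlib.Path(pkgdir)
        else:
            self.pkgdir = self.path.parent
        # Extra locations a custom import hook could consult.
        self.searchpaths = []
        return self


module_file = PathAwareFile("/repo/stuff/test_x.py", pkgdir="/repo/stuff")

# Same shape as the two lines above, spelled with pathlib.
extrapath = module_file.pkgdir.joinpath("..", "..", "something")
module_file.searchpaths.append(extrapath)
```

Code that only expects a filename keeps working because the object is
still a str, while a custom hook can check for the extra attributes
with getattr().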

Overall the patch looks good and it's nicely small but of
course a couple of tests are missing :-) I'd love to have the
double-import problem addressed somehow, though. 

> if called from a module with a special
> __file__, autopath() will not actually hack sys.path but just record the
> information it deduced on the path object.  Then the custom import hook
> notices this information...  Not too clean.  It does allow this kind of stuff
> to work, though:
> 
> stuff/__init__.py:
>     # empty
> 
> stuff/x.py:
>     def x1():
>         return 5
> 
> stuff/test_x.py:
>     import py; py.magic.autopath()
>     import stuff.x     # or "from stuff import x", which is harder
>     def test_x1():
>         assert stuff.x.x1() == 5

This makes sense to me.  It is indeed not completely clean but i
don't see how you can easily get much cleaner than that.

> Finally, note that (I'm not sure but I believe that) this custom import hook
> would not be needed with the new hooks of Python 2.3, which would allow a
> similar result in a cleaner way.  Of course that would be 2.3-specific.

And i'll only believe it when i see it :-)

ciao, 

    holger