PyWart: "Python's import statement and the history of external dependencies"

Chris Angelico rosuav at gmail.com
Sat Nov 22 08:00:34 EST 2014


On Sat, Nov 22, 2014 at 11:25 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> Ian Kelly wrote:
>
> - It's hard to keep track of what modules are in the standard library. (Which
> of the following is *not* in Python 3.3's std lib? No cheating by looking
> them up.)
>
>     os2emxpath, wave, sndheader, statslib, poplist, plist,
>     pickletools, picklelib, path, cgi, cgitb, copylib, xpath

Okay, here are my guesses.

os2emxpath: In the stdlib, but more often accessed as "os.path" while
running under OS/2
wave: Not in the stdlib, though I'd avoid the name anyway.
sndheader: Not in the stdlib - probably on PyPI though
poplist, plist, pickletools, picklelib: I suspect PyPI, not stdlib,
but could be wrong
path: Not in the stdlib (there's os.path and I doubt there'd be both)
cgi, cgitb: In the stdlib
copylib: No idea, could be either way.
xpath: I'll guess this as not being present.

I'm probably pretty wrong, though.

>>> # Contrary to popular belief, sys.path is *NOT* a module,  #
>>> # no, it's a global!                                       #
>>
>> I really doubt that this is a popular belief.
>
> I'm not aware of anyone who believes that sys.path is a module.
> But yes, sys.path is not just global, but process-wide global. *All* modules
> share the same sys.path.

Even leaving aside Rick's sloppy language, I still doubt that it's a
popular belief that sys.path is module-specific. You're modifying
something in a different module, and Python's always maintained that
two instances of "import sys" will give two references to the exact
same module object.
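
That identity is easy to check. A minimal sketch; the helper function
name is made up for the demo:

```python
import sys

def import_sys_again():
    # A second "import sys", in a different scope. It is satisfied from
    # sys.modules, so it returns the module object that already exists.
    import sys as another_sys
    return another_sys

same_object = import_sys_again() is sys
cached = sys.modules["sys"] is sys
# One module object means one path list, shared by everybody.
path_shared = import_sys_again().path is sys.path
print(same_object, cached, path_shared)
```

Since there is only one sys module object, there is only one sys.path
list, and any change made through one reference is visible through
all of them.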

> That would be horrible. But here's an alternative which is less horrible and
> maybe even useful.
>
> There's still a single process-wide search path, but there's a second
> per-module search path which is searched first. By default it's empty.
>
> So a module can define its own extra search path:
>
> __path__ = ['look/here', 'and/here']
> import something
>
> without affecting any other modules.

That's what Rick said first, and then said that if you're going to be
explicit, you should do the job properly and not have _any_ implicit
paths.

Thing is, though, it still breaks the sys.modules concept. Either
__path__ is ignored if the module was found in sys.modules, or it's
possible to have multiple entries with the same name (which would make
it hard to have a module replace itself in sys.modules, currently a
supported thing). Although I suppose all it'd require is that
sys.modules be keyed by __file__ rather than __name__, so they're
identified by fully qualified path and file name. (What does that do
in the face of .pyc files?)
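
To illustrate the name-keyed lookup, here is a rough sketch. The module
name demo_selfreplace is hypothetical, and the assignment into
sys.modules is done from the outside here, standing in for what a
module would do to itself with sys.modules[__name__] = ... during its
own import:

```python
import importlib
import sys
import types

NAME = "demo_selfreplace"  # hypothetical name, no such file exists

# Stand-in for the object a module might swap in for itself.
replacement = types.ModuleType(NAME)
replacement.answer = 42
sys.modules[NAME] = replacement

# The import machinery consults sys.modules by name first, so no file
# search happens and the registered object comes straight back.
found = importlib.import_module(NAME)
hit_cache = found is replacement
print(hit_cache, found.answer)

# Tidy up so the demo leaves no trace in the running process.
del sys.modules[NAME]
```

The lookup never asks where the module came from, only what it is
called, which is why a per-module search path sits awkwardly next to
it.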

>> And after all that, it would still fail if you happened to want to
>> import both "calendar" modules into the same module.
>
> __path__ = []
> import calendar
> __path__ = ['my/python/modules']
> import calendar as mycalendar

Frankly, if you actually want this, I think it's time to turn to an
uglier-but-more-flexible method, like poking around in importlib. (I'm
not sure off-hand how you'd go about it, it's not instantly obvious
from help(importlib).) I'm more concerned about the possibility of
your import succeeding or failing depending on the order of other
imports:

# foo.py
import calendar

# bar.py
__path__ = ['my/python/modules']
import foo
import calendar

How's that one to be resolved? That's what I don't like.
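
As for the importlib route for getting both calendar modules at once,
one possible approach on Python 3.5 or later is to load the second
file explicitly under a different name. This is only a sketch: the
helper load_from_file, the name mycalendar, and the throwaway
calendar.py written to a temporary directory are all made up for the
demo.

```python
import calendar  # the standard library module, imported normally
import importlib.util
import os
import sys
import tempfile

def load_from_file(name, filename):
    """Load the source file at `filename` as a module called `name`."""
    spec = importlib.util.spec_from_file_location(name, filename)
    module = importlib.util.module_from_spec(spec)
    # Deliberately not registered in sys.modules, so a plain
    # "import calendar" anywhere else is unaffected.
    spec.loader.exec_module(module)
    return module

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "calendar.py")
    with open(filename, "w") as f:
        f.write("WHOSE = 'mine'\n")
    mycalendar = load_from_file("mycalendar", filename)

print(mycalendar.WHOSE)
print(sys.modules["calendar"] is calendar)
```

Because the file is named explicitly, nothing here depends on the
search path or on the order of other imports.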

So long as sys.modules is (a) process-wide and (b) keyed by module
name rather than file name, sys.path MUST be process-wide too, and
MUST be set on startup, or as soon as possible afterwards. Any module
imported prior to altering sys.path will be fetched based on the
previous search path - and you have to import sys to change sys.path,
which means the minimum set of unalterable modules is, on Python 3.5:

rosuav@sikorsky:~$ cat showmods.py
import sys
print(", ".join(sorted(sys.modules.keys())))
rosuav@sikorsky:~$ python3 showmods.py
__main__, _codecs, _collections_abc, _frozen_importlib, _imp, _io,
_signal, _sitebuiltins, _stat, _sysconfigdata, _thread, _warnings,
_weakref, _weakrefset, abc, builtins, codecs, encodings,
encodings.aliases, encodings.latin_1, encodings.utf_8, errno,
genericpath, io, marshal, os, os.path, posix, posixpath, site, stat,
sys, sysconfig, zipimport

... that's a decent lot of modules you can't fiddle with. Hence
PYTHONPATH, which presumably is processed by the interpreter prior to
loading any modules.
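
The point about modules imported before a sys.path change can be seen
with a small experiment. This is a sketch under assumptions: the
module name pathdemo_mod and the two temporary directories are
invented for the demo.

```python
import importlib
import os
import sys
import tempfile

NAME = "pathdemo_mod"  # hypothetical module name

def make_module(directory, tag):
    # Write a one-line module that records which directory it came from.
    with open(os.path.join(directory, NAME + ".py"), "w") as f:
        f.write("TAG = %r\n" % tag)

with tempfile.TemporaryDirectory() as first, \
        tempfile.TemporaryDirectory() as second:
    make_module(first, "first")
    make_module(second, "second")

    sys.path.insert(0, first)
    importlib.invalidate_caches()
    before = importlib.import_module(NAME)

    # Put a different directory at the front and import again.
    sys.path.insert(0, second)
    importlib.invalidate_caches()
    after = importlib.import_module(NAME)

    # Undo the path changes before the directories disappear.
    sys.path.remove(first)
    sys.path.remove(second)

print(before.TAG, after.TAG, before is after)
del sys.modules[NAME]
```

This prints "first first True": the second import is answered from
sys.modules, so the later path change never gets a say.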

ChrisA


