PyWart: "Python's import statement and the history of external dependencies"

Rick Johnson rantingrickjohnson at gmail.com
Fri Nov 21 17:25:35 EST 2014


On Friday, November 21, 2014 1:24:53 PM UTC-6, Ian wrote:
> On Fri, Nov 21, 2014 at 11:24 AM, Rick Johnson
> > Are you also going to call drivers "fools" because they bought
> > a "certain brand" of car only to have the airbag explode in
> > their face?
> 
> No, but I'll call them fools if they buy a car and the engine catches
> fire because they never bothered to change the oil.

As Dennis pointed out, that's highly unlikely.

> If you don't want to have module name collisions, then don't create
> modules with names that are likely to collide when Python gives you an
> excellent tool for avoiding collisions (namespaces). Don't go blaming
> Python for "bad design" when you couldn't even be bothered to use the
> tools made available to you.

And by "namespaces" you must be talking about packages here?
Okay, hold that thought...

> >> Now you can drop as much stuff in there as you like, and
> >> none of it will ever conflict with the standard library
> >> (unless a standard "ricklib" module is added, which is
> >> unlikely).
> >
> > Yes, and now we've solved one problem by replacing it with
> > its inverse -- try importing the *python lib* calendar
> > module and all you will get is your local "intra-package"
> > version. Now, the only way to get to the lib module is by
> > mutilating sys.path, or using an import utility module to
> > "import by filepath".
> 
> Um, no. If your calendar module is named ricklib.calendar, then
> importing just calendar will import the standard library calendar.
> 
> The only exception is if you're doing "import calendar" from inside
> the ricklib package, and you're using Python 2, and you don't have
> "from __future__ import absolute_import" at the top of your module.
> The solution to this is easy: just add that __future__ import to the
> top of your module, and poof, implicit relative imports don't happen.
> This is also fixed entirely in Python 3.

I wish the irony of needing to know an "implicit rule" to
solve an "implicit problem" could be funny, but it's really
just sad. I can't help but be reminded of the Python zen:
"If the implementation is hard to explain, it's a bad idea".
What's more difficult to explain than an implicit rule you
have no knowledge of?

 NOW THERE'S SOME IRONY FOR YOU!
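
For reference, the rule Ian describes can be checked on Python 3 with a small sketch. The package name "ricklib_demo" and the file "user.py" are made up for illustration; only the calendar clash comes from the thread. The assumption is that the package directory itself is not on sys.path, only its parent.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in package, built in a temporary directory.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "ricklib_demo")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()

# A package-local module that shares its name with a stdlib module.
with open(os.path.join(pkg, "calendar.py"), "w") as f:
    f.write("IS_LOCAL = True\n")

# A sibling module that asks for plain "calendar" and, separately,
# for the local one through an explicit relative import.
with open(os.path.join(pkg, "user.py"), "w") as f:
    f.write("from __future__ import absolute_import\n"  # no-op on Python 3
            "import calendar\n"
            "from . import calendar as local_calendar\n")

sys.path.insert(0, root)
importlib.invalidate_caches()
import ricklib_demo.user

# Plain "import calendar" resolved to the standard library module...
got_stdlib = hasattr(ricklib_demo.user.calendar, "monthrange")
# ...while the relative import reached the package-local file.
got_local = ricklib_demo.user.local_calendar.IS_LOCAL
print(got_stdlib, got_local)
```

Both values should come out true: inside the package, the bare name reaches the standard library and the dotted form reaches the local module.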

> > Anyone would expect that when *DIRECTLY* importing a
> > package, if the __init__ file has code, then THAT code
> > should be executed, HOWEVER,  not many would expect that
> > merely "referencing" the package name (in order to import a
> > more deeply nested package) would cause ALL the
> > intermediate __init__ files to execute -- this is madness,
> > and it prevents using an __init__ file as an "import hub"
> > (without side-effects)!
> 
> The whole point of the __init__.py file, in case you didn't intuit it
> from the name, is to host any initialization code for the package. Why
> on earth would you expect to import a module from a package without
> initializing the package?

Because you failed to notice that i was NOT importing the
"package" which contained the __init__ file, no, i was
importing a *SUB PACKAGE* of the package that contained the
__init__ file.

Why does the code in the main package need to run when i
*explicitly* and *directly* fetched a "nested resource"
within the package? Nothing within the __init__ file is
affecting the code within the subpackage i wanted, and
inversely, the package i wanted does not depend on anything
within the __init__ file. There exists no symbiotic
relationship between these two pieces of machinery, yet, by
referencing one of them, the other starts doing unnecessary
things!
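
The behavior in question is easy to reproduce. The sketch below builds a throwaway tree with the package names used in this post and puts a print call in each __init__.py; the temporary directory and the print calls are assumptions added for the demonstration.

```python
import importlib
import os
import sys
import tempfile

# Throwaway tree using the package names from this post.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "ricklib", "subpkg1", "subpkg1a"))

# Give every level an __init__.py that announces when it runs.
path = root
for part in ("ricklib", "subpkg1", "subpkg1a"):
    path = os.path.join(path, part)
    with open(os.path.join(path, "__init__.py"), "w") as f:
        f.write("print('running __init__ for', __name__)\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

# Only the innermost package is the target of the import...
import ricklib.subpkg1.subpkg1a

# ...yet every package along the dotted path was initialized and cached.
names = ("ricklib", "ricklib.subpkg1", "ricklib.subpkg1.subpkg1a")
initialized = [name for name in names if name in sys.modules]
print(initialized)
```

All three names end up in sys.modules, outermost first, which is the side effect being complained about here.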

There is a workaround, but it's even more of a mess. In
order to maintain a unique "import hub" without the chance
of side effects from __init__ files, i can move all the code
in "ricklib.subpkg1.__init__.py" (the code that does all the
imports) into a normal file named
"ricklib.subpkg1._subpkg1.py".

 + ricklib 
   __init__.py 
   + subpkg1 (ricklib.subpkg1) 
      __init__.py 
      _subpkg1.py
      module1.py 
      module2.py 
      module3.py 
      + subpkg1a (ricklib.subpkg1.subpkg1a) 
      
Now, since the __init__ file has no global code, i can
import "ricklib.subpkg1.subpkg1a" without side effect -- but
of course, at a cost! 

Advanced Python packages are a zero-sum game. You cannot
remove the problems, all you can do is move them to new
locations.

  so instead of the former (with side effects):
    
    from ricklib.subpkg1.subpkg1a import something
    
  I can do this (without side effect):

    from ricklib.subpkg1.subpkg1a._subpkg1a import something
    
  But at the cost of my sanity.
    

> > Because the alternative is messy. If i have a collection of
> > modules under a package, sometimes i would like to import
> > all the "exportable objects" into the __init__ file and use
> > the package as an "import hub".
> 
> What is the point of putting things into a hierarchical namespace in
> the first place if you're just going to turn around and subvert it
> like this?

I'm not "subverting it", i'm merely trying to organize my
*VAST* code libraries utilizing the ONLY tools that have been
made available to me -- and to be quite honest, these tools are
lacking!

If Python expects me to use packages to protect my module
names from clashing, but it gives me no method by which to
import packages without causing side effects, then what is a
boy to do (besides creating workarounds for workarounds)? 

> > But the current "global import search path" injections are
> > just the inverse. You make changes to sys.path in one
> > module, and if you fail to reset the changes before
> > execution moves to the next module in the "import chain",
> > then that module's import search path will be affected in
> > implicit ways that could result in importing the wrong
> > module.
> 
> No, because the trick you describe doesn't even work. If you edit
> sys.path in one file in order to import the coconut module:
> 
> sys.path.insert(0, '/path/to/island')
> import coconut
> 
> And then in another module change the sys.path file and try to import
> a different coconut module:
> 
> sys.path[0] = '/path/to/other/island'
> import coconut
> 
> You think the second import will produce the second coconut module? It
> won't, because the sys.modules cache will already contain an entry for
> 'coconut' that points to the first module imported. In order to make
> this work, you would have to not only modify sys.path but also clear
> the sys.modules cache.

DAMMIT! 

We have levels upon levels of esoteric, un-intuit-able, and
implicit procedures happening here. It seems that sys.path
modification is fruitless. Which is good, because i never
liked the idea of it anyway. It's a dirty hack!

Thanks for pointing this out in such a simplistic manner! I
now understand why the import mechanism is so unintuitable!
I'm gushing with exuberance because i feel as though a
"coconut of enlightenment" has been dropped on my head!

    I NOW KNOW HOW TO FIX THE PYTHON IMPORT PROBLEM!

I was aware that sys.path was global, but for some reason, i
failed to realize that sys.modules was (i don't know why, it
seems to make perfect sense now!) I was thinking that
sys.modules was specific to each script, and as such, even
though modules were being cached by name, the changes to
sys.path would circumvent the "short circuit behavior" of
"module name resolution"; implemented by a "peek" into
system modules.
    
Not only is Python going to stop searching the "file system"
for a module named "coconut" when it first encounters a file
named coconut, it won't even check sys.path to see if the
programmer has injected a new directory that might contain a
*DIFFERENT* coconut file!

    '/path/to/other/island'

And that's good for performance, but bad for intuition!
Now, i'm not suggesting that Python burn up our hard-drives
with listdir() requests, however, there must be some sort of
method we can implement that will solve the "coconut import
problem".
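
The caching behavior Ian describes can be seen in a few lines. This is a minimal sketch: the two "islands" are temporary directories, and the ISLAND constant is a made-up marker so the two coconut files can be told apart.

```python
import importlib
import os
import sys
import tempfile

# Two hypothetical "islands", each holding its own coconut.py.
islands = []
for label in ("first", "second"):
    island = tempfile.mkdtemp()
    with open(os.path.join(island, "coconut.py"), "w") as f:
        f.write("ISLAND = %r\n" % label)
    islands.append(island)

sys.path.insert(0, islands[0])
importlib.invalidate_caches()
import coconut
first_result = coconut.ISLAND

# Point sys.path at the other island and import again.
sys.path[0] = islands[1]
import coconut  # served from sys.modules; sys.path is not searched
second_result = coconut.ISLAND

# Dropping the cache entry forces a fresh search of sys.path.
del sys.modules["coconut"]
importlib.invalidate_caches()
import coconut
third_result = coconut.ISLAND

print(first_result, second_result, third_result)
```

The second import still reports the first island; only after the sys.modules entry is deleted does the second island's file get loaded.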

============================================================
 THE SOLUTION TO PYTHON'S IMPLICIT IMPORT NIGHTMARE
============================================================

The only way to wrangle this beast in a manner that offers
both the ease of implicit machinery, and the power of explicit
machinery, and do both in a backwards compatible manner, is
to introduce the following:

############################################################
#                         STEP 1:                          #
############################################################
# Make sys.path *read-only*                                #
############################################################

Actually "step 1" is optional. We could leave it as is but
making it read-only would be more consistent. We need to
drive home the point that "injecting sys.path is fruitless"

############################################################
#                         STEP 2:                          #
############################################################
# Add a new listing called "sys.path_extra" (or whatever   #
# makes sense; that's a dumb name but the best i can think #
# of for now), OR (even better), allow a new variable into #
# each module namespace called "__search_paths__" (or      #
# something. Again, the name is negotiable), which will be #
# a list of paths.                                         #
############################################################

############################################################
#                         STEP 3:                          #
############################################################
# Make the following changes to the import machinery:      #
# Before Python reads a module file, Python will clear the #
# values in "sys.path_extra", OR, query the                #
# "__search_paths__" variable. If any paths exist in this  #
# list, THEN THESE PATHS MUST BE SEARCHED, AND THEY MUST   #
# BE SEARCHED BEFORE ANY PATHS IN "sys.path", AND NO       #
# PEEKING IN "sys.modules" IS ALLOWED!                     #
############################################################

By this method, we can retain the existing implementation of
import, whilst offering a more powerful tool to control the
machinery when needed.
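
Nothing like "__search_paths__" exists in Python today, but the core of STEP 3 can be approximated in user code with importlib. The helper name import_from_paths is hypothetical, and this is only a sketch of the idea (search the given paths first, no peeking in sys.modules), not a proposal for how the interpreter would implement it.

```python
import importlib.machinery
import importlib.util
import os
import tempfile

def import_from_paths(name, search_paths):
    """Load `name` from `search_paths` first, ignoring sys.modules.

    Hypothetical helper. Falls back to the ordinary import machinery
    when none of the given paths contains the module.
    """
    spec = importlib.machinery.PathFinder.find_spec(name, list(search_paths))
    if spec is None or spec.loader is None:
        return importlib.import_module(name)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # deliberately NOT stored in sys.modules
    return module

# Two temporary "islands" with different coconut files (made-up data).
islands = []
for label in ("first", "second"):
    island = tempfile.mkdtemp()
    with open(os.path.join(island, "coconut.py"), "w") as f:
        f.write("ISLAND = %r\n" % label)
    islands.append(island)

coconut_a = import_from_paths("coconut", [islands[0]])
coconut_b = import_from_paths("coconut", [islands[1]])
print(coconut_a.ISLAND, coconut_b.ISLAND)
```

Each call yields its own module object, so both coconuts can coexist. The cost is that nothing is shared through sys.modules: two callers asking for the same file get two separate copies, each with its own module-level state.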
