Trying to understand 'import' a bit better

Sun Mar 4 03:15:57 EST 2012

Hi all

I have been using 'import' for ages without particularly thinking about it - 
it just works.

Now I am having to think about it a bit harder, and I realise it is a bit 
more complicated than I had assumed - not *that* complicated, but there are 
some subtleties.

I don't know the correct terminology, but I want to distinguish between the 
following two scenarios -

1. A python 'program' that is self-contained - it has some kind of startup, 
invokes certain functionality, and then exits.

2. A python 'library' that exposes functionality to other python programs, 
but relies on those programs to invoke that functionality.

The first scenario has the following characteristics -
  - it can consist of a single script or a number of modules
  - if the latter, the modules can all be in the same directory, or in one 
or more sub-directories
  - if they are in sub-directories, each sub-directory must contain 
__init__.py, and is then referred to as a sub-package
  - the startup script will normally be in the top directory, and will be 
executed directly by the user

When python executes a script, it automatically places the directory 
containing the script into 'sys.path'. Therefore the script can import a 
top-level module using 'import <module>', and a sub-package module using 
'import <sub-package>.<module>'.
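To check the claim about 'sys.path', here is a minimal sketch, assuming Python 3.7+. It builds a throwaway scenario-1 program in a temporary directory (the names main.py, helpers.py and subpkg/tools.py are hypothetical), runs the startup script as a separate process the way a user would, and has the script report whether sys.path[0] is its own directory and whether both import forms succeed.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical startup script: reports whether sys.path[0] is its own
# directory, then imports a top-level module and a sub-package module.
MAIN_PY = """\
import os
import sys

import helpers          # top-level module next to main.py
import subpkg.tools     # module inside the subpkg sub-package

here = os.path.dirname(os.path.abspath(__file__))
print(os.path.samefile(sys.path[0], here))
print(helpers.NAME, subpkg.tools.NAME)
"""


def run_program_demo():
    """Build a throwaway 'scenario 1' program and run its startup script."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "helpers.py").write_text("NAME = 'helpers'\n")
        sub = root / "subpkg"                       # a sub-directory...
        sub.mkdir()
        (sub / "__init__.py").write_text("")        # ...made into a sub-package
        (sub / "tools.py").write_text("NAME = 'subpkg.tools'\n")
        (root / "main.py").write_text(MAIN_PY)
        # Equivalent to the user typing "python main.py".
        result = subprocess.run(
            [sys.executable, str(root / "main.py")],
            capture_output=True, text=True, check=True,
        )
        return result.stdout


if __name__ == "__main__":
    print(run_program_demo(), end="")
```

Running it should print True followed by the two module names, showing that nothing beyond the script's own directory needed to be added to sys.path.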

The second scenario has similar characteristics, except it will not have a 
startup script. In order for a python program to make use of the library, it 
has to import it. In order for python to find it, the directory containing 
it has to be in sys.path. In order for python to recognise the library 
directory itself as importable, it has to contain __init__.py; such a 
directory is referred to as a package.

To access a module of the package, the python program must use 'import 
<package>.<module>' (or 'from <package> import <module>'), and to access a 
sub-package module it must use 'import <package>.<sub-package>.<module>'.
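A similar sketch for scenario 2, again assuming Python 3 and using hypothetical names (mylib, mylib/core.py, mylib/sub/extra.py): the directory *containing* the package is put on sys.path, and the package's modules are then reached only through the package name, in each of the three forms described above.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_library_demo():
    """Build a throwaway 'scenario 2' library and import it from outside."""
    with tempfile.TemporaryDirectory() as tmp:
        parent = Path(tmp)                 # directory *containing* the package
        pkg = parent / "mylib"             # hypothetical package name
        (pkg / "sub").mkdir(parents=True)
        (pkg / "__init__.py").write_text("")          # makes mylib a package
        (pkg / "core.py").write_text("def who():\n    return 'mylib.core'\n")
        (pkg / "sub" / "__init__.py").write_text("")  # makes mylib.sub a sub-package
        (pkg / "sub" / "extra.py").write_text("def who():\n    return 'mylib.sub.extra'\n")

        sys.path.insert(0, str(parent))    # the parent goes on sys.path, not mylib/
        importlib.invalidate_caches()
        try:
            import mylib.core              # import <package>.<module>
            from mylib import core         # from <package> import <module>
            import mylib.sub.extra         # import <package>.<sub-package>.<module>
            return [mylib.core.who(), core.who(), mylib.sub.extra.who()]
        finally:
            # Undo the demo's changes to the interpreter state.
            sys.path.remove(str(parent))
            for name in [n for n in sys.modules if n == "mylib" or n.startswith("mylib.")]:
                del sys.modules[name]


if __name__ == "__main__":
    print(import_library_demo())
```

The cleanup in the finally clause is only there so the sketch leaves the interpreter as it found it; a real program would normally get the parent directory onto sys.path by installing the package rather than by editing sys.path at runtime.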

So far so uncontroversial (I hope).

The subtlety arises when the package wants to access its own modules. 
Instead of using 'import <module>' it must use 'import <package>.<module>'. 
This is because the directory containing the package is in sys.path, but the 
package itself is not. It is possible to insert the package directory name 
into sys.path as well, but as was pointed out recently, this is dangerous, 
because you can end up with the same module imported twice under different 
names, with potentially disastrous consequences.
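The subtlety can be demonstrated with a sketch as well, assuming Python 3, where a bare 'import helpers' is always absolute (there are no implicit relative imports). The hypothetical package mylib has a sibling module helpers holding some module-level state. With only the parent directory on sys.path, the qualified import works and the bare one fails; once the package directory is inserted too, 'helpers' and 'mylib.helpers' become two separate module objects with separate state.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def own_modules_demo():
    """Package importing its own module, then the double-import hazard."""
    with tempfile.TemporaryDirectory() as tmp:
        parent = Path(tmp)
        pkg = parent / "mylib"                         # hypothetical package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "helpers.py").write_text("registry = []\n")  # module-level state
        # Refers to its sibling by the full, package-qualified name.
        (pkg / "core.py").write_text(
            "import mylib.helpers\n"
            "def register(item):\n"
            "    mylib.helpers.registry.append(item)\n"
        )
        # Refers to its sibling by a bare name.
        (pkg / "broken.py").write_text("import helpers\n")

        results = {}
        sys.path.insert(0, str(parent))   # only the directory containing the package
        importlib.invalidate_caches()
        try:
            import mylib.core
            mylib.core.register("via mylib.core")
            results["qualified_import_works"] = mylib.helpers.registry == ["via mylib.core"]
            try:
                import mylib.broken        # bare 'import helpers' is not found
                results["bare_import_works"] = True
            except ImportError:
                results["bare_import_works"] = False

            # The risky workaround: put the package directory on sys.path too.
            sys.path.insert(0, str(pkg))
            importlib.invalidate_caches()
            import helpers                 # executes helpers.py a second time
            results["same_module_object"] = helpers is mylib.helpers
            results["state_shared"] = helpers.registry == mylib.helpers.registry
        finally:
            for entry in (str(pkg), str(parent)):
                if entry in sys.path:
                    sys.path.remove(entry)
            for name in [n for n in sys.modules
                         if n in ("mylib", "helpers") or n.startswith("mylib.")]:
                del sys.modules[name]
        return results


if __name__ == "__main__":
    print(own_modules_demo())
```

The last two results should both be False: the registry filled through 'mylib.helpers' is invisible through 'helpers', which is the kind of silent divergence that makes the double import dangerous.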

Therefore, as I see it, if you are developing a project using scenario 1 
above, and then want to change it to scenario 2, you have to go through the 
entire project and change every import of the project's own modules by 
prepending the package name.

Have I got this right?

Frank Millman