A smarter(?) package importer.

Prabhu Ramachandran prabhu at aero.iitm.ernet.in
Wed Nov 7 03:27:28 EST 2001


hi,

>>>>> "RD" == Rainer Deyke <root at rainerdeyke.com> writes:

    RD> Be careful that the same module isn't loaded twice, or bad
    RD> performance and unreliable behavior will occur.
    RD> The correct behavior of 'import D' in module 'A.B.C' looks
    RD> something like this:

[snip]

    RD> Notes: - 'sys.modules' is checked at each step before trying
    RD> to load the module from a file.  This is absolutely essential
    RD> to prevent multiple copies of a module being loaded.  - Each
    RD> module is stored in 'sys.modules' by its true name.  Again,
    RD> this is essential.  

Thanks!  I think knee.py does this correctly, although it does not use
None to cache failure.  imp is used to load a module in only one place,
and sys.modules is always checked before that happens:

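import imp
import sys

def import_module(partname, fqname, parent):
    # Reuse the module if it has already been loaded under its full name.
    try:
        return sys.modules[fqname]
    except KeyError:
        pass
    try:
        # Search the parent package's __path__, or sys.path if no parent.
        fp, pathname, stuff = imp.find_module(partname,
                                              parent and parent.__path__)
    except (ImportError, AttributeError):
        # Modules that are not packages (extension modules, for example)
        # don't have a __path__ attribute.
        return None
    try:
        m = imp.load_module(fqname, fp, pathname, stuff)
    finally:
        if fp: fp.close()
    if parent:
        # Make the submodule reachable as an attribute of its package.
        setattr(parent, partname, m)
    return m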
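
For readers on a current Python, where the imp module is no longer
available (it was removed in 3.12), the same "check sys.modules first,
load exactly once" idea can be sketched with importlib.  This is only a
rough equivalent under that assumption, not the knee.py code;
import_once is a hypothetical name, and unlike import_module above it
does not set the submodule as an attribute on its parent package.

```python
import importlib.util
import sys


def import_once(fqname):
    """Return the module named fqname, loading it at most once.

    Hypothetical helper: returns None if the module cannot be found.
    """
    # Always consult sys.modules before touching the file system.
    try:
        return sys.modules[fqname]
    except KeyError:
        pass
    spec = importlib.util.find_spec(fqname)
    if spec is None or spec.loader is None:
        return None
    module = importlib.util.module_from_spec(spec)
    # Register under the true (fully qualified) name before executing,
    # so that a circular import sees this same module object.
    sys.modules[fqname] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # Do not leave a half-initialised module behind.
        sys.modules.pop(fqname, None)
        raise
    return module
```

Calling import_once twice with the same name returns the same module
object, which is the property RD's notes ask for.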

Also, I'm assuming that your advice is general and not specifically
directed at the code I've put up.  

    RD> - A value of 'None' in 'sys.modules' is
    RD> used to cache failure.  This is not essential, but it matches
    RD> the current behavior of Python and it can give a significant
    RD> speed boost.
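
To make the "cache failure with None" idea concrete, here is a minimal
sketch.  It uses a plain dict in place of sys.modules so that it can be
run without disturbing the real import state; cached_lookup and the
loader argument are hypothetical names for illustration and are not
taken from knee.py or from Python's own import code.

```python
_MISSING = object()  # sentinel: "this name has never been looked up"


def cached_lookup(fqname, loader, cache):
    """Look up fqname in cache, calling loader only on the first miss.

    loader is a hypothetical callable returning a module or None.
    A stored None means "we already tried and failed".
    """
    found = cache.get(fqname, _MISSING)
    if found is not _MISSING:
        return found              # a module, or None for a cached failure
    module = loader(fqname)       # the expensive search happens here
    cache[fqname] = module        # remember success and failure alike
    return module
```

With this arrangement a second failed lookup of the same name costs one
dict access instead of another search through the package's directories.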

Does the performance boost occur mainly because, without the cache,
each subsequent failure would mean that sys.modules has to scan through
all of its keys and then fail?  Hmmm, but if you keep caching failures,
you increase the number of keys.  If the package nesting is deep, every
global module imported from inside it would insert several new items
into sys.modules.

pkg/
   sub/
      subsub/
         foo.py

foo.py:
import string

This would insert all of the following into sys.modules:
pkg.sub.subsub.string, pkg.sub.string, pkg.string, string

so if each and every package did this we'd have way too many 'string's
that point to None.  This means that len(sys.modules) increases quite
significantly.  Or can it be proved that caching is better than not
caching at all?
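
To put rough numbers on that growth, here is a small sketch.  It
assumes the importer tries every enclosing package before falling back
to the global name, and that each miss is cached under its fully
qualified name; candidate_names and failure_entries are hypothetical
helpers written for this illustration only.

```python
def candidate_names(package, name):
    """Names tried for 'import name' inside package, innermost first,
    ending with the plain global name."""
    parts = package.split(".") if package else []
    names = []
    while parts:
        names.append(".".join(parts + [name]))
        parts.pop()
    names.append(name)
    return names


def failure_entries(package, global_modules):
    """Count the None entries cached if each global module is imported
    once from inside package and is found only at the top level."""
    return sum(len(candidate_names(package, m)) - 1 for m in global_modules)


print(candidate_names("pkg.sub.subsub", "string"))
print(failure_entries("pkg.sub.subsub", ["string", "os", "sys"]))
```

Under these assumptions the number of cached failures is the nesting
depth times the number of distinct global modules imported at that
depth, so three globals imported from pkg.sub.subsub add nine None
entries.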

prabhu

-- 
Prabhu Ramachandran			  MayaVi Data Visualizer
http://www.aero.iitm.ernet.in/~prabhu     http://mayavi.sf.net



