[SciPy-Dev] Breaking up scipy.stats or How to avoid importing the kitchen sink (when we are not in the kitchen)

josef.pktd at gmail.com josef.pktd at gmail.com
Wed Feb 16 10:23:11 EST 2011


Warren's thread on scipy's subpackages made me realize that we can
break up the imports in scipy.stats in a backward compatible way.

Problem
"from scipy import stats" is slow unless scipy is already in the disk cache

len(beforenp), len(beforesp), len(beforestats), len(after)
125 261 341 569
>>> 569 - 341
228

e.g. why import scipy.sparse if there is no sparse code in scipy.stats?
If I only want to use some tests, then all I need is scipy.stats.stats
and scipy.special.

Proposal

keep scipy.stats as the public API subpackage that imports everything,
especially for interactive work

move all modules from scipy.stats into another directory, scipy.stats_
or scipy.statslib or something:
  keep its __init__.py empty
  create the API one level down
  - stats_basic: current stats.stats plus tests from morestats,
(name?): imports only scipy.special
  - stats_other: rest of morestats and other extras (plots,
boxcox,...),  (name?)
  - mstats
  - kde
  - distributions: imports the kitchen sink
    no lazy imports possible because distributions are instances and
not just classes

then we can do
"from scipy.statslib import stats_basic"
and we get the ttests with an import of scipy.special plus one module
instead of scipy.special plus 215 modules.
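
A minimal sketch of the proposed layout, written for Python 3. Everything
in it is hypothetical: "demo_scipy" stands in for scipy, "heavy" stands in
for the kitchen sink (optimize, integrate, ...), and ttest_stub is a
placeholder function. It only illustrates that an empty statslib
__init__.py lets the light module be imported on its own, while the API
package still exposes everything.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Hypothetical package tree; none of these names exist in scipy.
FILES = {
    "demo_scipy/__init__.py": "",
    "demo_scipy/heavy.py": "VALUE = 1  # stand-in for the expensive imports\n",
    "demo_scipy/statslib/__init__.py": "",  # kept empty on purpose
    "demo_scipy/statslib/stats_basic.py": """
        def ttest_stub(x):
            return sum(x) / float(len(x))
    """,
    "demo_scipy/statslib/distributions.py": """
        import demo_scipy.heavy  # distributions pull in the kitchen sink

        __all__ = ['norm']

        class _Norm(object):
            name = 'norm'

        norm = _Norm()  # an instance, created at import time
    """,
    # The API package re-exports everything, as scipy.stats does today.
    "demo_scipy/stats/__init__.py": """
        from demo_scipy.statslib.stats_basic import *
        from demo_scipy.statslib.distributions import *
    """,
}


def build_demo(root):
    """Write the hypothetical package tree below `root`."""
    for relpath, source in FILES.items():
        path = os.path.join(root, *relpath.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as fh:
            fh.write(textwrap.dedent(source))


def loaded_demo_modules():
    """Return the demo modules currently present in sys.modules."""
    return sorted(m for m in sys.modules if m.startswith("demo_scipy"))


root = tempfile.mkdtemp()
build_demo(root)
sys.path.insert(0, root)
importlib.invalidate_caches()

# Light import: only the basic module and its (empty) parent packages.
importlib.import_module("demo_scipy.statslib.stats_basic")
light = loaded_demo_modules()

# Full API import: now the distributions and the heavy module come along.
importlib.import_module("demo_scipy.stats")
full = loaded_demo_modules()

print(light)
print(full)
```

Because demo_scipy/stats/__init__.py re-exports the same names, existing
code that uses the API package would keep working in this sketch.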


This is currently just an idea, and I won't pursue it further if we
don't want to go this way.


Notes

I don't understand some things about the imports:
why do I get some distutils and enthought modules with the stats
import? (I don't understand the lazy import machinery.)
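
One way to find out which module asks for an unexpected import is to wrap
the builtin __import__ for the duration of the import and record who
requested what. This is only a sketch for Python 3; trace_importers and
import_stats_stand_in are made-up names, and json is used as a stand-in so
the example runs without scipy. It only sees import statements that
actually execute, so it should be run in a fresh interpreter.

```python
import builtins


def trace_importers(run, prefixes):
    """Call run() and record (importer, imported) pairs for watched prefixes."""
    records = []
    original_import = builtins.__import__

    def tracing_import(name, globals=None, locals=None, fromlist=(), level=0):
        # Only absolute imports are checked; relative ones pass through.
        if level == 0 and name.split(".")[0] in prefixes:
            importer = globals.get("__name__", "?") if globals else "?"
            records.append((importer, name))
        return original_import(name, globals, locals, fromlist, level)

    builtins.__import__ = tracing_import
    try:
        run()
    finally:
        # Always restore the real import function.
        builtins.__import__ = original_import
    return records


def import_stats_stand_in():
    # Stand-in for "from scipy import stats"; swap in the real import
    # and watch {"distutils", "enthought"} to investigate the question above.
    import json.decoder


records = trace_importers(import_stats_stand_in, {"json"})
for importer, imported in records:
    print("%s imports %s" % (importer, imported))
```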

statsmodels just switched to separating API from package imports.

import sys, copy

# snapshot sys.modules before and after each import
beforenp = copy.copy(sys.modules)
import numpy
beforesp = copy.copy(sys.modules)
import scipy
beforestats = copy.copy(sys.modules)
from scipy import stats
after = copy.copy(sys.modules)

print 'len(beforenp), len(beforesp), len(beforestats), len(after)'
print len(beforenp), len(beforesp), len(beforestats), len(after)

# modules added by "from scipy import stats", cut to two name levels
from pprint import pprint
pprint(sorted(set(('.'.join(i.split('.')[:2]) for i in
                   set(after)-set(beforestats)))))

# same for the modules added by "import scipy"
##pprint(sorted(set(('.'.join(i.split('.')[:2]) for i in
##                   set(beforestats)-set(beforesp)))))



>python -i stats_imports.py
len(beforenp), len(beforesp), len(beforestats), len(after)
125 261 341 569
['_bisect',
 'bisect',
 'dis',
 'distutils',
 'distutils.dep_util',
 'distutils.distutils',
 'distutils.errors',
 'distutils.log',
 'distutils.os',
 'distutils.re',
 'distutils.spawn',
 'distutils.string',
 'distutils.sys',
 'distutils.util',
 'enthought',
 'enthought.envisage',
 'enthought.modulefinder',
 'enthought.plugins',
 'enthought.pyface',
 'enthought.traits',
 'inspect',
 'modulefinder',
 'mpl_toolkits',
 'new',
 'numpy.core',
 'numpy.dual',
 'opcode',
 'paste',
 'paste.modulefinder',
 'paste.pkg_resources',
 'pkg_resources',
 'pkgutil',
 'scikits',
 'scipy.integrate',
 'scipy.lib',
 'scipy.linalg',
 'scipy.misc',
 'scipy.optimize',
 'scipy.sparse',
 'scipy.special',
 'scipy.stats',
 'swig_runtime_data4',
 'token',
 'tokenize',
 'zlib']
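
The same measurement can be wrapped in two small helpers so it is easy to
repeat for other imports. This is a sketch for Python 3, not the script
that produced the output above; new_modules and group_by_prefix are
made-up names, and json is only a stand-in for "scipy.stats".

```python
import importlib
import sys


def group_by_prefix(module_names, depth=2):
    """Cut dotted module names to their first `depth` parts, sorted and unique."""
    return sorted(set(".".join(name.split(".")[:depth]) for name in module_names))


def new_modules(module_name):
    """Import `module_name` and return the sys.modules keys that it added."""
    before = set(sys.modules)
    importlib.import_module(module_name)
    return set(sys.modules) - before


# Stand-in import; use "scipy.stats" in a fresh interpreter for the real count.
added = new_modules("json")
print(len(added))
print(group_by_prefix(added))
```

The count is only meaningful in a fresh interpreter, since anything that is
already in sys.modules is not counted again.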


Josef


