[SciPy-Dev] Breaking up scipy.stats or How to avoid importing the kitchen sink (when we are not in the kitchen)
josef.pktd at gmail.com
Wed Feb 16 10:23:11 EST 2011
Warren's thread on scipy's subpackages made me realize that we can
break up the imports in scipy.stats in a backward compatible way.
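Problem
"from scipy import stats" is slow unless scipy is already in the disk cache.
Number of entries in sys.modules before importing numpy, scipy and
scipy.stats, and after the stats import (script at the end):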
len(beforenp), len(beforesp), len(beforestats), len(after)
125 261 341 569
>>> 569 - 341
228
e.g. why import scipy.sparse if there is no sparse code in scipy.stats?
If I only want to use some tests, then all I need is scipy.stats.stats
and scipy.special
Proposal
keep scipy.stats as the public API subpackage that imports everything,
especially for interactive work
move all modules from scipy.stats into another directory, scipy.stats_
or scipy.statslib or something:
keep its __init__.py empty
create API one level down
- stats_basic: current stats.stats plus tests from morestats,
(name?): imports only scipy.special
- stats_other: rest of morestats and other extras (plots,
boxcox,...), (name?)
- mstats
- kde
- distributions: imports the kitchen sink
no lazy imports possible because distributions are instances and
not just classes
then we can do
"from scipy.statslib import stats_basic"
and we get the ttests with an import of scipy.special plus one module
instead of plus 215 modules.
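The proposed layout can be sketched with a throwaway package. This is only an illustration under stated assumptions: the package name `statslib_demo` and the placeholder module contents are hypothetical stand-ins, not scipy code. The sketch writes a package with an empty `__init__.py`, a light `stats_basic` module and a heavier `distributions` module, then checks which names land in `sys.modules` when only `stats_basic` is imported.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for the proposed scipy.statslib layout.
PACKAGE = "statslib_demo"
FILES = {
    "__init__.py": "",  # kept empty so importing the package loads nothing else
    "stats_basic.py": "def ttest_placeholder():\n    return 'basic only'\n",
    "distributions.py": "import decimal, fractions  # stands in for the kitchen sink\n",
}


def build_demo_package(root):
    """Write the demo package under `root` and make it importable."""
    package_dir = os.path.join(root, PACKAGE)
    os.makedirs(package_dir)
    for filename, source in FILES.items():
        with open(os.path.join(package_dir, filename), "w") as handle:
            handle.write(source)
    sys.path.insert(0, root)
    importlib.invalidate_caches()


def modules_loaded_by(module_name):
    """Import `module_name` and return the names newly added to sys.modules."""
    before = set(sys.modules)
    importlib.import_module(module_name)
    return sorted(set(sys.modules) - before)


build_demo_package(tempfile.mkdtemp())
loaded = modules_loaded_by(PACKAGE + ".stats_basic")
print(loaded)
# The heavy sibling module is never touched by this import.
print(PACKAGE + ".distributions" in sys.modules)
```

Because the package's `__init__.py` is empty, the cost of an import is decided by the submodule the caller names, which is the point of moving the API one level down.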
This is currently just an idea, and I won't pursue it further if we
don't want to go this way.
Notes
I don't understand some things about the imports: why do I get some
distutils and enthought modules with the stats import? (I don't
understand the lazy import machinery.)
statsmodels just switched to separating API from package imports.
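One way to find out which module pulls in the unexpected distutils or enthought entries is to record who asks for each import. What follows is a minimal sketch for Python 3, assuming that temporarily wrapping `builtins.__import__` is acceptable for a diagnostic run. The names `trace_imports`, `demo_outer` and `demo_inner` are hypothetical and exist only for this illustration, and only absolute imports of modules that are not yet loaded are recorded.

```python
import builtins
import importlib
import os
import sys
import tempfile


def trace_imports(module_name):
    """Import `module_name` and return (importer, imported) pairs seen on the way."""
    original_import = builtins.__import__
    edges = []

    def tracing_import(name, globals=None, locals=None, fromlist=(), level=0):
        importer = (globals or {}).get("__name__", "<unknown>")
        # Only absolute imports of modules not loaded yet are recorded.
        is_new = level == 0 and name not in sys.modules
        module = original_import(name, globals, locals, fromlist, level)
        if is_new:
            edges.append((importer, name))
        return module

    builtins.__import__ = tracing_import
    try:
        importlib.import_module(module_name)
    finally:
        builtins.__import__ = original_import  # always restore the real hook
    return edges


# Two throwaway modules: demo_outer imports demo_inner.
demo_dir = tempfile.mkdtemp()
for filename, source in [("demo_inner.py", "VALUE = 1\n"),
                         ("demo_outer.py", "import demo_inner\n")]:
    with open(os.path.join(demo_dir, filename), "w") as handle:
        handle.write(source)
sys.path.insert(0, demo_dir)
importlib.invalidate_caches()

edges = trace_imports("demo_outer")
for importer, imported in edges:
    print(importer, "->", imported)
```

Run in a fresh interpreter with "scipy.stats" as the argument, the same function should show which scipy module is the first to request each of the surprising packages.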
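# stats_imports.py (Python 2): count sys.modules entries at each import stage
import sys, copy
beforenp = copy.copy(sys.modules)     # snapshot before numpy
import numpy
beforesp = copy.copy(sys.modules)     # snapshot before scipy
import scipy
beforestats = copy.copy(sys.modules)  # snapshot before scipy.stats
from scipy import stats
after = copy.copy(sys.modules)        # snapshot after scipy.stats
print 'len(beforenp), len(beforesp), len(beforestats), len(after)'
print len(beforenp), len(beforesp), len(beforestats), len(after)
from pprint import pprint
# modules added by the stats import, cut to their first two name components
pprint(sorted(set(('.'.join(i.split('.')[:2]) for i in
                   set(after)-set(beforestats)))))
# same listing for the modules added by "import scipy" (disabled):
##pprint(sorted(set(('.'.join(i.split('.')[:2]) for i in
##                   set(beforestats)-set(beforesp)))))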
>python -i stats_imports.py
len(beforenp), len(beforesp), len(beforestats), len(after)
125 261 341 569
['_bisect',
'bisect',
'dis',
'distutils',
'distutils.dep_util',
'distutils.distutils',
'distutils.errors',
'distutils.log',
'distutils.os',
'distutils.re',
'distutils.spawn',
'distutils.string',
'distutils.sys',
'distutils.util',
'enthought',
'enthought.envisage',
'enthought.modulefinder',
'enthought.plugins',
'enthought.pyface',
'enthought.traits',
'inspect',
'modulefinder',
'mpl_toolkits',
'new',
'numpy.core',
'numpy.dual',
'opcode',
'paste',
'paste.modulefinder',
'paste.pkg_resources',
'pkg_resources',
'pkgutil',
'scikits',
'scipy.integrate',
'scipy.lib',
'scipy.linalg',
'scipy.misc',
'scipy.optimize',
'scipy.sparse',
'scipy.special',
'scipy.stats',
'swig_runtime_data4',
'token',
'tokenize',
'zlib']
Josef