[SciPy-Dev] Breaking up scipy.stats or How to avoid importing the kitchen sink (when we are not in the kitchen)
Bruce Southey
bsouthey at gmail.com
Wed Feb 16 12:09:37 EST 2011
On 02/16/2011 09:23 AM, josef.pktd at gmail.com wrote:
> Warren's thread on scipy's subpackages made me realize that we can
> break up the imports in scipy.stats in a backward compatible way.
>
> Problem
> "from scipy import stats" is slow unless scipy is already in the disk cache
>
> len(beforenp), len(beforesp), len(beforestats), len(after)
> 125 261 341 569
>>>> 569 - 341
> 228
>
> e.g. why import scipy.sparse if there is no sparse code in scipy.stats?
> If I only want to use some tests, then all I need is scipy.stats.stats
> and scipy.special
>
> Proposal
>
> keep scipy.stats as the public API subpackage, especially for
> interactive work
>
> move all modules from scipy.stats into another directory, scipy.stats_
> or scipy.statslib or something:
> keep its __init__.py empty
> create the API one level down
> - stats_basic: current stats.stats plus tests from morestats,
> (name?): imports only scipy.special
> - stats_other: rest of morestats and other extras (plots,
> boxcox,...), (name?)
> - mstats
> - kde
> - distributions: imports the kitchen sink
> no lazy imports possible because distributions are instances and
> not just classes
>
> then we can do
> "from scipy.statslib import stats_basic"
> and we get the ttests with an import of scipy.special plus one module
> instead of plus 215 modules.
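Why the separate directory with an empty __init__.py matters: importing a submodule always runs its parent package's __init__.py first, so while scipy/stats/__init__.py star-imports everything, even "from scipy.stats import stats" pays for the whole package. A toy reproduction in Python 3, assuming made-up package and module names (fatpkg, leanpkg, basic, heavy) that only stand in for the real ones:

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def make_package(root, name, init_source):
    """Create a toy package with a cheap `basic` module and an 'expensive' `heavy` one."""
    pkg = Path(root) / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text(init_source)
    (pkg / "basic.py").write_text("def ttest():\n    return 'only needs special'\n")
    # Hypothetical stand-in for a module that drags in sparse, linalg, optimize, ...
    (pkg / "heavy.py").write_text("EXPENSIVE = True\n")


def heavy_loaded(root, name):
    """In a fresh interpreter, import only `<name>.basic`; report whether `<name>.heavy` came along."""
    probe = (
        "import sys\n"
        f"sys.path.insert(0, {str(root)!r})\n"
        f"from {name} import basic\n"
        f"print({name + '.heavy'!r} in sys.modules)\n"
    )
    result = subprocess.run([sys.executable, "-c", probe],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip() == "True"


def compare_layouts():
    """Return (heavy loaded with star-importing __init__, heavy loaded with empty __init__)."""
    with tempfile.TemporaryDirectory() as tmp:
        # Current style: the package __init__ star-imports every module.
        make_package(tmp, "fatpkg", "from .basic import *\nfrom .heavy import *\n")
        # Proposed style: empty __init__, public API modules one level down.
        make_package(tmp, "leanpkg", "")
        return heavy_loaded(tmp, "fatpkg"), heavy_loaded(tmp, "leanpkg")


print(compare_layouts())  # (True, False)
```

The probe runs in a subprocess so that modules already loaded by the calling interpreter cannot mask the result.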
>
>
> This is currently just an idea, and I won't pursue it further if we
> don't want to go this way.
Ignoring backwards compatibility, we can do something about the current
__init__.py:
"
from info import __doc__
from stats import *
from distributions import *
from rv import *
from morestats import *
from kde import gaussian_kde
import mstats
"
Most items appear to come from stats (27) and distributions (70).
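Counts like these can be approximated by listing the names each "from X import *" binds: __all__ when the module defines it, otherwise every global that does not start with an underscore (which also sweeps in helper imports such as np). A small sketch; star_export_names and count_star_exports are made-up helpers, the modules below are synthetic stand-ins, and running this against a current SciPy would not reproduce the 2011 numbers.

```python
import types


def star_export_names(module):
    """Names that `from module import *` would bind in the importing namespace."""
    names = getattr(module, "__all__", None)
    if names is None:
        # Without __all__, star-import takes every global not starting with "_".
        names = [n for n in vars(module) if not n.startswith("_")]
    return sorted(names)


def count_star_exports(modules):
    """Map each module's name to how many names its star-import contributes."""
    return {m.__name__: len(star_export_names(m)) for m in modules}


# Synthetic stand-ins for stats.py and distributions.py (contents are hypothetical).
fake_stats = types.ModuleType("stats")
fake_stats.__all__ = ["ttest_ind", "nanmean", "describe"]
fake_stats.ttest_ind = fake_stats.nanmean = fake_stats.describe = object()

fake_dists = types.ModuleType("distributions")
fake_dists.norm = object()
fake_dists.t = object()
fake_dists._helper = object()  # private: skipped by star-import

print(count_star_exports([fake_stats, fake_dists]))  # {'stats': 3, 'distributions': 2}
```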
So, without addressing the impacts, three 'easy' things that could be
done are:
1) avoid or change the distributions import.
2) use 'import morestats' instead of 'from morestats import *'.
3) move less common functions in stats.py to morestats.py and just do
'import morestats'. A possible list of things to move from stats.py is:
    MOMENTS HANDLING NAN: nanmean
                          nanmedian
                          nanstd
    ALTERED VERSIONS:     tmean
                          tvar
                          tstd
                          tsem
                          describe
    TRIMMING FCNS:        threshold (for arrays only)
                          trimboth
                          trim1
                          around (round all vals to 'n' decimals)
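Options 2 and 3 mostly change what the stats namespace exposes: a plain 'import morestats' in __init__.py still executes morestats whenever scipy.stats is imported. On Python 3.7 and later (long after this thread), a module-level __getattr__ (PEP 562) can defer that work while keeping stats.bartlett-style access working. A minimal sketch under those assumptions; the package name lazystats and the contents of its morestats module are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# __init__.py for a hypothetical package whose rarely used module loads on demand.
LAZY_INIT = '''\
import importlib

_MORESTATS_NAMES = {"bartlett"}   # names re-exported from morestats


def __getattr__(name):
    # PEP 562: only called when `name` is not already in the package namespace.
    if name == "morestats":
        return importlib.import_module(".morestats", __name__)
    if name in _MORESTATS_NAMES:
        return getattr(importlib.import_module(".morestats", __name__), name)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''

PROBE = '''\
import sys
sys.path.insert(0, {root!r})
import lazystats
print("lazystats.morestats" in sys.modules)   # right after the package import
lazystats.bartlett()                           # backward-compatible access
print("lazystats.morestats" in sys.modules)   # after first use
'''


def lazy_demo():
    """Return (morestats loaded after `import lazystats`, loaded after first use of bartlett)."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "lazystats"
        pkg.mkdir()
        (pkg / "__init__.py").write_text(LAZY_INIT)
        (pkg / "morestats.py").write_text("def bartlett():\n    return 'loaded on demand'\n")
        result = subprocess.run([sys.executable, "-c", PROBE.format(root=tmp)],
                                capture_output=True, text=True, check=True)
        before, after = (line == "True" for line in result.stdout.split())
        return before, after


print(lazy_demo())  # (False, True)
```

Because __getattr__ is only consulted for missing attributes, everything defined eagerly in __init__.py keeps its normal lookup cost.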
Bruce
>
> Notes
>
> I don't understand some things about the imports:
> why do I get some distutils and enthought modules with the stats
> import? (I don't understand the lazy import machinery.)
>
> statsmodels just switched to separating API from package imports.
>
> from __future__ import print_function  # lets the script run on Python 2 and 3
> import sys, copy
> # snapshot the loaded modules before and after each import
> beforenp = copy.copy(sys.modules)
> import numpy
> beforesp = copy.copy(sys.modules)
> import scipy
> beforestats = copy.copy(sys.modules)
> from scipy import stats
> after = copy.copy(sys.modules)
>
> print('len(beforenp), len(beforesp), len(beforestats), len(after)')
> print(len(beforenp), len(beforesp), len(beforestats), len(after))
>
> from pprint import pprint
> # modules new since "import scipy", grouped by their first two name components
> pprint(sorted(set(('.'.join(i.split('.')[:2]) for i in
>               set(after)-set(beforestats)))))
>
> ##pprint(sorted(set(('.'.join(i.split('.')[:2]) for i in
> ## set(beforestats)-set(beforesp)))))
>
>
>
>> python -i stats_imports.py
> len(beforenp), len(beforesp), len(beforestats), len(after)
> 125 261 341 569
> ['_bisect',
> 'bisect',
> 'dis',
> 'distutils',
> 'distutils.dep_util',
> 'distutils.distutils',
> 'distutils.errors',
> 'distutils.log',
> 'distutils.os',
> 'distutils.re',
> 'distutils.spawn',
> 'distutils.string',
> 'distutils.sys',
> 'distutils.util',
> 'enthought',
> 'enthought.envisage',
> 'enthought.modulefinder',
> 'enthought.plugins',
> 'enthought.pyface',
> 'enthought.traits',
> 'inspect',
> 'modulefinder',
> 'mpl_toolkits',
> 'new',
> 'numpy.core',
> 'numpy.dual',
> 'opcode',
> 'paste',
> 'paste.modulefinder',
> 'paste.pkg_resources',
> 'pkg_resources',
> 'pkgutil',
> 'scikits',
> 'scipy.integrate',
> 'scipy.lib',
> 'scipy.linalg',
> 'scipy.misc',
> 'scipy.optimize',
> 'scipy.sparse',
> 'scipy.special',
> 'scipy.stats',
> 'swig_runtime_data4',
> 'token',
> 'tokenize',
> 'zlib']
>
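A Python 3 sketch of the same before/after comparison of sys.modules, run in a fresh interpreter so that modules already loaded by the calling process cannot hide anything. modules_added_by and top_two_levels are made-up helper names; the setup argument plays the role of the "import numpy; import scipy" steps that precede the beforestats snapshot.

```python
import json
import subprocess
import sys


def modules_added_by(import_stmt, setup=""):
    """Sorted names that `import_stmt` adds to sys.modules after `setup` ran, in a fresh interpreter."""
    probe = (
        "import json, sys\n"
        f"{setup}\n"
        "before = set(sys.modules)\n"
        f"{import_stmt}\n"
        "print(json.dumps(sorted(set(sys.modules) - before)))\n"
    )
    proc = subprocess.run([sys.executable, "-c", probe],
                          capture_output=True, text=True, check=True)
    # The JSON list is the last stdout line, even if the import itself printed something.
    return json.loads(proc.stdout.splitlines()[-1])


def top_two_levels(names):
    """Collapse 'a.b.c' to 'a.b', like the pprint call in the original script."""
    return sorted({".".join(name.split(".")[:2]) for name in names})
```

With SciPy installed, modules_added_by("from scipy import stats", setup="import numpy, scipy") gives the counterpart of len(after) - len(beforestats), and top_two_levels of that list mirrors the grouped listing; the numbers for a current SciPy will differ from the 2011 ones.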
>
> Josef
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev