[SciPy-Dev] Breaking up scipy.stats or How to avoid importing the kitchen sink (when we are not in the kitchen)

Bruce Southey bsouthey at gmail.com
Wed Feb 16 12:09:37 EST 2011


On 02/16/2011 09:23 AM, josef.pktd at gmail.com wrote:
> Warren's thread on scipy's subpackages made me realize that we can
> break up the imports in scipy.stats in a backward compatible way.
>
> Problem
> "from scipy import stats" is slow unless scipy is already in the disk cache
>
> len(beforenp), len(beforesp), len(beforestats), len(after)
> 125 261 341 569
> >>> 569 - 341
> 228
>
> e.g. why import scipy.sparse if there is no sparse code in scipy.stats?
> If I only want to use some tests, then all I need is scipy.stats.stats
> and scipy.special
>
> Proposal
>
> keep scipy.stats as the public API subpackage that imports everything,
> especially for interactive work
>
> move all modules from scipy.stats into another directory, scipy.stats_
> or scipy.statslib or something:
>    keep its __init__.py empty
>    create API one level down
>    - stats_basic: current stats.stats plus tests from morestats,
> (name?): imports only scipy.special
>    - stats_other: rest of morestats and other extras (plots,
> boxcox,...),  (name?)
>    - mstats
>    - kde
>    - distributions: imports the kitchen sink
>      no lazy imports possible because distributions are instances and
> not just classes
>
> then we can do
> "from scipy.statslib import stats_basic"
> and we get the ttests with an import of scipy.special plus one module
> instead of plus 215 modules.
>
>
> This is currently just an idea, and I won't pursue it further if we
> don't want to go this way.
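
The layout proposed above can be tried out without touching scipy. The following is only a sketch under stated assumptions: the package names `statslib_demo` and `stats_demo` are hypothetical stand-ins for the proposed scipy.statslib and the public scipy.stats, and `import json` stands in for the heavy imports that distributions pulls in. It builds the packages in a temporary directory and checks which modules get loaded.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical names: "statslib_demo" stands in for the proposed
# scipy.statslib, "stats_demo" for the public scipy.stats namespace.
LAYOUT = {
    # Deliberately empty, so importing one submodule loads nothing else.
    "statslib_demo/__init__.py": "",
    "statslib_demo/stats_basic.py": "def ttest():\n    return 'ttest'\n",
    # 'import json' stands in for the heavy imports of distributions.
    "statslib_demo/distributions.py": "import json\nnorm = object()\n",
    # The public API package re-exports everything, as scipy.stats does now.
    "stats_demo/__init__.py": (
        "from statslib_demo.stats_basic import *\n"
        "from statslib_demo.distributions import *\n"
    ),
}


def build_demo(root):
    """Write the demo packages below the directory `root`."""
    for rel, source in LAYOUT.items():
        path = Path(root) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(source)


def loaded(top_level_names):
    """Sorted names in sys.modules that belong to the given packages."""
    return sorted(m for m in sys.modules if m.split(".")[0] in top_level_names)


root = tempfile.mkdtemp()
build_demo(root)
sys.path.insert(0, root)
importlib.invalidate_caches()

# Targeted import: only the empty package and one module are loaded.
from statslib_demo import stats_basic
light = loaded({"statslib_demo", "stats_demo"})

# Full API import: everything is loaded, as with "from scipy import stats".
import stats_demo
full = loaded({"statslib_demo", "stats_demo"})

print(light)
print(full)
```

The first list contains only the package and stats_basic; distributions shows up only after the full API package is imported, and existing code that uses the public namespace keeps working.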
Ignoring backwards compatibility, we can do something about the current 
__init__.py:
"
from info import __doc__

from stats import *
from distributions import *
from rv import *
from morestats import *
from kde import gaussian_kde
import mstats
"

Most items appear to come from stats (27) and distributions (70).
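
How these counts were obtained is not stated here. One way to get such a number, assuming it means the names that a star import binds, is sketched below. The helper name `star_import_names` and the synthetic module `demo` are made up for illustration; the rule it applies is the standard one (use `__all__` if the module defines it, otherwise every name that does not start with an underscore).

```python
import types


def star_import_names(module):
    """Return the names that 'from module import *' would bind."""
    declared = getattr(module, "__all__", None)
    if declared is not None:
        return sorted(declared)
    return sorted(name for name in vars(module) if not name.startswith("_"))


# Synthetic stand-in for a module such as stats or distributions.
demo = types.ModuleType("demo")
demo.ttest = lambda: None
demo.describe = lambda: None
demo._helper = lambda: None  # private, so a star import skips it

names = star_import_names(demo)
print(len(names), names)
```

Calling the helper on the real stats and distributions modules would give the number of names each one contributes to the package namespace.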

So, without considering the impact, three 'easy' things could be done:
1) avoid or change the distributions import.

2) use 'import morestats' instead of 'from morestats import *'.

3) move less common functions from stats.py to morestats.py and just do 
'import morestats'. Possible candidates to move from stats.py are:
MOMENTS HANDLING NAN: nanmean
                       nanmedian
                       nanstd
ALTERED VERSIONS:  tmean
                    tvar
                    tstd
                    tsem
                    describe
TRIMMING FCNS:  threshold (for arrays only)
                 trimboth
                 trim1
                 around (round all vals to 'n' decimals)
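
To check what any of these changes does to the number of modules loaded, the sys.modules comparison from the script quoted below can be repeated for a single module. This is a minimal Python 3 sketch, not the original script; the helper names `modules_added_by` and `top_level_groups` are made up, and `json` is used only so that the example runs without scipy installed.

```python
import importlib
import sys


def top_level_groups(names, depth=2):
    """Collapse dotted module names to their first `depth` components."""
    return sorted({".".join(name.split(".")[:depth]) for name in names})


def modules_added_by(module_name):
    """Import `module_name` and return the sys.modules entries it added.

    Modules that were already loaded are not counted, so this should be
    run in a fresh interpreter to get meaningful numbers.
    """
    before = set(sys.modules)
    importlib.import_module(module_name)
    return set(sys.modules) - before


added = modules_added_by("json")
print(len(added))
print(top_level_groups(added))
```

Replacing "json" with "scipy.stats", or with a single submodule, gives the count for that import alone.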


Bruce
>
> Notes
>
> I don't understand some things about the imports:
> why do I get some distutils and enthought modules with the stats
> import? (I don't understand the lazy import machinery.)
>
> statsmodels just switched to separating API from package imports.
>
> import sys, copy
> beforenp = copy.copy(sys.modules)
> import numpy
> beforesp = copy.copy(sys.modules)
> import scipy
> beforestats = copy.copy(sys.modules)
> from scipy import stats
> after = copy.copy(sys.modules)
>
> print 'len(beforenp), len(beforesp), len(beforestats), len(after)'
> print len(beforenp), len(beforesp), len(beforestats), len(after)
>
> from pprint import pprint
> pprint(sorted(set(('.'.join(i.split('.')[:2]) for i in
>                     set(after)-set(beforestats)))))
>
> ##pprint(sorted(set(('.'.join(i.split('.')[:2]) for i in
> ##                   set(beforestats)-set(beforesp)))))
>
>
>
>> python -i stats_imports.py
> len(beforenp), len(beforesp), len(beforestats), len(after)
> 125 261 341 569
> ['_bisect',
>   'bisect',
>   'dis',
>   'distutils',
>   'distutils.dep_util',
>   'distutils.distutils',
>   'distutils.errors',
>   'distutils.log',
>   'distutils.os',
>   'distutils.re',
>   'distutils.spawn',
>   'distutils.string',
>   'distutils.sys',
>   'distutils.util',
>   'enthought',
>   'enthought.envisage',
>   'enthought.modulefinder',
>   'enthought.plugins',
>   'enthought.pyface',
>   'enthought.traits',
>   'inspect',
>   'modulefinder',
>   'mpl_toolkits',
>   'new',
>   'numpy.core',
>   'numpy.dual',
>   'opcode',
>   'paste',
>   'paste.modulefinder',
>   'paste.pkg_resources',
>   'pkg_resources',
>   'pkgutil',
>   'scikits',
>   'scipy.integrate',
>   'scipy.lib',
>   'scipy.linalg',
>   'scipy.misc',
>   'scipy.optimize',
>   'scipy.sparse',
>   'scipy.special',
>   'scipy.stats',
>   'swig_runtime_data4',
>   'token',
>   'tokenize',
>   'zlib']
>
>
> Josef
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev



