[SciPy-dev] Package organization

Thu Oct 13 04:06:54 EDT 2005

On Wed, 12 Oct 2005, Robert Kern wrote:

> I would like to see scipy's package organization become flatter and more
> oriented towards easy, lightweight, modular packaging rather than
> subject matter. For example, some people want bindings to SUNDIALS for
> ODEs. They could go into scipy.integrate, but that introduces a large,
> needless dependency for those who just want to compute integrals. So I
> would suggest that SUNDIALS bindings would be in their own
> scipy.sundials package.

I like this approach, though I would use scipy.lib.sundials. The point is to
separate plain wrappers (there can be several libraries for the same task,
e.g. the various sparse matrix libraries) from general tools (tools for
specific tasks, like linalg), which should live in the scipy namespace.

> The wavelets library that Fernando is working on would go in as
> scipy.wavelets rather than being stuffed into scipy.signal. You get the
> idea.

I would also consider wavelets a more general tool than just a signal
processing tool, just as FFT is.

> This is also why I suggested making the scipy_core versions of fftpack
> and linalg be named scipy.corefft and scipy.corelinalg and not be
> aliased to scipy.fftpack and scipy.linalg. The "core" names reflect
> their packaging and their limited functionality. For one thing, this
> naming allows us to try to import the possibly optimized versions in the
> full scipy:
>
>  # scipy/corelinalg/__init__.py
>  import basic_lite
>  svd = basic_lite.singular_value_decomposition
>  ...
>
>  try:
>    from scipy import linalg
>    svd = linalg.svd
>    ...
>  except ImportError:
>    pass

OT, I'd like to treat the above codelet just as an example of the desired
behaviour, rather than as real code. I have had too much bad experience with
failures in importing extension modules being hidden by such try-except
blocks, which makes debugging really painful. We should have a better method
of detecting that a package is unavailable, as the above would behave the
same way for both broken and missing packages.
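
As a rough sketch of what such a method could look like in present-day
Python (ModuleNotFoundError did not exist when this was written), the helper
below returns None only when the requested package itself is not installed,
and lets every other import failure propagate. The name optional_import is
made up for this illustration and is not part of scipy.

```python
import importlib


def optional_import(name):
    """Return the module `name`, or None if it is simply not installed.

    Hypothetical helper, not a scipy API. Any failure raised while
    importing an installed module (a broken extension module, a missing
    dependency inside the package) is re-raised instead of being hidden.
    """
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as exc:
        missing = exc.name or ""
        # Only swallow the error when the module we asked for, or one of
        # its parent packages, is the thing that could not be found.
        if name == missing or name.startswith(missing + "."):
            return None
        raise


if __name__ == "__main__":
    # An installed module is returned as usual.
    print(optional_import("json") is not None)
    # A package that is not installed yields None instead of an exception.
    print(optional_import("surely_not_installed_pkg_12345"))
```

With this, the fallback in the codelet would only be taken when
optional_import("scipy.linalg") returns None, while a broken installation
would still show its traceback.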

> Okay, What Belongs In Scipy. It's somewhat difficult to answer the
> question, "Does this package belong in scipy?" without having a common
> answer to, "What is scipy?" I won't pretend to have the single answer to
> that last question, but I will start the dialogue based on the
> rationalizations I've come up with to defend my gut feelings.
>
> Things scipy is not:
>
>  * A framework. You shouldn't have to restructure your programs to use
> the algorithms implemented in scipy. Sometimes the algorithms themselves
> may require it (e.g. reverse communication solvers), but that's not
> imposed by scipy.
>
>  * Everything a scientist will need to do computing. For a variety of
> reasons, it's just not an achievable goal and, more importantly, it's
> not a good standard for making decisions. A lot of scientists need a
> good RDBMS, but there's no reason to put pysqlite into scipy. Enthon,
> package repositories, and specialized LiveCDs are better places to
> collect "everything."
>
>  * A plotting library. (Sorry, had to throw that in.)
>
> Things scipy is:
>
>  * A loose collection of slightly interdependent modules for numerical
> computing.
>
>  * A common build environment that handles much of the annoying work
> for numerical extension modules. Does your module rely on a library that
> needs LAPACK or BLAS? If you put it in scipy, your users can configure
> the location of their optimized libraries *once*, and all of the scipy
> modules they build can use that information.
>
>  * A good place to put numerical modules that don't otherwise have a
> good home.
>
> Things scipy *could* be:
>
>  * An *excellent* build environment for library-heavy extension
> modules. To realize this, we would need to integrate the configuration
> portion of PETSc's BuildSystem or something equivalent. The automatic
> discovery/download/build works quite well. If this were to be realized,
> some packages might make more sense as subpackages of scipy. For
> example, matplotlib and pytables don't have much reason to be part of
> scipy right now, but if the libraries they depend on could be
> automatically detected/downloaded/built and shared with other scipy
> subpackages, then I think it might make sense for them to live in scipy,
> too.

I think it should be a separate project. Some years ago I was working
on a package (taskman) that could download/build/install
packages/libraries like gcc, ATLAS, Python, Numeric, etc., in order to
detect problems that might come up with certain combinations of software
versions. So, I consider myself aware of most of the issues that one must
deal with in such a project.

> As Pearu suggested, as we port scipy packages to the new scipy_core we
> should audit and label them. To that end:
>
>  * gui_thread, gplt, and plt are dead, I think.

+1

>  * xplt shambles along, but shouldn't return as scipy.xplt. It can
> never be *the* plotting library for scipy, and leaving it as scipy.xplt
> gives people that impression.

+0.5. I think scipy.xplt is a nice piece of work. Where should it be
moved?

>  * scipy.cluster is sorta broken and definitely incomplete. We should
> port over what's in Bio.Cluster. For that matter, there's quite a bit in
> biopython we should stea^Wport (Bio.GA, Bio.HMM, Bio.KDTree,
> Bio.NeuralNetwork, Bio.NaiveBayes, Bio.MaxEntropy, Bio.MarkovModel,
> Bio.LogisticRegression, Bio.Statistics.lowess).
>
>  * The bindings to ODEPACK, QUADPACK, FITPACK, and MINPACK are
> handwritten. Should we mark them as, "f2py when you get the chance"?
> Otherwise, they probably count as "state-of-the-art" although we could
> always expand our offerings like exposing some of the other functions in
> ODEPACK.

Yes. I am willing to take on this task.

>  * scipy.optimize: I think I recently ran into a regression in the old
> scipy. fmin() wasn't finding the minimum of the Rosenbrock function in
> the tutorial. I'll have to check that again. The simulated annealing
> code could use some review.
>
>  * scipy.special: cephes.round() seems to be buggy depending on the
> platform, and I think we got a bug report about one of the other functions.
>
>  * I will maintain much of scipy.stats. Of course, that will probably
> mean, "throwing anova() into the sandbox never to return." Many of the
> other functions in stats.py need vetting.
>
> Now I'm sure I've used up my opinion budget.

Could you give me the account number of your opinion bank? I'd like to
make an opinion transfer from my credit account ;-)

I can maintain/review various f2py based wrappers (blas, lapack, etc.,
anything that should go into scipy.lib) as well as the fftpack and linalg
packages. I am also interested in the integrate, optimize, and interpolate
packages, but mainly as a code contributor through scipy.lib.

Another task that I have taken on within the distutils framework is to make
scipy independent of a Fortran compiler. This means that in the absence of a
Fortran compiler, f2c based C libraries will be used when building scipy.
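
To illustrate the decision only (this is not the distutils code itself), a
build script could look for a Fortran compiler on the PATH and fall back to
the f2c route when none is found. The compiler list and the function name
pick_fortran_strategy are assumptions made for this sketch.

```python
import shutil

# Illustrative candidates only; a real build system would know many more.
FORTRAN_COMPILERS = ("gfortran", "g77", "ifort")


def pick_fortran_strategy(candidates=FORTRAN_COMPILERS, which=shutil.which):
    """Return ("fortran", path) if a compiler is found, else ("f2c", None).

    Hypothetical helper: `which` is injectable so the lookup can be
    replaced in tests; by default it searches the PATH.
    """
    for exe in candidates:
        path = which(exe)
        if path:
            return ("fortran", path)
    # No Fortran compiler available: build from f2c generated C sources.
    return ("f2c", None)


if __name__ == "__main__":
    strategy, compiler = pick_fortran_strategy()
    print(strategy, compiler)
```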

Pearu