[SciPy-dev] Package organization

Thu Oct 13 02:33:53 EDT 2005

Robert Kern wrote:

> I would like to see scipy's package organization become flatter and more
> oriented towards easy, lightweight, modular packaging rather than
> subject matter. For example, some people want bindings to SUNDIALS for
> ODEs. They could go into scipy.integrate, but that introduces a large,
> needless dependency for those who just want to compute integrals. So I
> would suggest that SUNDIALS bindings would be in their own
> scipy.sundials package.
> 
> As another example, I might also suggest moving the simulated annealing
> module out into scipy.globalopt along with diffev.py and pso.py that are
> currently in my sandbox. They're all optimizers, but functionally they
> are unrelated to the extension-heavy optimizers that make up the
> remainder of scipy.optimize.
> 
> The wavelets library that Fernando is working on would go in as
> scipy.wavelets rather than being stuffed into scipy.signal. You get the
> idea.
> 
> This is also why I suggested making the scipy_core versions of fftpack
> and linalg be named scipy.corefft and scipy.corelinalg and not be
> aliased to scipy.fftpack and scipy.linalg. The "core" names reflect
> their packaging and their limited functionality. For one thing, this
> naming allows us to try to import the possibly optimized versions in the
> full scipy:

[snip excellent analysis of scipy's organizational status ]
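
(The code that followed the quoted paragraph is snipped above and is not
reproduced here.  Purely as an illustration of the "try the full version
first, fall back to the core one" idea, something like the sketch below would
do.  The helper name import_first_available is made up for this example and
is not part of scipy.)

```python
import importlib


def import_first_available(candidates):
    """Return the first module in `candidates` that imports successfully.

    Hypothetical helper for illustration only; not part of scipy.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # fall through to the next, less capable, candidate
    raise ImportError(
        "none of these modules could be imported: %r" % (list(candidates),)
    )
```

With the names proposed in the quote, a caller might write
linalg = import_first_available(["scipy.linalg", "scipy.corelinalg"]),
assuming those are the names that end up being used.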

Let me add a minor twist to your plan, which perhaps may help a little.  How 
about making a two-level distinction between 'scipy, the core package' and 
'scipy, the collection of tools'?  Here's how it could be organized, in terms 
of namespaces and release policy: whatever is defined as the core is released 
by the scipy package proper, and can be safely considered a dependency for the 
rest.  Note that this can still be split between scipy_core and 'full scipy', 
where scipy_core is Travis' new Numeric/numarray and 'full scipy' contains 
much more functionality.

As for packages written by third-party authors, which can live under the 
scipy namespace as an umbrella and benefit from scipy's build facilities and 
core libraries, how about putting them all into a 'toolkits' namespace?  The 
actual name, for typing convenience, could be scipy.kits or scipy.tools 
(something short).

This would then give us the following structure:

1.  Scipy_core: the new Numeric/numarray package, which includes basic FFT, 
linear algebra, random numbers and perhaps basic i/o (at least save/load 
abilities), and whatever else I'm missing right now (I don't have it yet 
installed on this laptop).

2.  Scipy 'full': depends on (1), and exposes all the other scipy names: 
scipy.{linalg,optimize,integrate,...}.  These are libraries considered 
officially part of scipy, so that even if they are maintained by others (much 
like Python's stdlib), there is a commitment to a common release cycle. 
These can, if need be, have inter-dependencies, as they will always be 
released as a whole.

Both 1 and 2 use the top-level scipy namespace.  Then we have:

3.  The scipy.{kits|tools} namespace (or whatever the chosen name).  This is 
where third parties can drop their own packages, which can depend either only 
on (1) or on the full (2) system (their level of dependency should be explicitly 
stated).

The kits namespace may ship empty by default, or it could be populated with a 
few things from current scipy if it is decided they are best moved there.
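
To make the three layers concrete, here is a rough sketch of the dependency
rules they imply.  The layer labels "core", "full" and "kits" and the function
name are invented for this example; allowing one toolkit to depend on another
is an assumption, since the proposal only spells out dependencies on (1) and
(2).

```python
# Hypothetical layer labels; they mirror items 1-3 above and nothing more.
ALLOWED_DEPENDENCIES = {
    "core": set(),             # (1) scipy_core depends on no other layer
    "full": {"core"},          # (2) full scipy builds on scipy_core
    "kits": {"core", "full"},  # (3) toolkits may use (1) alone or (1) and (2)
}


def may_depend_on(layer, other):
    """Return True if code in `layer` is allowed to import from `other`."""
    for name in (layer, other):
        if name not in ALLOWED_DEPENDENCIES:
            raise ValueError("unknown layer: %r" % (name,))
    # Same-layer imports are allowed: full scipy is released as a whole.
    # For "kits" this is an assumption, not something the proposal states.
    return layer == other or other in ALLOWED_DEPENDENCIES[layer]
```

The point of the table is that the arrows only go one way: nothing in (1) or
(2) ever needs to know that a given toolkit exists.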

The only thing required for projects to live in the .kits namespace is really 
to avoid top-level name collisions, so it would perhaps be worth having an 
informal policy of people checking with scipy-dev for a name before using it.
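
The name check itself is simple enough to sketch.  The function below is
hypothetical (no such helper exists in scipy), and comparing names without
regard to case is my own assumption, made because some filesystems do not
distinguish 'Wavelets' from 'wavelets'.

```python
import keyword


def check_toolkit_name(candidate, existing_kits):
    """Return a list of problems with `candidate`; an empty list means OK.

    Hypothetical helper for illustration only; not part of scipy.
    """
    problems = []
    # The name must be importable as scipy.kits.<candidate>.
    if not candidate.isidentifier() or keyword.iskeyword(candidate):
        problems.append("not usable as a Python package name")
    # Assumption: compare case-insensitively, since case-insensitive
    # filesystems cannot hold two directories differing only in case.
    if candidate.lower() in {name.lower() for name in existing_kits}:
        problems.append("collides with an existing toolkit name")
    return problems
```

A quick post to scipy-dev would still be the real check; this only shows how
little there is to verify.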

This layout would allow the core team to work with relative freedom at the 
top-level namespace, without worrying about toolkits taking names they may 
need in the future.  Similarly, toolkit authors will have a well-defined API 
to build upon.

The criterion for deciding what goes in (2) should be one of generality: tools 
likely to be needed across a wide range of scientific work, and which 
provide a foundation for toolkit authors.

If this is combined with a CPAN-like system (eggs, PyPI, whatever), it should 
be very easy for users, once they have the basic layers in place, to grab a 
toolkit by issuing a single command or going to a website.  I'd suggest, if 
this were adopted, keeping a simple page at scipy with brief descriptions for 
each toolkit, even if they are developed/distributed externally.

The current 'example package' (the ex-xxx package) could be the prototype for 
a toolkit, used by new toolkit authors to get off the ground quickly, and by 
scipy to establish coding and documentation policy for .kits members.

If we establish a few conventions to be followed by toolkits, we can ensure 
that the top-level documentation/info facilities automatically register them 
(dynamically).
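
As a sketch of what such a convention might look like: suppose every toolkit
defines a module-level dictionary in its __init__.py, and the top-level info
facility simply walks the toolkit namespace looking for it.  The attribute
name __toolkit_info__ and the function discover_toolkits are invented for
this example; only pkgutil and importlib are real (standard library).

```python
import importlib
import pkgutil


def discover_toolkits(package):
    """Map each subpackage name of `package` to its __toolkit_info__ dict.

    `package` is an already-imported package object, e.g. a hypothetical
    scipy.kits.  Subpackages without the attribute are skipped.
    """
    found = {}
    for module_info in pkgutil.iter_modules(package.__path__):
        full_name = "%s.%s" % (package.__name__, module_info.name)
        try:
            module = importlib.import_module(full_name)
        except ImportError:
            continue  # a broken toolkit should not break the listing
        info = getattr(module, "__toolkit_info__", None)
        if isinstance(info, dict):
            found[module_info.name] = info
    return found
```

A toolkit would then opt in with a single line such as
__toolkit_info__ = {"summary": "...", "depends_on": "core"}, where the keys
are again only a suggestion.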

Anyway, I've certainly far exceeded my opinion budget on this one, so I should 
shut up now :)

Cheers,

f