[SciPy-dev] MCMC, Kalman Filtering, AI for SciPy?

Thu Sep 30 03:40:59 EDT 2004

Robert Kern wrote:

> Charles Harris wrote:
>
> [snip]
>
>> I agree that search and indexing are the best ways to find stuff, but 
>> I am mostly concerned as to where to commit stuff. Clustering, where 
>> does that go?
>
>
> scipy.cluster I would imagine.  ;-)
>
>> Lattice methods, where do they go? How about useful data structures 
>> or combinatorics? So on and so forth. I think the upper level GAMS 
>> categories cover sufficient range that most things can be put into a 
>> directory without embarrassment. As to the detailed breakdown in the 
>> GAMS sub-classifications, I am not so sure.
>
>
> To make the discussion a bit more concrete, here is an example 
> directory structure corresponding to the top-level GAMS 
> classifications. The names are all my own, so feel free to pretend 
> they are something more to your liking.
>
> <snip hierarchy>

> Now that I see it, it is somewhat appealing. I would probably want to 
> break up some of those into two or more top-level groups. I definitely 
> don't want to see too many subpackages under each of the top-level 
> groups ("Flat is better than nested.").
>
Here is where the current SciPy modules would likely get lumped in the 
GAMS hierarchy.

scipy/
  analysis/
  numbertheory/
  functions/           special
  linalg/              linalg, sparse
  interpolation/       interpolate
  rootfinding/
  optimization/        optimize, ga
  calculus/            integrate
  diffeq/             
  integraltransforms/  fftpack
  approximation/
  probstat/            stats
  simulation/
  datahandling/        io
  symbolic/
  geometry/
  graphics/            xplt, gplt, plt
  service/             gui_thread
  develop/
  other/               cow, cluster ??, signal  ??

(Cluster and signal didn't fit anywhere obvious to me)

The naming conventions are often quite similar.  The SciPy names are 
generally shorter, which is nice for typing.  Where SciPy has multiple 
packages under one GAMS category [(linalg, sparse), (optimize, ga), 
etc.], keeping them separate is likely a good idea.  Like you, I don't 
want to see deep nesting in the package structure. 

Looking at this, I don't see any real reason to reorganize the top-level 
package names.  Are any of them that bad or misleading?  On the other 
hand, I do think we should reorganize some of the functions within them 
to fix the places where they are grouped by "build" convenience instead 
of by what they actually do.  This will probably necessitate adding new 
top-level groups and maybe pruning one of the current ones.  I've made 
a Wiki page to collect the suggestions that people have:

    http://www.scipy.org/wikis/featurerequests/PackageReorganization

If you update the page, you might also post to scipy-dev so that people 
know to go check the Wiki (that is so painful...).  We can obviously 
also just discuss it here and then transfer the results to the Wiki 
later. [side note: using both a wiki and a mailing list for 
communication is also a little painful].

> Fernando, could you give an example or two where you would want to 
> replicate a function across sub-packages? I'm wary of doing so as 
> there is already the enormous amount of replication with respect to, 
> at least, the base Numeric functions. Try scipy.special.<tab> in 
> IPython. I realize what you're proposing doesn't even come close to 
> that, but I'd like an example in any case.

I don't much like the replication idea.  I think things should live in 
one place.  Otherwise people will wonder whether two functions that are 
actually the same have different purposes, implementations, etc.

>
> And since we are talking about re-organization, is there anything we 
> can do about the problem I just mentioned? It wreaks havoc with not 
> only tab-completion but also automatic documentation generation [1]. 
> Is it practical to be careful about what we import into __init__.py? 
> By which I mean not doing "from foo import *" in __init__.py where 
> foo.py does "from scipy_base import *". On the other hand, explicitly 
> listing all of the names in special is gonna be a major pain and 
> fragile to boot.

I used to love "from xxx import *" and swore it was the right way to 
handle Numeric, etc. since I viewed "array" and friends as builtin 
functions...  I guess I've been hanging out with too many computer 
scientists lately.  Or perhaps it is the few times where I have wondered 
"where is function xxx [which is broken] coming from?" and struggled 
through a large codebase to track it down.  We've had a nasty bug or two 
where one import * unexpectedly clobbered some functions from a previous 
import *.  In any case, it is a (seldom broken) policy to never use 
import * in our code bases at Enthought.  It is probably a good idea to 
apply this same policy to SciPy.  Doing so would partially solve the 
problem you discuss.
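
To make the clobbering problem concrete, here is a minimal sketch.  The 
module names (demo_base, demo_special) and their contents are made up 
for illustration and are not real SciPy modules; they are built in 
memory only so the example runs on its own.

```python
import sys
import types


def make_module(name, **attrs):
    """Build a throwaway in-memory module and register it for import."""
    mod = types.ModuleType(name)
    mod.__dict__.update(attrs)
    sys.modules[name] = mod
    return mod


# Two hypothetical modules that both happen to export a function named `sum`.
make_module("demo_base",
            sum=lambda seq: "demo_base.sum",
            array=lambda seq: list(seq))
make_module("demo_special",
            sum=lambda seq: "demo_special.sum",
            gamma=lambda x: x)

# Star imports: the second import silently replaces `sum` from the first.
star_ns = {}
exec("from demo_base import *\nfrom demo_special import *", star_ns)
clobbered = star_ns["sum"]([1, 2])

# Explicit module imports: every name keeps one visible origin.
explicit_ns = {}
exec("import demo_base\nimport demo_special", explicit_ns)
kept = explicit_ns["demo_base"].sum([1, 2])

print(clobbered)  # the version from demo_special won
print(kept)       # the version from demo_base is still reachable
```

With the explicit form, a reader tracking down a broken `sum` can see 
which module it came from without searching the whole codebase.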

The second idea we have had on this is to put a "filter" tag within the 
doc-string of a module or package to specify a set of functions that 
tab-complete tools should ignore.  You could say "filter: sin, cos, ..." 
to remove a set of names or "filter: Numeric-all" (or something like 
that) to exclude all the functions in a module/package from the 
tab-complete list for a module.  IPython, PyCrust, Idle, and others 
could standardize on a format so that they all could benefit.
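
A rough sketch of how a tool might honor such a tag is below.  No tool 
implements this today as far as I know; the "filter:" line format, the 
function names (parse_filter_tag, completions), and the sample 
docstring are all assumptions made for illustration.  Only the explicit 
name-list form is handled, not the "Numeric-all" style shorthand.

```python
import re


def parse_filter_tag(docstring):
    """Return the set of names listed on 'filter:' lines of a docstring."""
    names = set()
    for line in (docstring or "").splitlines():
        match = re.match(r"\s*filter:\s*(.*)", line)
        if match:
            for name in match.group(1).split(","):
                name = name.strip()
                if name:
                    names.add(name)
    return names


def completions(names, docstring):
    """Names a tab-completer would offer: public and not filtered out."""
    hidden = parse_filter_tag(docstring)
    return sorted(n for n in names
                  if n not in hidden and not n.startswith("_"))


# Hypothetical package docstring carrying the proposed tag.
DOC = """Special functions (made-up docstring for illustration).

filter: sin, cos
"""

# Stand-in for dir(package): a mix of re-exported and native names.
NAMES = ["sin", "cos", "gamma", "erf", "_private"]

visible = completions(NAMES, DOC)
print(visible)
```

The names stay importable from the package; only the completion list 
is trimmed, so existing code would not break.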

>
> [1] http://www.scipy.org/documentation/apidocs/scipy/scipy.special.html
>
eric