[SciPy-Dev] SciPy Goal

josef.pktd at gmail.com
Wed Jan 4 21:50:30 EST 2012


On Wed, Jan 4, 2012 at 9:22 PM, Fernando Perez <fperez.net at gmail.com> wrote:
> Hi all,
>
> On Wed, Jan 4, 2012 at 5:43 PM, Travis Oliphant <travis at continuum.io> wrote:
>> What do others think is missing?  Off the top of my head:   basic wavelets
>> (dwt primarily) and more complete interpolation strategies (I'd like to
>> finish the basic interpolation approaches I started a while ago).
>> Originally, I used GAMS as an "overview" of the kinds of things needed in
>> SciPy.   Are there other relevant taxonomies these days?
>
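
(A minimal sketch of the kind of single-level dwt meant here; it uses
PyWavelets, an outside package, purely as an illustration and an assumption
on my part, not something proposed in this thread.)

import numpy as np
import pywt  # PyWavelets; assumed installed, illustrates what scipy currently lacks

# Single-level discrete wavelet transform with the Haar wavelet:
# splits the signal into approximation (cA) and detail (cD) coefficients.
signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
cA, cD = pywt.dwt(signal, 'haar')

# The inverse transform reconstructs the original signal from cA and cD.
reconstructed = pywt.idwt(cA, cD, 'haar')
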
> Well, probably not something that fits these ideas for scipy
> one-to-one, but the Berkeley 'thirteen dwarves' list from the 'View
> from Berkeley' paper on parallel computing is not a bad starting
> point; summarized here they are:
>
>    Dense Linear Algebra
>    Sparse Linear Algebra [1]
>    Spectral Methods
>    N-Body Methods
>    Structured Grids
>    Unstructured Grids
>    MapReduce
>    Combinational Logic
>    Graph Traversal
>    Dynamic Programming
>    Backtrack and Branch-and-Bound
>    Graphical Models
>    Finite State Machines
>
> Descriptions of each can be found here:
> http://view.eecs.berkeley.edu/wiki/Dwarf_Mine and the full study is
> here:
>
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html
>
> That list is biased towards the classes of codes used in
> supercomputing environments, and some of the topics are probably
> beyond the scope of scipy (say structured/unstructured grids, at least
> for now).
>
> But it can be a decent guiding outline to reason about what are the
> 'big areas' of scientific computing, so that scipy at least provides
> building blocks that would be useful in these directions.
>
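(A minimal sketch, on a made-up toy problem, of two of these categories that
scipy already serves with building blocks: sparse linear algebra via
scipy.sparse and spectral methods via scipy.fftpack.)

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve
from scipy.fftpack import fft

# Sparse linear algebra: solve a tridiagonal (1-D Laplacian-style) system.
n = 100
diagonals = np.array([-np.ones(n), 2.0 * np.ones(n), -np.ones(n)])
A = sparse.spdiags(diagonals, [-1, 0, 1], n, n).tocsc()
b = np.ones(n)
x = spsolve(A, b)

# Spectral methods: Fourier transform of a sampled sine wave.
t = np.linspace(0.0, 1.0, 256, endpoint=False)
spectrum = fft(np.sin(2 * np.pi * 5.0 * t))
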
> One area that hasn't been directly mentioned too much is the situation
> with statistical tools.  On the one hand, we have the phenomenal work
> of pandas, statsmodels and sklearn, which together are helping turn
> python into a great tool for statistical data analysis (understood in
> a broad sense).  But it would probably be valuable to have enough of a
> statistical base directly in numpy/scipy so that the 'out of the box'
> experience for statistical work is improved.  I know we have
> scipy.stats, but it seems like it needs some love.
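
(A minimal sketch of the kind of 'out of the box' statistical work that
scipy.stats already supports; the data here are made up for illustration.)

import numpy as np
from scipy import stats

# Two made-up samples drawn from normal distributions.
rng = np.random.RandomState(0)
a = rng.normal(loc=0.0, scale=1.0, size=50)
b = rng.normal(loc=0.5, scale=1.0, size=50)

# Two-sample t-test: do the samples share a common mean?
t_stat, p_value = stats.ttest_ind(a, b)

# Fit a normal distribution to one sample by maximum likelihood.
mu, sigma = stats.norm.fit(a)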

(I didn't send something like the first part of this earlier because I
didn't want to talk at such length.)

Every new piece of code and every new sub-package needs additional, topic-specific maintainers.

Pauli, Warren, and Ralf are doing a great job as default, general
maintainers, and Warren and Ralf in particular have been pushing
bug fixes and enhancements into stats (and I have been reviewing
almost all of them).

If there is a well-defined set of enhancements that could go into
stats, then I wouldn't mind, but I don't see much point in
duplicating code and maintenance effort with statsmodels.

Of course there are large areas of statistics that statsmodels doesn't
cover either, and it is useful to extend the statistical coverage of
either package.

However, adding code that is not low-maintenance (that is, not fully
tested) or that doesn't have committed maintainers doesn't make much
sense in my opinion.

Cheers,

Josef

>
> Cheers,
>
> f


