[Python-Dev] stats.py (was 'summing a bunch of numbers ')

Chad Netzer cnetzer@mail.arc.nasa.gov
20 Apr 2003 21:33:32 -0700


On Sun, 2003-04-20 at 20:28, David Ascher wrote:
> Tim Peters wrote:
> 
> >>There's a bunch of statistics functions (avg or mean, sdev etc.) that
> >>should go in a statistics package or module together with more
> >>advanced statistics stuff -- it would be a good idea to form a working
> >>group or SIG to design such a thing with an eye towards usability,
> >>power, and avoiding traps for newbies.

+1

> >Very big job, unless you leave the "advanced" stuff out.  Note that there
> >are many stats packages available for Python already, although some build on
> >NumPy.
> >
> Scipy's stats package is more complete than many people expect.

I was going to suggest that we consider adopting Gary Strangman's
stats.py package as the foundation for inclusion.  This is the package
that SciPy chose to include (with modifications of the namespace and API
to fit the SciPy scheme of things).

I've used it, and it is a very full-featured package.  I was actually
kind of saddened that Gary had done all the work, since after getting my
Master's degree, I had considered implementing such a module myself (for
reasons of learning).  But Gary's work is quite comprehensive, and well
written, IMO (well tested, few external dependencies, etc.  I just drop
it in a working directory when I need it on a new system.)

http://www.nmr.mgh.harvard.edu/Neural_Systems_Group/gary/python.html

Gary allowed SciPy to adopt his package under the BSD license, so I'm
sure he would be amenable to discussing any licensing issues that may
arise (the original package is GPL).  It works on Python lists, as well
as Numeric arrays.

I'd be happy to take on the job of approaching Gary about whether he
would consider "donating" his module for the standard lib, after any
changes a working group or SIG might suggest (or require).  There are
possibly some namespace issues (in particular, his companion "pstat"
module conflicts with a standard library module name, which I'd wanted
fixed).

The main work would be ensuring it works on normal Python sequences,
removing any dependencies on NumPy or Numeric (while hopefully allowing
it to integrate well with either), and possibly reconciling naming
issues with SciPy (if at all feasible).  That should be doable by 2.4.
I'm happy to volunteer some time to the effort.  I think it would be
quite worthwhile.
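
As a rough illustration of what "works on normal Python sequences, with
no NumPy or Numeric dependency" could look like, here is a minimal
sketch.  The names mean() and stdev() are placeholders chosen for this
example, not the stats.py API, and the code assumes a current Python
(math.fsum is not available in 2.3).

```python
import math


def mean(values):
    """Arithmetic mean of any iterable of numbers (list, tuple, ...)."""
    data = list(values)  # accept any iterable, not just lists
    if not data:
        raise ValueError("mean requires at least one value")
    # fsum limits floating-point rounding error when summing
    return math.fsum(data) / len(data)


def stdev(values):
    """Sample standard deviation, using the n - 1 denominator."""
    data = list(values)
    if len(data) < 2:
        raise ValueError("stdev requires at least two values")
    m = mean(data)
    squared_deviations = math.fsum((x - m) ** 2 for x in data)
    return math.sqrt(squared_deviations / (len(data) - 1))
```

Because both functions only iterate over their argument, anything that
can be iterated as a sequence of numbers should work, e.g.
mean([1, 2, 3, 4]) or stdev((2, 4, 4, 4, 5, 5, 7, 9)).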

-- 

Chad Netzer
(any opinion expressed is my own and not NASA's or my employer's)