[SciPy-dev] State of stats modules?

Gary Strangman strang at nmr.mgh.harvard.edu
Mon Nov 5 13:15:09 EST 2001


> Most of the stats module was written by Gary Strangman
> (http://www.nmr.mgh.harvard.edu/Neural_Systems_Group/gary/).  As far as I
> know, it's the most full-featured stats module around, and has very good
> doc-strings.  Gary developed it for his own research, and, as such, it is
> somewhat specialized to his field.  Still, it is very usable, and, at least
> for the functions I have used, reliable.

Specialized it is, and I have variable confidence in the various functions
(some are much more used--read: better tested--than others).

> Gary's work was/is an excellent starting point for SciPy's statistics
> capabilities.  Most of the work needed is actually trimming out extra
> functionality not needed or duplicated, adding unit test functions, and
> assuring that functions behave similarly in calling convention to other
> Numeric/SciPy functions.  The new_stats.py module is the beginnings of this
> effort, but it hasn't had any attention in a while.  There are also the
> beginnings of some unit testing in the stats/tests directory.  Hopefully a
> full complement of unit tests will develop so there are fewer questions
> about result validity.

This would be outstanding ... particularly the unit testing. I've done
some, but way too little.

> aanova and collapse:
> 
> I haven't used these, and don't know much about them.  I'll forward this to
> Gary and see if he has any comments.

aanova() was a simple analysis of variance function, commonly
used in behavioral-type research but broadly applicable. It was written
when I was learning about anovas in grad school, and hence is poorly
written, poorly tested, and non-optimized. (It worked for the stuff I
needed, when I needed it, but I have pulled the function from more
recent versions of my module out of my own concerns about its adequacy and
hence utility.)

collapse() is a generic function to collapse over rows of a data file. It
finds unique combinations of values in the columns specified by keepcols
and for each such unique combination it calculates a collapse-function
(mean, sterr, user-defined) for each column specified in collapsecols. 
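
The behaviour described above can be sketched roughly as follows. This is
only an illustration, assuming rows are plain Python lists addressed by
column index; the name collapse_rows and its signature are hypothetical and
are not the actual pstat.collapse() interface.

```python
from collections import defaultdict
from statistics import mean


def collapse_rows(rows, keepcols, collapsecols, fcn=mean):
    """Hypothetical sketch: group rows by the values in keepcols and
    apply fcn to each column in collapsecols within every group."""
    groups = defaultdict(list)
    for row in rows:
        # The unique combination of keepcols values identifies a group.
        key = tuple(row[c] for c in keepcols)
        groups[key].append(row)

    result = []
    for key in sorted(groups):
        members = groups[key]
        # One collapsed value per requested column.
        collapsed = [fcn([r[c] for r in members]) for c in collapsecols]
        result.append(list(key) + collapsed)
    return result


# Made-up example data: subject, condition, and two measurements.
data = [
    ["s1", "A", 1.0, 10.0],
    ["s1", "A", 3.0, 20.0],
    ["s1", "B", 5.0, 30.0],
    ["s2", "A", 2.0, 40.0],
]
collapsed = collapse_rows(data, keepcols=[0, 1], collapsecols=[2, 3])
for row in collapsed:
    print(row)
```

Passing a different fcn (for example a standard-error function, or any
user-defined callable that takes a list of values) changes the statistic
computed for each group.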

> The stats module deserves some attention, but isn't receiving any right now.
> Any takers?

More recent versions of pstat.py and stats.py (at least more recent than
the def's that were quoted) can be found on my web site

http://www.nmr.mgh.harvard.edu/nsg/strang/python.html

but sadly those are modified only very slowly and irregularly at best.
"Takers" are welcome. :-)

Gary

--------------------------------------------------------------
Gary Strangman, PhD        |  Neural Systems Group
Office: 617-724-0662       |  Massachusetts General Hospital
Fax:    617-726-4078       |  13th Street, Bldg 149, Room 9103
strang at nmr.mgh.harvard.edu |  Charlestown, MA  02129
http://www.nmr.mgh.harvard.edu/Neural_Systems_Group/gary/



