[SciPy-dev] GSoC Project Proposal: Datasource and Jonathan Taylor's statistical models

Skipper Seabold jsseabold at gmail.com
Fri Mar 27 18:09:20 EDT 2009


Bruce Southey wrote:
> Not getting into the merits of either part, I think you are asking for
> trouble doing both because there is no clear connection between the two
> parts. Knowing one part is not going to help you with the other. (The
> argument that it helps get 'your feet wet' is rather lame.)

Your point is well taken.  I think I will focus on the second part, as
there seems to be much more interest in the statistical functionality,
and my work would undoubtedly be better if focused.

>I would strongly suggest that the main emphasis is just to get
>Jonathan's code integrated into Scipy and perhaps something from various
>places like the Scikit learn (how many logistic regression or least
>squares codes do we really need?) and EconPy
>http://code.google.com/p/econpy/wiki/EconPy

I will have a closer look through Scikit learn and econpy and revise.

>I would think that it is essential to get these to work with masked
>arrays (allows missing observations) or record array (enables the use of
>'variable' names in model statements like most statistics packages do).

I agree.  There has been some discussion of the most appropriate way
to handle this in the thread you mentioned previously (e.g., it would
not always be appropriate to force conversion to a masked array, and
whether stats and mstats should be merged), and I would appreciate any
direction that could be offered.  I like the idea of the "usemask"
flag here http://mail.scipy.org/pipermail/scipy-dev/2009-February/011414.html
but obviously would defer to others for the best solution.  Should I
be spending most of my time looking through mstats rather than stats?
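
To make the "usemask" idea concrete, here is a minimal sketch of how such
a flag might behave for a least-squares fit.  The function name ols_fit
and the exact semantics of usemask are assumptions for illustration only;
they are not taken from the linked thread or from any existing SciPy code.
With usemask=True, rows with a masked value are dropped before fitting;
with usemask=False, inputs are treated as plain arrays and no conversion
to a masked array is forced.

```python
import numpy as np
import numpy.ma as ma


def ols_fit(y, X, usemask=False):
    """Hypothetical least-squares helper; 'usemask' is an assumed flag."""
    if usemask:
        y = ma.asarray(y, dtype=float)
        X = ma.asarray(X, dtype=float)
        # A row is unusable if y or any regressor in that row is masked.
        bad = ma.getmaskarray(y) | ma.getmaskarray(X).any(axis=1)
        y = ma.getdata(y)[~bad]
        X = ma.getdata(X)[~bad]
    else:
        # Plain ndarray path: any mask on the inputs is ignored.
        y = np.asarray(y, dtype=float)
        X = np.asarray(X, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Made-up data: y = 1 + 2*x, with one corrupted observation that is masked.
x = np.arange(5.0)
y = 1.0 + 2.0 * x
y[2] = 999.0
y_masked = ma.masked_values(y, 999.0)
X = np.column_stack([np.ones_like(x), x])

beta = ols_fit(y_masked, X, usemask=True)
```

Keeping the flag off by default leaves ordinary arrays on the fast path,
which is one way to avoid forcing masked-array conversion on every caller.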

>I would like to see the inclusion of Statistical Reference Datasets Project:
>http://www.itl.nist.gov/div898/strd/
>
>The datasets would allow us to validate the accuracy of the code.

Very good idea.
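
As a rough sketch of how such a validation could be scripted, the
log relative error (the approximate number of correct significant digits)
can be computed between estimated and certified values.  The function
name log_relative_error is an assumption, and the data below are
synthetic stand-ins with known exact coefficients, not an actual NIST
StRD dataset.

```python
import numpy as np


def log_relative_error(estimate, certified):
    """Approximate count of correct significant digits per parameter.

    Assumes every certified value is nonzero.
    """
    estimate = np.asarray(estimate, dtype=float)
    certified = np.asarray(certified, dtype=float)
    rel = np.abs(estimate - certified) / np.abs(certified)
    with np.errstate(divide="ignore"):
        lre = -np.log10(rel)
    # Exact agreement gives infinity; cap near double-precision limits.
    return np.minimum(lre, 15.0)


# Synthetic stand-in for a reference dataset: y = 3 + 0.5*x exactly.
x = np.arange(1.0, 11.0)
y = 3.0 + 0.5 * x
X = np.column_stack([np.ones_like(x), x])
certified = np.array([3.0, 0.5])

estimate, *_ = np.linalg.lstsq(X, y, rcond=None)
lre = log_relative_error(estimate, certified)
```

A test suite could then assert that the smallest entry of lre exceeds
some agreed threshold for each reference dataset.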

Thanks for the initial feedback.  I will take it under advisement and
revise my proposal as needed.

Best,
Skipper


