[SciPy-dev] some statistical models / formulas

Thu May 4 11:37:30 EDT 2006

Hi,
John and I have written some code for statistical analysis biased
towards our area. I have included the general linear model part and an
example data set at:

https://netfiles.uiuc.edu/southey/www/glm.tar.gz

Basically, the features of our code include:
1) Read in the data from a file using format codes to indicate if the
data columns are numeric or alphanumeric. This is really along the
lines of SAS's $ formatting, but alphanumeric columns are treated as
'class' variables.

2) We also allow the data to have:
     a)  comment codes, so that selected lines of data are ignored
     b)  missing value codes such as * or NA

3) The data are stored in a masked array (due to the missing values),
any alphanumeric values are recoded to numeric, and their unique
levels are stored.

4) A summary method, similar to SAS's proc means, that can be run
separately for each level of a class variable (similar to the by
statement in SAS).

5) A univariate general linear model similar to SAS's proc glm or R's
lm (?). In theory this should fit both class variables and covariates
as well as interactions and nested terms (although the interaction of
a class term and covariate may not be correct). The current output is
an ANOVA table similar to SAS's proc glm, with the model fit and the
Type 1 and Type 3 sums of squares for each term. There are still
some bugs in this one especially in terms of the Type 3 SS
calculations (which should be done differently).

6) X'X and X'Y are formed directly from the data, based on the
selected model, rather than by matrix multiplication.
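
To illustrate items 1 to 3, here is a minimal sketch using NumPy's
masked arrays. It is not the code in glm.tar.gz: the function name
read_table, the format codes "$" and "n", the "#" comment code and the
sample rows are all assumptions made up for the illustration.

```python
import numpy as np

def read_table(lines, formats, comment="#", missing=("*", "NA")):
    """Parse whitespace-separated rows into a masked array.

    formats holds one (hypothetical) code per column:
    "n" for numeric, "$" for alphanumeric (class) columns.
    Returns the masked array and a dict {column index: list of levels}.
    """
    # Drop blank lines and lines starting with the comment code.
    rows = [ln.split() for ln in lines
            if ln.strip() and not ln.lstrip().startswith(comment)]
    # Start fully masked; cells are unmasked as values are assigned.
    data = np.ma.masked_all((len(rows), len(formats)), dtype=float)
    levels = {}
    for j, fmt in enumerate(formats):
        column = [row[j] for row in rows]
        if fmt == "$":
            # Recode class labels to 0..k-1 and keep the unique levels.
            levels[j] = sorted({v for v in column if v not in missing})
            codes = {v: i for i, v in enumerate(levels[j])}
        for i, value in enumerate(column):
            if value in missing:
                continue  # missing value code: leave the cell masked
            data[i, j] = codes[value] if fmt == "$" else float(value)
    return data, levels

sample = ["# herd weight age", "A 1.5 10", "B * 12", "A 2.5 NA"]
data, levels = read_table(sample, ["$", "n", "n"])
```

Masked cells are skipped by reductions such as data[:, 1].mean(), so
the missing value codes do not need special handling later on.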

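Item 6 can be sketched in the same spirit. The example below assumes a
model with one class variable and one covariate, coded with one
indicator column per class level and no separate intercept. That
coding, the function name normal_equations and the numbers are
assumptions for the illustration and may differ from what glm.tar.gz
actually does.

```python
import numpy as np

def normal_equations(class_codes, covariate, y, n_levels):
    """Accumulate X'X and X'y record by record for y ~ class + covariate.

    The design matrix X is never built. Each record's row of X would
    have a 1 in the column of its class level and the covariate value
    in the last column, so only those entries are updated.
    """
    p = n_levels + 1
    xtx = np.zeros((p, p))
    xty = np.zeros(p)
    for c, x, yi in zip(class_codes, covariate, y):
        xtx[c, c] += 1.0       # count of records in level c
        xtx[c, -1] += x        # sum of the covariate within level c
        xtx[-1, c] += x        # symmetric entry
        xtx[-1, -1] += x * x   # sum of squares of the covariate
        xty[c] += yi           # sum of y within level c
        xty[-1] += x * yi      # cross product of covariate and y
    return xtx, xty

codes = [0, 1, 0, 1, 1]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.5, 4.1, 6.2, 6.9]
xtx, xty = normal_equations(codes, x, y, n_levels=2)
# Full rank here, so the normal equations can be solved directly.
beta = np.linalg.solve(xtx, xty)
```

With an intercept or with interactions X'X is generally not of full
rank, and a generalized inverse or a constraint would be needed
instead of np.linalg.solve.
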
Regards
Bruce Southey

On 4/19/06, Jonathan Taylor <jonathan.taylor at stanford.edu> wrote:
> i have made a numpy/scipy package for some linear statistical models
>
> http://www-stat.stanford.edu/~jtaylo/scipy_stats_models-0.01a.tar.gz
>
> i was hoping that it might someday get into scipy.stats, maybe as
> scipy.stats.models?
>
> anyways, i am sure the code needs work and more docs with examples, but
> right now there is basic functionality for the following (the tests give
> some examples):
>
> - model formulae as in R (to some extent)
> - OLS (ordinary least squares regression)
> - WLS (weighted least squares regression)
> - AR1 regression (non-diagonal covariance -- right now just AR1 but easy
> to extend to ARp)
> - generalized linear models (all of R's links and variance functions but
> extensible as well -- not everything has been rigorously tested but
> logistic agrees with R, for instance)
> - robust linear models using M estimators (with a number of standard
> default robust norms as in R's rlm)
> - robust scale estimates (MAD, Huber's proposal 2).
>
> it would be nice to add a few things over time, too, like:
>
> - mixed effects models
> - generalized additive models (gam), generalized estimating equations
> (gee)....
> - nonlinear regression (i have some quasi working code for this, too,
> but it is not yet included).
>
> + anything else people want to add.
>
>
>
> -- jonathan
>
> --
> ------------------------------------------------------------------------
> Jonathan Taylor                           Tel:   650.723.9230
> Dept. of Statistics                       Fax:   650.725.8977
> Sequoia Hall, 137                         www-stat.stanford.edu/~jtaylo
> 390 Serra Mall
> Stanford, CA 94305