[SciPy-Dev] Deprecate stats.glm?

Bruce Southey bsouthey at gmail.com
Thu Jun 3 15:55:07 EDT 2010


On 06/03/2010 11:16 AM, Nathaniel Smith wrote:
> On Thu, Jun 3, 2010 at 8:53 AM, <josef.pktd at gmail.com> wrote:
>    
>> GLM as in general linear model not generalized. (It's the worst
>> conflicting acronym in stats).
>>      
> Sure, and let's not even talk about generalized least squares
> (unrelated to both!).
>
> But the general linear model is basically identical to a simple linear
> model, both in interface and implementation.
Depends what you mean by 'simple'. Stealing from the SAS manual, these 
are some of the models fitted by the GLM procedure which I would not 
call simple:
simple regression
multiple regression
analysis of variance (ANOVA), especially for unbalanced data
analysis of covariance
response surface models
weighted regression
polynomial regression
partial correlation
multivariate analysis of variance (MANOVA)
repeated measures analysis of variance

These include interactions...
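
To illustrate how several of the models listed above reduce to the same least-squares fit with a different design matrix, here is a minimal sketch using current NumPy. It is not the stats.glm code; the data, the weights and the variable names are made up for the example.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 9)

# Polynomial regression: columns 1, x, x**2 (hypothetical noise-free data).
X = np.column_stack([np.ones_like(x), x, x**2])
y = 2.0 + 0.5 * x - 3.0 * x**2
beta_poly, *_ = np.linalg.lstsq(X, y, rcond=None)

# Weighted regression: scale each row by sqrt(weight), then solve as before.
w = np.linspace(1.0, 2.0, x.size)   # assumed weights, for illustration only
sw = np.sqrt(w)
beta_wls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

# Multivariate response: a 2-D y gives one coefficient column per response.
Y = np.column_stack([y, 1.0 - x])
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

Only the construction of X (and the row scaling) changes between the three fits; the solver call stays the same.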
>   There's no reason to have
> a separate function for it, one should just accept a matrix for the
> "y" variable in the OLS code. But *generalized* linear models are
> different in interface, implementation, and are almost as much of a
> stats workhorse as standard linear models. So every book I've ever
> seen uses the abbreviation "glm" to refer to the generalized version.
> (Also, this is what R calls the function ;-).)
>    
Yeah, it is interesting that you forget the older statistical packages
(SAS, SPSS; I don't remember what Genstat did) and GLIM (the first?
generalized linear model package).

> The implementation of dummy coding is kind of useful, but this is the
> wrong place and the wrong name...
>    
Why?
That is exactly what is needed and what stats.glm does.
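
For readers unfamiliar with dummy coding, the following is a rough sketch of the idea with current NumPy. The helper `dummy_code` is hypothetical and is not the stats.glm implementation; the group labels and responses are invented. Each factor level gets an indicator column, and the first level is dropped to serve as the reference.

```python
import numpy as np

def dummy_code(labels):
    """Return (levels, indicator columns), using the first level as reference."""
    levels, codes = np.unique(labels, return_inverse=True)
    full = np.eye(len(levels))[codes]   # one indicator column per level
    return levels, full[:, 1:]          # drop the reference level

# Hypothetical one-way layout: three groups, two observations each.
group = np.array(["a", "a", "b", "b", "c", "c"])
y = np.array([1.0, 3.0, 4.0, 6.0, 9.0, 11.0])

levels, D = dummy_code(group)
X = np.column_stack([np.ones(len(y)), D])   # intercept + dummy columns
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[0] is the mean of group "a"; beta[1:] are the offsets of "b" and "c".
```

With this coding the one-way ANOVA is an ordinary regression on X, and the coefficients are the reference group mean followed by the differences of the other group means from it.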

> (Also, its least squares implementation calls inv -- the textbook
> example of bad numerics!)
>    
Actually it should call pinv() here, but you are going to have to prove
that this is 'bad numerics'! Especially given how numpy computes it and
that design matrices tend to have poor numerics to start with
(especially if you do ANOVA and use the condition number to assess
numerics). [I strongly dislike people complaining of apparent bad
numerics just because they see the word inverse.]
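
As a small sketch of the pinv() point (written against current NumPy, with invented data and an explicit `rcond` chosen for the example), consider an over-parameterized ANOVA design: an intercept plus one indicator per group. Here X'X is exactly singular, so a plain inverse is not meaningful, while the pseudoinverse of the normal equations and a least-squares solver both return the minimum-norm solution.

```python
import numpy as np

group = np.array([0, 0, 1, 1, 2, 2])
y = np.array([1.0, 3.0, 4.0, 6.0, 9.0, 11.0])

# Intercept plus one indicator per group: the columns are linearly dependent.
X = np.column_stack([np.ones(len(y)), np.eye(3)[group]])
rank = np.linalg.matrix_rank(X)          # 3, although X has 4 columns

# Normal equations solved with the pseudoinverse (minimum-norm solution).
beta_pinv = np.linalg.pinv(X.T @ X, rcond=1e-10) @ X.T @ y

# Least-squares solver applied to X directly, same tolerance.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=1e-10)

fitted = X @ beta_pinv                   # group means, repeated per row
```

The individual coefficients are not unique for this design, but the fitted values are, and both routes agree on them.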

> ...Okay, you know all that anyway, the question is what to do with it.
> If the problem were just that it needed a better implementation and
> some new features added, then maybe we would keep it and let it be
> improved incrementally. But the interface is just wrong, so we'll be
> removing it sooner or later, and it might as well be sooner, rather
> than prolong the agony.
>
> -- Nathaniel
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>    
The simple reason is that there is not yet an alternative, such as
pystatsmodels, for users to use.


Bruce


