[SciPy-User] Generalized least square on large dataset

Peter Cimermančič peter.cimermancic at gmail.com
Fri Mar 9 12:43:45 EST 2012


> No, it does not. If you are working with counts, the appropriate model
> would usually be Poisson regression, i.e. a generalized linear model with
> a log link function and a Poisson probability family. I have seen many
> examples of microbiologists using linear regression when they should
> actually use Poisson regression (e.g. counting genes) or logistic
> regression (e.g. dose-response and titration curves).
>
> This will do it for you:
>
> MATLAB: glmfit from the Statistics Toolbox
> R: glm
> SAS: PROC GENMOD
> Python: the statsmodels scikit
>
> Another example of inappropriate use of linear regression in
> microbiology is the Lineweaver-Burk plot as a substitute for non-linear
> least squares (usually Levenberg-Marquardt) to fit a Michaelis-Menten
> curve. Some microbiologists are aware of this, but they seem to prefer
> all sorts of ad hoc tricks like linearizations and
> variance-stabilizing transforms instead of "just doing it right".
>
> As for samples that are not independent, that will affect the final
> likelihood. If you want to control for this by writing down the
> log-likelihood yourself, getting ML estimates by maximizing it
> is easy with fmin_powell or fmin_bfgs from scipy.optimize. (Powell's
> method does not even need the gradient.) And if you need the "p-value",
> you can use either a likelihood ratio test or Monte Carlo (e.g. a
> permutation test).
>
>
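
A minimal sketch of the maximum-likelihood route described in the quoted
text, assuming independent observations (so it does not yet deal with the
phylogenetic dependence). The data are synthetic and made up for
illustration; the names neg_loglik, X_full and X_null are hypothetical.
Since the scipy.optimize routines minimize, the code minimizes the negative
Poisson log-likelihood and then forms a likelihood ratio test for the slope.

```python
import numpy as np
from scipy.optimize import fmin_powell
from scipy.stats import chi2


def neg_loglik(beta, X, y):
    """Negative Poisson log-likelihood with a log link.

    The constant term sum(log(y!)) is dropped; it cancels in the
    likelihood ratio below.
    """
    eta = X @ np.atleast_1d(beta)          # linear predictor
    return np.sum(np.exp(eta) - y * eta)


# Synthetic data (assumption): x stands in for genome length in Mb,
# y for the number of copies of some gene.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=200)
y = rng.poisson(np.exp(0.5 + 0.2 * x))

X_full = np.column_stack([np.ones_like(x), x])   # intercept + slope
X_null = X_full[:, :1]                           # intercept only

# Start from the intercept-only solution with a zero slope.
start = np.array([np.log(y.mean()), 0.0])
beta_hat = fmin_powell(neg_loglik, start, args=(X_full, y), disp=False)

# For the intercept-only model the ML estimate is log(mean(y)).
beta_null = np.array([np.log(y.mean())])

ll_full = -neg_loglik(beta_hat, X_full, y)
ll_null = -neg_loglik(beta_null, X_null, y)

# Likelihood ratio test: one extra parameter, so one degree of freedom.
lr_stat = 2.0 * (ll_full - ll_null)
p_value = chi2.sf(lr_stat, df=1)

print("estimated coefficients:", beta_hat)
print("LR statistic:", lr_stat, "p-value:", p_value)
```

The chi-square reference distribution is an asymptotic approximation that
relies on independence; with related genomes the p-value from this sketch
would likely be too optimistic.
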
Sturla, could you be more specific here? I don't know much about
(bio)statistics, but that doesn't mean I don't want to do things right
:). All I want to get out of this analysis is to be able to say whether the
correlation between genome lengths and numbers of particular genes (which
looks neat and obvious from the scatter plot) is statistically significant
given that the data points are heavily phylogenetically biased. That's why
I mentioned "p-values". Of course, I'm open to any better or more accurate
way of getting there than I initially planned.
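
For reference, a rough sketch of the Monte Carlo (permutation) p-value
mentioned in the quoted text, applied to a correlation coefficient. The data
are synthetic, and the function names and the `groups` argument are
hypothetical. Plain shuffling treats all genomes as exchangeable, which is
exactly what phylogenetic relatedness puts in doubt; shuffling only within
clade labels, as the optional `groups` argument does, is one possible and
fairly crude way to restrict the permutations, not an established
phylogenetic correction.

```python
import numpy as np


def permute_within_groups(y, groups, rng):
    """Shuffle y separately inside each group label."""
    y_perm = y.copy()
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        y_perm[idx] = y[rng.permutation(idx)]
    return y_perm


def permutation_pvalue(x, y, groups=None, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(x, y)[0, 1]
    hits = 0
    for _ in range(n_perm):
        if groups is None:
            y_perm = rng.permutation(y)
        else:
            y_perm = permute_within_groups(y, groups, rng)
        if abs(np.corrcoef(x, y_perm)[0, 1]) >= abs(observed):
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic data (assumption): 60 genomes in 6 made-up clades of 10 each.
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 10.0, size=60)           # genome length, Mb
y = rng.poisson(np.exp(0.5 + 0.2 * x))        # gene count
clades = np.repeat(np.arange(6), 10)          # hypothetical clade labels

r_obs, p_plain = permutation_pvalue(x, y)
_, p_grouped = permutation_pvalue(x, y, groups=clades)
print("r =", r_obs, "p (plain) =", p_plain, "p (within clades) =", p_grouped)
```
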





>
> Sturla