[SciPy-User] calculating numerous linear regressions quickly

josef.pktd at gmail.com josef.pktd at gmail.com
Mon Jan 13 15:06:24 EST 2014


On Mon, Jan 13, 2014 at 2:49 PM, Bryan Woods <bwoods at aer.com> wrote:
> Given some geospatial grid with a time dimension V[t, lat, lon], I want to
> compute the trend at each spatial point in the domain. Essentially I am
> trying to compute many linear regressions in the form:
> y = mx+b
> where y is the predicted value of V, x is the time coordinate array. The
> coordinates t, lat, lon are all equispaced 1-D arrays, so the predictor (x,
> or t) will be the same for each regression. I want to gather the regression
> coefficients (m,b), correlation, and p-value for the temporal trend at each
> spatial point. This can be directly accomplished by repeatedly calling
> stats.linregress inside of a loop for every [lat, lon] point in the domain,
> but it is not efficient.
>
> The challenge is that I need to compute a lot of them quickly and a python
> loop is proving very slow. I feel like there should be some version of
> stats.linregress that accepts and returns multidimensional arrays without being
> forced into using a python loop. Suggestions?

That can be done completely without loops.

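import numpy as np
# reshape the grid V[t, lat, lon] to 2d (nt, nlat*nlon) -> Y
Y = V.reshape(len(t), -1)
trend = np.vander(t, 2)              # design matrix with columns [t, 1]
m, b = np.linalg.pinv(trend).dot(Y)  # one slope and intercept per grid point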

and then a few more array operations to get the other statistics.

I can try to do it later if needed.
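
In the meantime, here is a rough sketch of what those extra array
operations might look like. The function name grid_linregress is made up
for illustration, and it assumes V is a float array of shape
(nt, nlat, nlon) with no NaNs or masked values. The p-value is the usual
two-sided t-test on the correlation with nt - 2 degrees of freedom, which
is what I would expect to agree with stats.linregress, so check it against
a few grid points before trusting it.

```python
import numpy as np
from scipy import stats


def grid_linregress(t, V):
    """Regress V[t, lat, lon] on t at every grid point without a Python loop.

    Hypothetical helper: returns (m, b, r, p), each of shape (nlat, nlon).
    """
    t = np.asarray(t, dtype=float)
    V = np.asarray(V, dtype=float)
    nt = t.shape[0]
    Y = V.reshape(nt, -1)                    # (nt, nlat*nlon)

    trend = np.vander(t, 2)                  # columns [t, 1]
    m, b = np.linalg.pinv(trend).dot(Y)      # slopes, intercepts

    # Pearson correlation between t and every column of Y
    tc = t - t.mean()
    Yc = Y - Y.mean(axis=0)
    df = nt - 2
    with np.errstate(divide="ignore", invalid="ignore"):
        r = tc.dot(Yc) / np.sqrt(tc.dot(tc) * (Yc ** 2).sum(axis=0))
        # t statistic for H0: slope == 0
        tstat = r * np.sqrt(df / ((1.0 - r) * (1.0 + r)))
    p = 2.0 * stats.t.sf(np.abs(tstat), df)  # two-sided p-value

    shape = V.shape[1:]
    return m.reshape(shape), b.reshape(shape), r.reshape(shape), p.reshape(shape)
```

Calling m, b, r, p = grid_linregress(t, V) gives four (nlat, nlon) arrays.
Grid points that are constant in time come out as NaN in r and p, since the
correlation is undefined there.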

Josef

>
> Thanks,
> Bryan