[SciPy-user] predicting values based on (linear) models

josef.pktd at gmail.com josef.pktd at gmail.com
Tue Jan 13 20:24:21 EST 2009


On Tue, Jan 13, 2009 at 3:33 PM, Tim Michelsen
<timmichelsen at gmx-topmail.de> wrote:
> Hello,
> I had to do several statistical computations lately. I therefore looked
> at the statistical language R since it seems to contain already many
> models and functionality.
>
> Is there some function like "predict" [1] in Python?
>
> Example:
>
>      x <- rnorm(15)
>      y <- x + rnorm(15)
>      t = predict(lm(y ~ x))
>
>      t => the predicted data determined by the linear model (comapare
> to scipy.stats.linregress)
>
>
> How is this done by pure Python?
>
>
> Are there many using Rpy (rpy2) to acces the statistical functionalities
> provided by R?
> What are your experiences with this?
>
> Programming in python seems to be more convenient than in R but lacking
> the vast statistics.
>
>
> Thanks in advance,
> Timmie
>
>
> [1] predict is a generic function for predictions from the results of
> various model fitting functions. The function invokes particular methods
> which depend on the class of the first argument.
>
> Most prediction methods which similar to fitting linear models have an
> argument newdata specifying the first place to look for explanatory
> variables to be used for prediction. Some considerable attempts are made
> to match up the columns in newdata to those used for fitting, for
> example that they are of comparable types and that any factors have the
> same level set in the same order (or can be transformed to be so).
>  Time series prediction methods in package stats have an argument
> n.ahead specifying how many time steps ahead to predict.
>
> Eample:
>
>      x <- rnorm(15)
>      y <- x + rnorm(15)
>      t = predict(lm(y ~ x))
>
>      t => the predicted data determined by the linear model (comapare
> to scipy.stats.linregress)
>
> _______________________________________________
> SciPy-user mailing list
> SciPy-user at scipy.org
> http://projects.scipy.org/mailman/listinfo/scipy-user
>

This is on the todo list
scipy.stats.linregress treats only the case with a single explanatory variable

doing it by explicitly
----------------------------
assumes x is data vector without constant, y is endogenous variable
for estimation
xnew are observation of explanatory variables for prediction
see scipy tutorial, the only thing to watch out for are the
matrix/array dimensions

>>> from scipy import linalg
>>> b,resid,rank,sigma = linalg.lstsq(np.c_[np.ones((x.shape[0],1)),x],y)
>>> b
array([[ 5.47073574],
       [ 0.6575267 ],
       [ 2.09241884]])
>>> xnewwc=np.c_[np.ones((xnew.shape[0],1)),xnew]
>>> ypred = np.dot(xnewwc,b)   # prediction with ols estimate of parameters b
>>> print np.c_[ynew, ypred, ynew - ypred]
[[ 8.23128832  8.69250962 -0.46122129]
 [ 9.14636291  9.66243911 -0.51607621]
 [-0.10198498 -0.27382934  0.17184436]]


or using the ols example from the cookbook to which I added a predict method
-----------------------------------------------------------------------------------------------------------------

#------------------------ try_olsexample.py

import numpy as np
from olsexample import ols

def generate_data(nobs):
    x = np.random.randn(nobs,2)
    btrue = np.array([[5,1,2]]).T
    y = np.dot(x,btrue[1:,:]) + btrue[0,:] + 0.5 * np.random.randn(nobs,1)
    return y,x

y,x = generate_data(15)

est = ols(y,x)  # initialize and estimate with ols, constant added by default
print 'ols estimate'
print est.b
print np.array([[5,1,2]])  # true coefficients

ynew,xnew = generate_data(3)
ypred = est.predict(xnew)

print '    ytrue        ypred        error'
print np.c_[ynew, ypred, ynew - ypred]
#------------------------------- EOF

output:

ols estimate
[[ 5.47073574]
 [ 0.6575267 ]
 [ 2.09241884]]
[[5 1 2]]
    ytrue        ypred        error
[[ 8.23128832  8.69250962 -0.46122129]
 [ 9.14636291  9.66243911 -0.51607621]
 [-0.10198498 -0.27382934  0.17184436]]


olsexample.py is in attachment is from the cookbook and I'm slowly reworking it.
fancier models will be in scipy.stats.models when they are ready for inclusion.

I'm using rpy (version 1) to check scipy.stats function, and for sure
the available methods are very extensive in R, while coverage of
statistics and econometrics in python packages including scipy is
spotty, some good spots and many missing pieces.

Josef
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: olsexample.py
URL: <http://mail.scipy.org/pipermail/scipy-user/attachments/20090113/a3a6c942/attachment.ksh>


More information about the SciPy-User mailing list