[SciPy-User] interpolation question
josef.pktd at gmail.com
josef.pktd at gmail.com
Wed Sep 23 16:42:33 EDT 2009
On Wed, Sep 23, 2009 at 12:10 PM, Oosthoek, J.H.P. (Jelmer)
<jelmer.oosthoek at tno.nl> wrote:
> Dear Zach and Josef,
>
> Thanks for your help! I think this will be more than enough for me to get
> this working :)
>
> Thanks!
>
> Jelmer
> ________________________________
> Van: josef.pktd at gmail.com
> Verzonden: wo 9/23/2009 16:17
> Aan: SciPy Users List
> Onderwerp: Re: [SciPy-User] interpolation question
>
> On Wed, Sep 23, 2009 at 9:53 AM, Zachary Pincus <zachary.pincus at yale.edu>
> wrote:
>>> Is it possible using scipy to fill the gaps in one list using
>>> another list?
>>>
>>> I have two lists, one with time values and one with meter values.
>>> Both lists are the same length and their indexes are linked
>>> (timelist[i] and depthlist[i] belong to the same location). The
>>> timevalue list has gaps (nodata values).
>>>
>>> The shape of both lists is very similar. What I would like to do is
>>> to use the depthlist shape to fill in the gaps of the timelist. Is
>>> there a method within scipy which does the trick? I first tried
>>> simple regression but that didn't result in a smooth line. I would
>>> like to be able to use the values bordering each gap as control
>>> points.
>>
>>
>> First: do these (depth, time) pairs, sorted by depth, describe a
>> proper function (e.g. no depth with two associated times, etc.)? If
>> not, you'll need to figure out a parametric representation of the
>> function: e.g. use the indices (i, depth) and (i, time) to
>> parameterized a 2D curve.
>>
>> Assuming the former case, the first thing you'll need to do is make
>> three lists: depth_in with the depth values which have associated
>> times, time_in with those time values, and depth_out, with the depth
>> values that have no associated times.
>>
>> times_out = numpy.interp(depth_out, depth_in, time_in) will then
>> linearly interpolate the time values for the depths in the depth_out
>> list.
>>
>> You can also use the spline interpolation routines in
>> scipy.interpolate. There are some convenient wrapper classes, but the
>> ones I use most often are the raw fitpack routines:
>>
>> smoothing = upper bound on sum of squared distances between the fit
>> curve and the (depth, time) input points. Can be zero, but then the
>> spline might be prone to ringing.
>> order = order of spline interpolation: 0 is nearest-neighbor, 1 is
>> linear, 3 is recommended in general.
>> tck = scipy.interpolate.splrep(depth_in, time_in, k=order, s=smoothing)
>> times_out = scipy.interpolate.splev(depth_out, tck)
>>
>> You'll probably want to look at what times the spline provides for the
>> depths provided in depth_in, to gauge if the curve is being over-
>> smoothed.
>>
>> In the parametric case, you'll want to either do two rounds of linear
>> interpolation for the (i, depth) and (i, time) curves, or use
>> scipy.interpolate.splprep.
>>
>> Also there is scipy.interpolate.Rbf which uses radial basis functions
>> to interpolate scattered data points. This might be a very smooth way
>> of interpolating your data as well.
>>
>> Zach
>>
>
> If you want to use the regression approach instead of interpolation, then
> I would try kernel regression, which would locally be much smoother than
> linear regression, but might in the outcome be very similar to
> scipy.interpolate.Rbf.
> But, except for some toy examples, I haven't used this.
If you are interested, you could try out the attached code. I wrote this some
time ago and posted it to the mailing list, but I don't think it has seen any
real use. Eventually, I would like to include something like this in
scikits.statsmodels.
Josef
>
> Josef
>
>>
>>
>>
>>
>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
> This e-mail and its contents are subject to the DISCLAIMER at
> http://www.tno.nl/disclaimer/email.html
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>
-------------- next part --------------
'''Kernel Ridge Regression for local non-parametric regression'''
import numpy as np
from scipy import spatial as ssp
from numpy.testing import assert_equal
import matplotlib.pylab as plt
def plt_closeall(n=10):
'''close a number of open matplotlib windows'''
for i in range(n): plt.close()
def kernel_rbf(x,y,scale=1, **kwds):
#scale = kwds.get('scale',1)
dist = ssp.minkowski_distance_p(x[:,np.newaxis,:],y[np.newaxis,:,:],2)
return np.exp(-0.5/scale*(dist))
def kernel_euclid(x,y,p=2, **kwds):
return ssp.minkowski_distance(x[:,np.newaxis,:],y[np.newaxis,:,:],p)
class GaussProcess(object):
'''class to perform kernel ridge regression (gaussian process)
Warning: this class is memory intensive, it creates nobs x nobs distance
matrix and its inverse, where nobs is the number of rows (observations).
See sparse version for larger number of observations
Notes
-----
Todo:
* normalize multidimensional x array on demand, either by var or cov
* add confidence band
* automatic selection or proposal of smoothing parameters
Reference
---------
Rasmussen, C.E. and C.K.I. Williams, 2006, Gaussian Processes for Machine
Learning, the MIT Press, www.GaussianProcess.org/gpal, chapter 2
'''
def __init__(self, x,y=None, kernel=kernel_rbf,
scale=0.5, ridgecoeff = 1e-10, **kwds ):
'''
Parameters
----------
x : 2d array (N,K)
data array of explanatory variables, columns represent variables
rows represent observations
y : 2d array (N,1) (optional)
endogenous variable that should be fitted or predicted
can alternatively be specified as parameter to fit method
kernel : function, default: kernel_rbf
kernel: (x1,x2)->kernel matrix is a function that takes as parameter
two column arrays and return the kernel or distance matrix
scale : float (optional)
smoothing parameter for the rbf kernel
ridgecoeff : float (optional)
coefficient that is multiplied with the identity matrix in the
ridge regression
Notes
-----
After initialization, kernel matrix is calculated and if y is given
as parameter then also the linear regression parameter and the
fitted or estimated y values, yest, are calculated. yest is available
as an attribute in this case.
Both scale and the ridge coefficient smooth the fitted curve.
'''
self.x = x
self.kernel = kernel
self.scale = scale
self.ridgecoeff = ridgecoeff
self.distxsample = kernel(x,x,scale=scale)
self.Kinv = np.linalg.inv(self.distxsample +
np.eye(*self.distxsample.shape)*ridgecoeff)
if not y is None:
self.y = y
self.yest = self.fit(y)
def fit(self,y):
'''fit the training explanatory variables to a sample ouput variable'''
self.parest = np.dot(self.Kinv,y)
yhat = np.dot(self.distxsample,self.parest)
return yhat
## print ds33.shape
## ds33_2 = kernel(x,x[::k,:],scale=scale)
## dsinv = np.linalg.inv(ds33+np.eye(*distxsample.shape)*ridgecoeff)
## B = np.dot(dsinv,y[::k,:])
def predict(self,x):
'''predict new y values for a given array of explanatory variables'''
self.xpredict = x
distxpredict = self.kernel(x,self.x,scale=self.scale)
self.ypredict = np.dot(distxpredict,self.parest)
return self.ypredict
def plot(self, y, plt=plt ):
'''some basic plots'''
#todo return proper graph handles
plt.figure();
plt.plot(self.x,self.y,'bo-',self.x,self.yest,'r.-')
plt.title('sample (training) points')
plt.figure()
plt.plot(self.xpredict,y,'bo-',self.xpredict,self.ypredict,'r.-')
plt.title('all points')
def example1():
m,k = 500,4
upper = 6
scale=10
xs1a = np.linspace(1,upper,m)[:,np.newaxis]
xs1 = xs1a*np.ones((1,4)) + 1/(1.0+np.exp(np.random.randn(m,k)))
xs1 /= np.std(xs1[::k,:],0) # normalize scale, could use cov to normalize
y1true = np.sum(np.sin(xs1)+np.sqrt(xs1),1)[:,np.newaxis]
y1 = y1true + 0.250 * np.random.randn(m,1)
stride = 2 #use only some points as trainig points e.g 2 means every 2nd
gp1 = GaussProcess(xs1[::stride,:],y1[::stride,:], kernel=kernel_euclid,
ridgecoeff=1e-10)
yhatr1 = gp1.predict(xs1)
plt.figure()
plt.plot(y1true, y1,'bo',y1true, yhatr1,'r.')
plt.title('euclid kernel: true y versus noisy y and estimated y')
plt.figure()
plt.plot(y1,'bo-',y1true,'go-',yhatr1,'r.-')
plt.title('euclid kernel: true (green), noisy (blue) and estimated (red) '+
'observations')
gp2 = GaussProcess(xs1[::stride,:],y1[::stride,:], kernel=kernel_rbf,
scale=scale, ridgecoeff=1e-1)
yhatr2 = gp2.predict(xs1)
plt.figure()
plt.plot(y1true, y1,'bo',y1true, yhatr2,'r.')
plt.title('rbf kernel: true versus noisy (blue) and estimated (red) observations')
plt.figure()
plt.plot(y1,'bo-',y1true,'go-',yhatr2,'r.-')
plt.title('rbf kernel: true (green), noisy (blue) and estimated (red) '+
'observations')
#gp2.plot(y1)
def example2(m=100, scale=0.01, stride=2):
#m,k = 100,1
upper = 6
xs1 = np.linspace(1,upper,m)[:,np.newaxis]
y1true = np.sum(np.sin(xs1**2),1)[:,np.newaxis]/xs1
y1 = y1true + 0.05*np.random.randn(m,1)
ridgecoeff = 1e-10
#stride = 2 #use only some points as trainig points e.g 2 means every 2nd
gp1 = GaussProcess(xs1[::stride,:],y1[::stride,:], kernel=kernel_euclid,
ridgecoeff=1e-10)
yhatr1 = gp1.predict(xs1)
plt.figure()
plt.plot(y1true, y1,'bo',y1true, yhatr1,'r.')
plt.title('euclid kernel: true versus noisy (blue) and estimated (red) observations')
plt.figure()
plt.plot(y1,'bo-',y1true,'go-',yhatr1,'r.-')
plt.title('euclid kernel: true (green), noisy (blue) and estimated (red) '+
'observations')
gp2 = GaussProcess(xs1[::stride,:],y1[::stride,:], kernel=kernel_rbf,
scale=scale, ridgecoeff=1e-2)
yhatr2 = gp2.predict(xs1)
plt.figure()
plt.plot(y1true, y1,'bo',y1true, yhatr2,'r.')
plt.title('rbf kernel: true versus noisy (blue) and estimated (red) observations')
plt.figure()
plt.plot(y1,'bo-',y1true,'go-',yhatr2,'r.-')
plt.title('rbf kernel: true (green), noisy (blue) and estimated (red) '+
'observations')
#gp2.plot(y1)
if __name__ == '__main__':
example2()
#example2(m=1000, scale=0.01)
#example2(m=100, scale=0.5) # oversmoothing
#example2(m=2000, scale=0.005) # this looks good for rbf, zoom in
#example2(m=200, scale=0.01,stride=4)
example1()
plt.show()
#plt_closeall() # use this to close the open figure windows
-------------- next part --------------
import numpy as np
import matplotlib.pyplot as plt
from kernridgeregress_class import GaussProcess, kernel_euclid
m,k = 50,4
upper = 6
scale=10
xs1 = np.linspace(1,upper,m)[:,np.newaxis]
#xs1 = xs1a*np.ones((1,4)) + 1/(1.0+np.exp(np.random.randn(m,k)))
#xs1 /= np.std(xs1[::k,:],0) # normalize scale, could use cov to normalize
y1true = np.sum(np.sin(xs1)+np.sqrt(xs1),1)[:,np.newaxis]
y1 = y1true + 0.010 * np.random.randn(m,1)
stride = 3 #use only some points as trainig points e.g 2 means every 2nd
gp1 = GaussProcess(xs1[::stride,:],y1[::stride,:], kernel=kernel_euclid,
ridgecoeff=1e-10)
yhatr1 = gp1.predict(xs1)
plt.figure()
plt.plot(y1true, y1,'bo',y1true, yhatr1,'r.')
plt.title('euclid kernel: true y versus noisy y and estimated y')
plt.figure()
plt.plot(y1,'bo-',y1true,'go-',yhatr1,'r.-')
plt.title('euclid kernel: true (green), noisy (blue) and estimated (red) '+
'observations')
More information about the SciPy-User
mailing list