[Numpy-discussion] Interpolation question

Andrea Gavana andrea.gavana at gmail.com
Mon Mar 29 17:23:31 EDT 2010


Hi Kevin,

On 29 March 2010 01:38, Kevin Dunn wrote:
>> Message: 5
>> Date: Sun, 28 Mar 2010 00:24:01 +0000
>> From: Andrea Gavana <andrea.gavana at gmail.com>
>> Subject: [Numpy-discussion] Interpolation question
>> To: Discussion of Numerical Python <numpy-discussion at scipy.org>
>> Message-ID:
>>        <d5ff27201003271724o6c82ec75v225d819c84140b46 at mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> Hi All,
>>
>>    I have an interpolation problem and I am having some difficulties
>> in tackling it. I hope I can explain myself clearly enough.
>>
>> Basically, I have a whole bunch of 3D fluid flow simulations (close to
>> 1000), and they are a result of different combinations of parameters.
>> I was planning to use the Radial Basis Functions in scipy, but for the
>> moment let's assume, to simplify things, that I am dealing only with
>> one parameter (x). In 1000 simulations, this parameter x has 1000
>> values, obviously. The problem is, the outcome of every single
>> simulation is a vector of oil production over time (let's say 40
>> values per simulation, one per year), and I would like to be able to
>> interpolate my x parameter (1000 values) against all the simulations
>> (1000x40) and get an approximating function that, given another x
>> parameter (of size 1x1) will give me back an interpolated production
>> profile (of size 1x40).
>
> [I posted the following earlier but forgot to change the subject - it
> appears as a new thread called "NumPy-Discussion Digest, Vol 42, Issue
> 85" - please ignore that thread]
>
> Andrea, may I suggest a different approach to RBF's.
>
> Realize that your vector of 40 values for each row in y are not
> independent of each other (they will be correlated).  First build a
> principal component analysis (PCA) model on this 1000 x 40 matrix and
> reduce it down to a 1000 x A matrix, called your scores matrix, where
> A is the number of independent components. A is selected so that it
> adequately summarizes Y without over-fitting and you will find A <<
> 40, maybe A = 2 or 3. There are tools, such as cross-validation, that
> will help select a reasonable value of A.
>
> Then you can relate your single column of X to these independent
> columns in A using a tool such as least squares: one least squares
> model per column in the scores matrix.  This works because each column
> in the score vector is independent (contains totally orthogonal
> information) to the others.  But I would be surprised if this works
> well enough, unless A = 1.
>
> But it sounds like your don't just have a single column in your
> X-variables (you hinted that the single column was just for
> simplification).  In that case, I would build a projection to latent
> structures model (PLS) model that builds a single latent-variable
> model that simultaneously models the X-matrix, the Y-matrix as well as
> providing the maximal covariance between these two matrices.
>
> If you need some references and an outline of code, then I can readily
> provide these.
>
> This is a standard problem with data from spectroscopic instruments
> and with batch processes.  They produce hundreds, sometimes 1000's of
> samples per row. PCA and PLS are very effective at summarizing these
> down to a much smaller number of independent columns, very often just
> a handful, and relating them (i.e. building a predictive model) to
> other data matrices.
>
> I also just saw the suggestions of others to center the data by
> subtracting the mean from each column in Y and scaling (by dividing
> through by the standard deviation).  This is a standard data
> preprocessing step, called autoscaling and makes sense for any data
> analysis, as you already discovered.

I have got some success by using time-based RBFs interpolations, but I
am always open to other possible implementations (as the one I am
using can easily fail for strange combinations of input parameters).
Unfortunately, my understanding of your explanation is very very
limited: I am not an expert at all, so it's a bit hard for me to
translate the mathematical technical stuff in something I can
understand. If you have an example code (even a very trivial one) for
me to study so that I can understand what the code is actually doing,
I would be more than grateful for your help :-)

Andrea.

"Imagination Is The Only Weapon In The War Against Reality."
http://xoomer.alice.it/infinity77/

==> Never *EVER* use RemovalGroup for your house removal. You'll
regret it forever.
http://thedoomedcity.blogspot.com/2010/03/removal-group-nightmare.html <==



More information about the NumPy-Discussion mailing list