[SciPy-Dev] Splines in Scipy [was: SciPy Goal]

Mon Jan 9 15:30:17 EST 2012

On Mon, Jan 9, 2012 at 2:06 PM, Zachary Pincus <zachary.pincus at yale.edu> wrote:
>
> On Jan 9, 2012, at 10:46 AM, josef.pktd at gmail.com wrote:
>
>> On Mon, Jan 9, 2012 at 8:02 AM, Zachary Pincus <zachary.pincus at yale.edu> wrote:
>>> Also, as long as a list is being made:
>>> scipy.signal has matched functions [cq]spline1d() and [cq]spline1d_eval(), but only [cq]spline2d(), with no matching _eval function.
>>>
>>> And as far as FITPACK goes, I agree can be extremely, and possibly dangerously, "quirky" -- it's prone to almost arbitrarily bad ringing artifacts when the smoothing coefficient isn't large enough, and is very (very) sensitive to initial conditions in terms of what will and won't provoke the ringing. It has its uses, but it seems to me odd enough that it really shouldn't be the "default" 1D spline tool to direct people to.
>>
>> Do you have an example of "arbitrarily" bad ringing?
>>
>>> From what I was reading up on splines in the last weeks, I got the
>> impression was that this is a "feature" of interpolating splines, and
>> to be useful with a larger number of points we always need to smooth
>> sufficiently (reduce knots or penalize).
>> (I just read a comment that R with 5000 points only chooses about 200 knots).
>
> Example below; it's using parametric splines because I have a simple interactive tool to draw them and notice occasional "blowing up" like what you see below. I *think* I've seen similar issues with normal splines, but haven't used them a lot lately. (For the record, treating the x and y values as separate and using the non-parametric spline fitting does NOT yield these crazy errors on *these data*...)
>
> As far as the smoothing parameter, the "good" data will go crazy if s=3, but is fine with s=0.25 or s=4; similarly the "bad" data isn't prone to ringing if s=0.25 or s=5. So there's serious sensitivity both to the x,y positions of the data (as below) and to the smoothing parameter in a fairly small range.
>
> I could probably come up with an example that goes crazy with even fewer input points, but this was the first thing I came up with. Small modifications to the input data seem to make it go even crazier, but the below illustrates the general point.

(disclaimer: as mentioned, I only started very recently to read
anything about smoothing splines, except for the scipy documentation)

I'm not so familiar with the parametric splines, but I might have seen
something similar with regular splines, but I ignored or worked around
it without paying attention.

The local behavior around 4, good: s>3.8 is fine, and bad: s>4.3 is
fine, might come because there is no non-crazy spline with the given
smoothness, and in this case increasing the smooothness factor makes
the exploding behavior go away.

One impression I had when I tried this out a few weeks ago, is that
the spline smoothing factor s is imposed with equality not inequality.
In the examples that I tried with varying s, the reported error sum of
squares always matched s to a few decimals.  (I don't know how because
I didn't see the knots change in some examples.)

In your example it looks like the spline algorithm only looks for
spline approximation in the neighborhood of those that give the
specified s. It does not search for better fitting splines with lower
s.
That's would explain the strange behavior that there are "nice" splines at 0.25.

In my recent examples, I used an information criterium (AIC just
because I had it available) to do a global search for the best s. It
looks to me like the current spline implementation only does a local
search with fixed s.
What I didn't try to figure out is how to avoid recalculating
everything for each different s.

In what I have been reading, they use either cross-validation or
information criteria to choose the smoothing parameters, but I haven't
read anything about whether the search needs to be global or can be
just a local search.

below I mainly add the code to plot to your example

Josef

>
> Zach
>
>
> import numpy
> import scipy.interpolate as interp
> good = numpy.array(
>      [[ 24.21162868,  28.75056713,  32.64108579,  36.85581434,
>         41.07054289,  46.582111  ,  52.417889  ,  55.17367305,
>         57.92945711,  61.00945105,  64.89996971,  72.19469221,
>         75.76100098,  83.21782842,  83.21782842,  88.56729158,
>         86.29782236,  90.18834103,  86.62203225],
>       [ 70.57364276,  71.22206254,  69.27680321,  72.5189021 ,
>         65.06207466,  70.89785265,  67.33154388,  68.62838343,
>         69.92522299,  67.00733399,  77.21994548,  68.30417354,
>         71.38416748,  71.38416748,  64.25154993,  70.08732793,
>         61.00945105,  63.44102521,  56.47051261]])
> bad = good.copy()
> # now make a *small* change
> bad[:,-1] = 87.432556973542049, 55.984197773255048
>
> good_tck, good_u = interp.splprep(good, s=4)
> bad_tck, bad_u = interp.splprep(bad, s=4)
> print good.ptp(axis=1)
> print numpy.array(interp.splev(numpy.linspace(good_u[0], good_u[-1], 300), good_tck)).ptp(axis=1)
> print numpy.array(interp.splev(numpy.linspace(bad_u[0], bad_u[-1], 300), bad_tck)).ptp(axis=1)
>
> And the output on my machine is:
> [ 65.97671235  20.74943287]
> [ 67.69845281  20.52518913]
> [ 2868.98673621  450984.86622631]

with plot and sum of squares fp

import numpy
import scipy.interpolate as interp
good = numpy.array(
     [[ 24.21162868,  28.75056713,  32.64108579,  36.85581434,
        41.07054289,  46.582111  ,  52.417889  ,  55.17367305,
        57.92945711,  61.00945105,  64.89996971,  72.19469221,
        75.76100098,  83.21782842,  83.21782842,  88.56729158,
        86.29782236,  90.18834103,  86.62203225],
      [ 70.57364276,  71.22206254,  69.27680321,  72.5189021 ,
        65.06207466,  70.89785265,  67.33154388,  68.62838343,
        69.92522299,  67.00733399,  77.21994548,  68.30417354,
        71.38416748,  71.38416748,  64.25154993,  70.08732793,
        61.00945105,  63.44102521,  56.47051261]])
bad = good.copy()
# now make a *small* change
bad[:,-1] = 87.432556973542049, 55.984197773255048

(good_tck, good_u), good_fp,_, _ = interp.splprep(good, s=0.25,
full_output=True) #3.8)
(bad_tck, bad_u), bad_fp,_, _ = interp.splprep(bad, s=4.3, full_output=True)
print good.ptp(axis=1)
xg = numpy.linspace(good_u[0], good_u[-1], 300)
yg = numpy.array(interp.splev(xg, good_tck))
xb = numpy.linspace(bad_u[0], bad_u[-1], 300)
yb = numpy.array(interp.splev(xb, bad_tck))
print yg.ptp(axis=1)
print yb.ptp(axis=1)
#And the output on my machine is:
#[ 65.97671235  20.74943287]
#[ 67.69845281  20.52518913]
#[ 2868.98673621  450984.86622631]

print 'fp'
print good_fp
print bad_fp

import matplotlib.pyplot as plt

plt.plot(good[0], good[1], 'bo', alpha=0.75)
plt.plot(bad[0], bad[1], 'ro', alpha=0.75)
plt.plot(yg[0], yg[1], 'b-')
plt.plot(yb[0], yb[1], 'r-')

plt.show()

>
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev