Data smoothing algorithms? - Thank you all

Anthra Norell anthra.norell at tiscalinet.ch
Tue May 3 06:09:31 EDT 2005


To contribute to this interesting discussion, which, having provided
practical solutions, has become academic: I believe we are dealing with a
particularly innocuous case of noise infection, innocuous because the
noise is conspicuously distinct from the signal. The signal is the orbital
and rotational motion of the earth. Its period is one year and it is
perfectly smooth in that it is not subject to disturbances, such as impacts
of high-end asteroids. The noise is rounding errors with a period of one
day. If I plot the data and look at the plot from a distance far enough to
make the jitter invisible, what I see is a smooth sinusoidal kind of curve.
That is the signal. Looking at the curve close up, what I see is similar to
a trample path. A trample path is a line through a landscape from one place
to another, traced by footprints which display the lateral latitude of
bipeds walking comfortably, that is, without the ambition to place their
feet exactly on the line like tightrope walkers. Looking at the trample
path, I can easily disregard its latitude and perceive the line it traces
as its lateral median. Applying the analogy, I look at my plot as if it
were a trample path; I can then reconstruct a close approximation of the
signal by drawing a lateral median along the dots. It helps to know that
the signal is perfectly smooth, because that allows me to identify the
lateral deviation of each dot as an artifact of rounding, to be discarded.
Using a tracing ruler flexible enough to follow the signal but too stiff to
follow the rounding errors, I could draft a very good line plot.
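The flexible ruler has a simple numeric analogue: replace each point by the
mean of a small centered window of its neighbors. A minimal sketch of that
idea (the window size of 7 is my own choice for illustration, not anything
from the discussion):

```python
def moving_average(values, window=7):
    """Smooth a sequence by replacing each point with the mean of a
    centered window of neighbors -- a numeric 'flexible ruler'.
    Near the ends the window is clipped to the available data."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        segment = values[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed
```

Because the signal varies slowly compared to the window, the averaging
barely bends the curve, while the day-to-day rounding jitter largely
cancels out.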
      Clearly, this kind of reconstruction is curve fitting. The result is
not a formula, but output data mapping noisy input data. I should not
expect any of my data points to come out exactly right. But that isn't the
goal of the exercise. The goal is to narrow the margin of error down to
consistently fall below a critical limit.
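Since the signal here is known to be smooth with a one-year period, the
curve fit can be made explicit: a least-squares fit of a sinusoid with a
known period. This is only a sketch of that idea, using nothing beyond the
standard library (the model a + b*sin + c*cos and the solver choice are my
own illustration, not anything proposed on the list):

```python
import math

def fit_sinusoid(t, y, period):
    """Least-squares fit of y ~ a + b*sin(w t) + c*cos(w t) for a
    known period, returning the fitted function. The noisy input is
    mapped to a smooth output curve."""
    w = 2 * math.pi / period
    cols = [[1.0] * len(t),
            [math.sin(w * ti) for ti in t],
            [math.cos(w * ti) for ti in t]]
    # Normal equations A x = b for the three coefficients.
    A = [[sum(ci * cj for ci, cj in zip(cols[i], cols[j]))
          for j in range(3)] for i in range(3)]
    b = [sum(ci * yi for ci, yi in zip(cols[i], y)) for i in range(3)]

    def det3(m):  # determinant of a 3x3 matrix
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
              - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
              + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det3(A)
    coeffs = []
    for k in range(3):  # Cramer's rule, column by column
        Ak = [row[:] for row in A]
        for i in range(3):
            Ak[i][k] = b[i]
        coeffs.append(det3(Ak) / d)
    a, bs, bc = coeffs
    return lambda ti: a + bs * math.sin(w * ti) + bc * math.cos(w * ti)
```

The fitted curve's error at any point is far below the per-point rounding
noise, which is exactly the "margin of error below a critical limit" goal.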
      Interpolation would be artificially increasing the number of data
points by guessing the most likely values between established ones. On the
line plot, the flexible ruler would have done a continuous interpolation
together with the curve fit. Working with numbers, I'd need an
interpolation algorithm to find intermediate data. As my data points happen
to pertain to midnight of each day, I would interpolate if I also needed
data for, say, twelve noon each day. The quality of interpolated data quite
obviously depends on the quality of the established adjacent data. That's
why I don't think interpolating noisy data reduces the noise.
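To make the midnight-to-noon example concrete, here is the simplest
possible sketch, linear interpolation at the midpoint (assuming, for
illustration, one value per midnight):

```python
def noon_values(midnight_values):
    """Estimate each day's noon value as the mean of the two adjacent
    midnight values (linear interpolation at the midpoint). Note the
    result inherits the noise of its neighbors; it does not remove it."""
    return [(a + b) / 2
            for a, b in zip(midnight_values, midnight_values[1:])]
```

If the midnight values carry rounding noise, the interpolated noon values
carry (averaged) noise too, which is the point made above: interpolation
adds points, not accuracy.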
      " ... gravitational perturbations, for more than two isolated bodies,
can NOT be predicted via simple polynomials." I have heard about that and
didn't think it would ever pertain to my activities. So I relegated it to
the cerebral regions labeled 'Conversation Topics'.
      " ... the JPL planetary ephemeris file(s) and a copy of Meeus
("Astronomical Algorithms" I believe)... Then basically write the code to
extract the data from the JPL ephemeris and convert to your desired
reference point (that is basically what is done to create these almanacs, in
the first place." This is a vey useful suggestion. Thanks a lot.

Frederic





