Programmatically finding "significant" data points

Tue Nov 14 10:07:53 EST 2006

erikcw wrote:
> Hi all,
>
> I have a collection of ordered numerical data in a list.

Called a "time series" in statistics.

> The numbers
> when plotted on a line chart make a low-high-low-high-high-low (random)
> pattern.  I need an algorithm to extract the "significant" high and low
> points from this data.
>
> Here is some sample data:
> data = [0.10, 0.50, 0.60, 0.40, 0.39, 0.50, 1.00, 0.80, 0.60, 1.20,
> 1.10, 1.30, 1.40, 1.50, 1.05, 1.20, 0.90, 0.70, 0.80, 0.40, 0.45, 0.35,
> 0.10]
>
> In this data, some of the significant points include:
> data[0]
> data[2]
> data[4]
> data[6]
> data[8]
> data[9]
> data[13]
> data[14]
> ....
>
> How do I sort through this data and pull out these points of
> significance?

The best place to ask about an algorithm for this is not
comp.lang.python -- maybe sci.stat.math would be better. Once you have
an algorithm, coding it in Python should not be difficult. I'd suggest
using the NumPy array rather than the native Python list, which is not
designed for crunching numbers.