Programmatically finding "significant" data points

Tue Nov 14 09:36:28 EST 2006

If the order doesn't matter, you can sort the data and pull x * 0.5 * n
values off each end, where n is the number of data points and x is the
proportion of them you want. If you have too many similar values, though,
this falls down. I suggest you look up quantiles in a good statistics
book.
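
A minimal sketch of the sort-and-trim idea above, written for Python 3. The
function name extreme_tails and the choice to round x * 0.5 * n down with
int() are assumptions made for illustration, not something specified in
this thread.

```python
data = [0.10, 0.50, 0.60, 0.40, 0.39, 0.50, 1.00, 0.80, 0.60, 1.20,
        1.10, 1.30, 1.40, 1.50, 1.05, 1.20, 0.90, 0.70, 0.80, 0.40,
        0.45, 0.35, 0.10]


def extreme_tails(values, proportion):
    """Return the lowest and highest values as lists of (index, value).

    proportion is x in the text: the fraction of all points to keep,
    split evenly between the low end and the high end.
    """
    n = len(values)
    k = int(proportion * 0.5 * n)  # points taken from each end (rounded down)
    # Sort the indices by value so the original positions are not lost.
    order = sorted(range(n), key=lambda i: values[i])
    lowest = [(i, values[i]) for i in order[:k]]
    highest = [(i, values[i]) for i in order[n - k:]]
    return lowest, highest


lowest, highest = extreme_tails(data, 0.25)
print("lowest:", lowest)
print("highest:", highest)
```

With the sample data, both of the lowest points have the value 0.10. When
many values tie like this, which ones land inside the cutoff depends only
on their position in the sort, which is the weakness with similar values
mentioned above.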

Alan.

Peter Otten wrote:

> erikcw wrote:
>
> > Hi all,
> >
> > I have a collection of ordered numerical data in a list.  The numbers
> > when plotted on a line chart make a low-high-low-high-high-low (random)
> > pattern.  I need an algorithm to extract the "significant" high and low
> > points from this data.
> >
> > Here is some sample data:
> > data = [0.10, 0.50, 0.60, 0.40, 0.39, 0.50, 1.00, 0.80, 0.60, 1.20,
> > 1.10, 1.30, 1.40, 1.50, 1.05, 1.20, 0.90, 0.70, 0.80, 0.40, 0.45, 0.35,
> > 0.10]
> >
> > In this data, some of the significant points include:
> > data[0]
> > data[2]
> > data[4]
> > data[6]
> > data[8]
> > data[9]
> > data[13]
> > data[14]
> > ....
> >
> > How do I sort through this data and pull out these points of
> > significance?
>
> I think you are looking for "extrema":
>
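> def w3(items):
>     # Yield each run of three consecutive items as a tuple (a, b, c).
>     items = iter(items)
>     # The None is only a placeholder; it is shifted out before the first yield.
>     view = None, items.next(), items.next()
>     for item in items:
>         view = view[1:] + (item,)
>         yield view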
>
> # i + 1 is the index in data of the middle value b.
> for i, (a, b, c) in enumerate(w3(data)):
>     if a > b < c:
>         print i+1, "min", b
>     elif a < b > c:
>         print i+1, "max", b
>     else:
>         print i+1, "---", b
> 
> Peter
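
The w3 generator quoted above uses Python 2 syntax (items.next() and print
statements). For anyone on Python 3, the same three-point comparison could
be sketched roughly as follows. The function name local_extrema is made up
here, and, like the quoted code, it only looks at interior points, so the
endpoints such as data[0] would need separate handling.

```python
data = [0.10, 0.50, 0.60, 0.40, 0.39, 0.50, 1.00, 0.80, 0.60, 1.20,
        1.10, 1.30, 1.40, 1.50, 1.05, 1.20, 0.90, 0.70, 0.80, 0.40,
        0.45, 0.35, 0.10]


def local_extrema(values):
    """Return (index, kind, value) for each strict interior local min or max."""
    found = []
    for i in range(1, len(values) - 1):
        a, b, c = values[i - 1], values[i], values[i + 1]
        if a > b < c:
            found.append((i, "min", b))
        elif a < b > c:
            found.append((i, "max", b))
    return found


for index, kind, value in local_extrema(data):
    print(index, kind, value)
```

On the sample data this picks out indices 2, 4, 6, 8, 9, 13 and 14 from the
original question, along with some smaller wiggles (10, 15, 17, 18, 19,
20) that may or may not count as "significant" for a given purpose.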