Programmatically finding "significant" data points
Alan J. Salmoni
salmoni at gmail.com
Tue Nov 14 09:36:28 EST 2006
If the order doesn't matter, you can sort the data and pull off
x * 0.5 * n values from each end, where n is the number of data points
and x is the proportion of them you want. If you have too many similar
values, though, this falls down, because ties make the cut-off
arbitrary. I suggest you check out quantiles in a good statistics book.
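
A minimal sketch of the sorting idea, using the sample data from the
quoted message. The function name extreme_values and the choice to
round the count down are assumptions made for illustration, not part
of any library:

```python
def extreme_values(data, x):
    """Return (lowest, highest) values, roughly x * len(data) in total.

    x is the proportion of the data to keep (an assumed convention);
    half of that share is taken from each end of the sorted list.
    """
    n = len(data)
    k = int(x * 0.5 * n)          # count taken from each end, rounded down
    ordered = sorted(data)        # original order is discarded here
    return ordered[:k], ordered[n - k:]


data = [0.10, 0.50, 0.60, 0.40, 0.39, 0.50, 1.00, 0.80, 0.60, 1.20,
        1.10, 1.30, 1.40, 1.50, 1.05, 1.20, 0.90, 0.70, 0.80, 0.40,
        0.45, 0.35, 0.10]

low, high = extreme_values(data, 0.25)
print(low)    # [0.1, 0.1]
print(high)   # [1.4, 1.5]
```

Note that both low values are the same number, which is the
similar-values problem in miniature. For cut points rather than counts,
the standard library offers statistics.quantiles (Python 3.8 and later).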
Alan.
Peter Otten wrote:
> erikcw wrote:
>
> > Hi all,
> >
> > I have a collection of ordered numerical data in a list. The numbers
> > when plotted on a line chart make a low-high-low-high-high-low (random)
> > pattern. I need an algorithm to extract the "significant" high and low
> > points from this data.
> >
> > Here is some sample data:
> > data = [0.10, 0.50, 0.60, 0.40, 0.39, 0.50, 1.00, 0.80, 0.60, 1.20,
> > 1.10, 1.30, 1.40, 1.50, 1.05, 1.20, 0.90, 0.70, 0.80, 0.40, 0.45, 0.35,
> > 0.10]
> >
> > In this data, some of the significant points include:
> > data[0]
> > data[2]
> > data[4]
> > data[6]
> > data[8]
> > data[9]
> > data[13]
> > data[14]
> > ....
> >
> > How do I sort through this data and pull out these points of
> > significance?
>
> I think you are looking for "extrema":
>
> def w3(items):
>     # Yield each run of three consecutive items as a tuple.
>     items = iter(items)
>     view = None, next(items), next(items)
>     for item in items:
>         view = view[1:] + (item,)
>         yield view
>
> for i, (a, b, c) in enumerate(w3(data)):
>     # b is data[i+1]; a and c are its neighbours.
>     if a > b < c:
>         print(i+1, "min", b)
>     elif a < b > c:
>         print(i+1, "max", b)
>     else:
>         print(i+1, "---", b)
>
> Peter