Smoothing a discrete set of data

Terry Reedy tjreedy at udel.edu
Sat Sep 7 17:48:17 EDT 2002


"Paul Moore" <gustav at morpheus.demon.co.uk> wrote in message
news:3csmdso5.fsf at morpheus.demon.co.uk...
> I have a set of data, basically a histogram.

Actually, the two examples you give are *not* histograms.  A histogram
is a discrete frequency graph: how many numbers fit in each bin.  Note
that binning numbers like this tosses away any order and other
covariate info.  Unimodal histograms are usually smoothed by moment
matching: mean, standard deviation, and possibly more.
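
As a rough illustration of moment matching, here is a minimal sketch
that computes the mean and standard deviation of a histogram and then
evaluates a normal curve with those same two moments at each bin.  The
function names, the use of bin centers, and the choice of a normal
curve are all assumptions made for the example; a skewed histogram
would want a different family or more moments.

```python
import math

def histogram_moments(bin_centers, counts):
    """Return (mean, standard deviation) of binned data, treating every
    observation in a bin as if it sat at the bin center (an assumption)."""
    n = sum(counts)
    if n == 0:
        raise ValueError("histogram is empty")
    mean = sum(c * x for x, c in zip(bin_centers, counts)) / n
    var = sum(c * (x - mean) ** 2 for x, c in zip(bin_centers, counts)) / n
    return mean, math.sqrt(var)

def normal_smooth(bin_centers, counts, bin_width):
    """Expected count per bin under a normal curve whose mean and
    standard deviation match those of the histogram."""
    mean, sd = histogram_moments(bin_centers, counts)
    if sd == 0:
        raise ValueError("all observations fall in one bin")
    n = sum(counts)
    scale = n * bin_width / (sd * math.sqrt(2 * math.pi))
    return [scale * math.exp(-0.5 * ((x - mean) / sd) ** 2)
            for x in bin_centers]
```

Calling normal_smooth(centers, counts, width) gives a list the same
length as counts that can be plotted over the raw bars.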

> The data is pretty variable, and I'd like to "smooth" it to find
> trends.

This is too vague to be useful.  Most everything in statistics and
data analysis involves "smoothing" of some sort.

>  Actually, this comes up a *lot* for me - some examples:

Of course: see above

> sampled IO rates on a machine
> - I want to look for trends (is IO higher overnight or during the
> day,

If you take measurements (samples) every hour, for instance, you have
a time series.  There are many books on this subject alone.  If you
reduce the time variable to a binary day/night variable, then you
would apply methods for two groups of data.  See below.
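
Reducing the time variable to day/night might look something like the
sketch below.  The (hour, value) sample format and the 08:00 to 20:00
definition of "day" are assumptions for illustration only; use
whatever boundaries make sense for your machine.

```python
def split_day_night(samples, day_start=8, day_end=20):
    """Split (hour_of_day, value) samples into two groups of values.

    Hours in [day_start, day_end) count as day, everything else as
    night.  Both the sample format and the default hours are assumed.
    """
    day, night = [], []
    for hour, value in samples:
        if day_start <= hour < day_end:
            day.append(value)
        else:
            night.append(value)
    return day, night
```

The two lists that come back are then ordinary two-group data.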

> etc) or fuel consumption for my car (do I use less fuel since I had
> the service).

This is a standard question with standard methods.  If you ignore
order and other covariates and group measurements as before and after,
a t-test or rank-sum test would be appropriate.  If you add a
factor like city versus country driving between gas stops, then you
need analysis of variance.  On the other hand, you could graph mileage
between service stops and see if there is a (downward) trend.
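
For the before/after comparison and the trend, a minimal sketch using
only the standard library follows.  It assumes the unequal-variance
(Welch) form of the t statistic and a plain least-squares slope against
the fill-up index; the function names are made up for the example.  The
t statistic still has to be compared against a t distribution to get a
p-value, which is not done here.

```python
import math
from statistics import mean, variance

def welch_t(before, after):
    """Welch's t statistic for mean(after) - mean(before).

    Each group needs at least two measurements, and the groups must
    not both be constant.
    """
    se = math.sqrt(variance(before) / len(before)
                   + variance(after) / len(after))
    return (mean(after) - mean(before)) / se

def trend_slope(ys):
    """Least-squares slope of ys against the indices 0, 1, 2, ...

    A negative slope means the measurements drift downward over time.
    """
    n = len(ys)
    x_bar = (n - 1) / 2
    y_bar = mean(ys)
    sxy = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(ys))
    sxx = sum((i - x_bar) ** 2 for i in range(n))
    return sxy / sxx
```

With mileage figures in before and after lists, welch_t(before, after)
well above zero would suggest the service helped, and trend_slope on
the figures since the last service shows whether mileage is falling
off again.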

> Normally, what I do with things like this is draw a graph, and try to
> spot the trends "by eyeball". But it feels to me like I should be able
> to write something which smooths the data out. I just don't see how

As a statistician, I am a fan of eyeballing raw data (with appropriate
caveats about testing what you think you see) in addition to numerical
analysis.  However, each statistical procedure is aimed at answering a
question (and many are based on some assumption about the data).  So
you need to better formulate what you want to know.

Terry J. Reedy





