[SciPy-user] Robust Statistics

Nuttall, Brandon C bnuttall at uky.edu
Wed Jul 30 13:41:15 EDT 2008


Folks,

I have some rate/time data I'm in the process of analyzing. The data represent the natural gas produced from a well over time. The data are fit best by either an exponential model (least squares best fit of log of rate data, just qi and di see below) or a "hyperbolic" model:

qt = qi *(1-b*di*t)**(-1.0/b)

        qt = rate at time t
        qi = initial rate (calculated)
        b = decline exponent (rate of change of the decline rate, calculated)
        di = initial decline rate (calculated)
        t = time

The calculated parameters are found using the optimization routine scipy.optimize.fmin_tnc(). I'm already rejecting some best fit results because either the correlation coefficient is not statistically significant or the di (slope) is not significantly different from 0.

The image I've attached shows two things. An exponential best fit is shown by the solid line and a hyperbolic best fit by the dotted line. It is common practice in analyzing this type of data to begin the best fit analysis at an early time point where the data actually start to "decline"; thus, the hyperbolic best fit started at time 2 and the first rate point (rate = about 1350) is ignored. The other thing the graph shows is that there appear to be outliers.

My question is, what Python resources are there for testing for outliers and robust statistics?

Is RANSAC appropriate for this? Note that when I run ransack.py from the cookbook, I get:

Traceback (most recent call last):

--Snip other traceback info---
  File "C:\Documents and Settings\bnuttall\Desktop\ransac.py", line 132, in fit
    A = numpy.vstack([data[:,i] for i in self.input_columns]).T
AttributeError: 'numpy.ndarray' object has no attribute 'T'

I presume the T attribute is supposed to mean the transpose operator and have changed the offending statement(s, there are 4 total) to the form:

A = numpy.vstack([data[:,i] for i in self.input_columns]).transpose()

Thanks.

Brandon Nuttall
-------------- next part --------------
A non-text attachment was scrubbed...
Name: R0114442_raw.png
Type: image/png
Size: 55799 bytes
Desc: R0114442_raw.png
URL: <http://mail.scipy.org/pipermail/scipy-user/attachments/20080730/d680f206/attachment.png>


More information about the SciPy-User mailing list