[SciPy-User] Robust fitting of an exponential distribution subpopulation

josef.pktd at gmail.com josef.pktd at gmail.com
Wed Mar 11 20:15:19 EDT 2015


On Wed, Mar 11, 2015 at 7:36 PM, Antonino Ingargiola <tritemio at gmail.com>
wrote:

> Hi Kevin,
>
> If I apply the log transform to the sample to linearize the model, what
> is the correct way to weight the residuals? Without weighting, residuals
> close to the tail will be amplified and bias the fit.
>

In RLM, the robust linear model, the weights are chosen automatically to
downweight extreme residuals. The weighting scheme depends on the "norm",
which defines the shape of the objective and of the weight function.
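
To make the link between norm and weights concrete, here is a minimal
sketch of the weight function implied by Huber's norm, written in plain
NumPy rather than through statsmodels. The helper name `huber_weights` is
made up for illustration, t=1.345 is the commonly used tuning constant, and
the residuals are assumed to be already divided by a robust scale estimate.

```python
import numpy as np

def huber_weights(resid, t=1.345):
    """Weights implied by Huber's norm: 1 inside [-t, t], t/|r| outside.

    `resid` is assumed to be already scaled by a robust scale estimate.
    """
    r = np.abs(np.asarray(resid, dtype=float))
    w = np.ones_like(r)
    outside = r > t
    w[outside] = t / r[outside]  # large residuals get proportionally less weight
    return w

# Hypothetical scaled residuals: three small ones, one moderate, one extreme.
scaled_resid = np.array([-0.5, 0.0, 1.0, 2.69, -13.45])
weights = huber_weights(scaled_resid)
# The first three keep weight 1; the last two get about 0.5 and 0.1.
```

Other norms (for example Tukey's biweight) give zero weight beyond a
cutoff instead of a slowly decaying one, so the choice matters when the
outliers are very far out.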

RLM produces an unbiased estimator of the mean or mean function for
symmetric distributions and is calibrated for the normal distribution. I
don't know how well this is approximated by the log of an exponentially
distributed variable, but it won't exactly satisfy the assumptions.
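
A quick way to see how far the log of an exponential variable is from
symmetric is to simulate it. This is only a sketch on simulated data; the
scale of 2.0 and the sample size are arbitrary choices. The log of an
exponential variable follows a reflected Gumbel distribution, so one would
expect a mean of log(scale) minus the Euler-Mascheroni constant and a
skewness of roughly -1.14.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scale = 2.0  # arbitrary scale chosen for the simulation
log_x = np.log(rng.exponential(scale=scale, size=200_000))

# Sample skewness of log(X); a symmetric distribution would give about 0.
sample_skew = stats.skew(log_x)

# The mean of log(X) sits below log(scale) by the Euler-Mascheroni constant.
sample_mean = log_x.mean()
theory_mean = np.log(scale) - np.euler_gamma

print(f"skewness of log(x): {sample_skew:.3f}")
print(f"mean of log(x): {sample_mean:.3f}, theory: {theory_mean:.3f}")
```

The left skew means a location estimate calibrated for symmetric errors
will not land exactly on log(scale), even without any contamination.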

There should be a more direct way of estimating the parameter of the
exponential distribution robustly, but I have never tried it.
(One idea would be to estimate a trimmed mean and use the estimated
distribution to correct for the trimming. The distributions in scipy.stats
have an `expect` method that can be used to calculate the mean of a trimmed
distribution, i.e. conditional on lower and upper bounds.)
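
A minimal sketch of the trimmed-mean idea, untested beyond the simulated
example below. It assumes the location is fixed at zero and trims at fixed
sample quantiles; the function name `trimmed_expon_scale` and the 10%/90%
trimming levels are made up for illustration. Because the exponential is a
scale family, the trimmed mean equals the scale times the trimmed mean of
the unit exponential, so no iteration is needed.

```python
import numpy as np
from scipy import stats

def trimmed_expon_scale(x, p_lo=0.1, p_hi=0.9):
    """Estimate the exponential scale from a quantile-trimmed mean.

    Assumes loc=0. The trimming levels are illustrative defaults.
    """
    x = np.asarray(x, dtype=float)
    lo, hi = np.quantile(x, [p_lo, p_hi])
    trimmed_mean = x[(x >= lo) & (x <= hi)].mean()
    # Mean of the unit exponential conditional on lying between the
    # same two quantiles; this is the correction for the trimming.
    lb, ub = stats.expon.ppf([p_lo, p_hi])
    correction = stats.expon.expect(lb=lb, ub=ub, conditional=True)
    return trimmed_mean / correction

# Simulated check: exponential with scale 2 plus a few gross outliers.
rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=20_000)
sample[:50] = 1000.0  # contaminate the right tail

scale_hat = trimmed_expon_scale(sample)
naive_mle = sample.mean()  # the untrimmed MLE is pulled up by the outliers
print(f"trimmed estimate: {scale_hat:.3f}, plain mean: {naive_mle:.3f}")
```

Contamination still shifts the sample quantiles a little, so this reduces
the bias rather than removing it, and the trimming fractions need to be
larger than the expected share of bad observations on each side.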


What's your sample size?
(For very large sample sizes, one approach that is sometimes used is to fit
a distribution to the central part of a histogram.)
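
A rough sketch of the histogram approach, assuming the window [lo, hi] is
picked by hand so that it excludes both the region near the origin and the
far right tail. Inside the window the exponential density is proportional
to exp(-x/scale), so the log of the bin counts is linear in the bin centers
with slope -1/scale. Bin counts are roughly Poisson, so log(count) has a
standard deviation of about 1/sqrt(count), which is why sqrt(count) is
passed as the weight to `np.polyfit`. The function name
`fit_expon_central_hist` and the simulated contamination are made up for
illustration.

```python
import numpy as np

def fit_expon_central_hist(x, lo, hi, bins=40):
    """Estimate the exponential scale from log bin counts inside [lo, hi]."""
    counts, edges = np.histogram(x, bins=bins, range=(lo, hi))
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = counts > 0  # the log is undefined for empty bins
    # np.polyfit expects weights of 1/sigma; sigma of log(count) ~ 1/sqrt(count).
    slope, _intercept = np.polyfit(
        centers[keep], np.log(counts[keep]), 1, w=np.sqrt(counts[keep])
    )
    return -1.0 / slope

# Simulated sample: dominant exponential, extra mass near zero, far outliers.
rng = np.random.default_rng(1)
clean = rng.exponential(scale=2.0, size=200_000)
near_origin = rng.uniform(0.0, 0.5, size=20_000)
far_right = rng.uniform(50.0, 60.0, size=100)
sample = np.concatenate([clean, near_origin, far_right])

scale_hat = fit_expon_central_hist(sample, lo=1.0, hi=8.0)
print(f"scale estimated from the central window: {scale_hat:.3f}")
```

The weights also address the concern about sparse tail bins dominating a
log-linear fit: bins with few counts get little weight. The choice of
window remains as arbitrary as the choice of threshold, though.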

Josef



>
> Antonio
>
> On Wed, Mar 11, 2015 at 11:08 AM, Kevin Gullikson <
> kevin.gullikson at gmail.com> wrote:
>
>> Antonio,
>>
>> The statsmodels package has a robust linear model module that I have used
>> before. You will have to transform your data to be linear first by taking
>> the log of the y-axis.
>>
>>
>> http://statsmodels.sourceforge.net/stable/examples/notebooks/generated/robust_models_0.html
>>
>>
>> Kevin Gullikson
>>
>> On Wed, Mar 11, 2015 at 12:04 PM, Antonino Ingargiola <tritemio at gmail.com
>> > wrote:
>>
>>> Hi to the list,
>>>
>>> I'm seeking the advice of the scientific Python community to solve the
>>> following fitting problem. Both suggestions on the methodology and on
>>> particular software packages are appreciated.
>>>
>>> I often encounter the need to fit a sample containing a (dominant)
>>> exponentially-distributed sub-population. Mostly the non-exponential
>>> samples (from an unknown distribution) are distributed close to the origin
>>> of the exponential distribution, therefore a simple approach I used so far
>>> is selecting all the samples higher than a threshold and fitting the
>>> exponential "tail" with MLE.
>>>
>>> The problem is that the choice of the threshold is somewhat arbitrary
>>> and moreover there can be a small set of outliers on the extreme right side
>>> of the distribution that would bias the MLE fit.
>>>
>>> To improve the accuracy, I'm thinking of using (if necessary
>>> implementing) some kind of robust fitting procedure. For example, using a
>>> scheme in which the outliers are identified by putting a threshold on the
>>> residual and then this threshold is optimized using some "goodness of fit"
>>> cost function. Is this approach reasonable?
>>>
>>> I am surely not the first to tackle this problem, so I would appreciate
>>> some suggestions and specific pointers to help me get started.
>>>
>>> Thank you,
>>> Antonio
>>>
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>
>>>