[SciPy-Dev] New feature: fitting distributions to censored data.

Ralf Gommers ralf.gommers at gmail.com
Sat Mar 27 05:42:27 EDT 2021


On Fri, Mar 26, 2021 at 5:56 PM Warren Weckesser <warren.weckesser at gmail.com>
wrote:

> On 3/26/21, Ralf Gommers <ralf.gommers at gmail.com> wrote:
> > On Thu, Mar 25, 2021 at 4:24 PM Warren Weckesser
> > <warren.weckesser at gmail.com>
> > wrote:
> >
> >> Hey folks,
> >>
> >> A new enhancement for fitting probability distributions to censored
> >> data by maximum likelihood estimation is in progress in
> >>
> >>     https://github.com/scipy/scipy/pull/13699
>
> >>
> >> Here's the calculation using the proposed API in SciPy:
> >>
> >> In [55]: from scipy.stats import weibull_min, CensoredData
> >>
> >> Create the `CensoredData` instance:
> >>
> >> In [56]: x = np.array([54, 187, 216, 240, 244, 335, 361, 373, 375,
> >> 386] + [500]*10)
> >>
> >> In [57]: data = CensoredData.right_censored(x, x == 500)
> >>
> >
> > This `x == 500` looks a little odd API-wise. Why not `right_censored(x,
> > 500)`. Or, more importantly, why not something like:
> >
> >     data = CensoredData(x).rightcensor(500)
> >
>
> The API is a work in progress.  I didn't do something like that
> because, in general, the lower bound for right-censored data isn't
> necessarily the same for each censored value.


Thanks, after the explanation of Robert and the examples below it makes
sense to me. I added some more comments on the PR.

(See, for example, the
> data shown in the second slide of
> http://www.ams.sunysb.edu/~zhu/ams588/Lecture_3_likelihood.pdf, and
> the example at
> https://support.sas.com/documentation/cdl/en/qcug/63922/HTML/default/viewer.htm#qcug_reliability_sect004.htm
> .
> Both of those data sets are used for unit tests in the PR.)  However,
> the case of a single bound for left- or right-censored data is common,
> so it might be nice to have a convenient way to write it.
>
> Another possible enhancement to the CensoredData API is a `count`
> argument that gives the number of times the value is to be repeated,
> but I figured I would propose that in a follow-up PR.
>
>
> > There are of course multiple ways of doing this, but the use of
> > classmethods and only taking arrays seems unusual. Also in the
> constructor,
> > why do `lower` and `upper` have to be boolean arrays rather than scalars?
>
> (The constructor args don't have to be boolean, so I assume you meant
> "have to be 1D arrays".)   I suppose accepting a scalar for one of
> them and using broadcasting would work.  I did a lot of searching for
> examples to use as unit tests, and I don't recall any where *all* the
> values were censored, so I don't think such behavior would actually be
> useful.  And I would worry that someone might misinterpret what using
> a scalar means.
>

Yes, that makes sense. The asymmetry between the constructor and the
`*_censored` classmethods is a concern though, it looks like you can't get
from one to the other. The scalar in your example threw me off, I agree a
scalar for `lower` or `upper` in the constructor is potentially confusing.

Cheers,
Ralf



> Warren
>
>
> >
> > Cheers,
> > Ralf
> >
> >
> >
> >>
> >> In [58]: print(data)
> >>
> >> CensoredData(20 values: 10 not censored, 10 right-censored)
> >>
> >> Fit `weibull_min` (with the location fixed at 0) to the censored data:
> >>
> >> In [59]: shape, loc, scale = weibull_min.fit(data, floc=0)
> >>
> >> In [60]: shape
> >>
> >> Out[60]: 1.720797180719942
> >>
> >> In [61]: scale
> >>
> >> Out[61]: 606.527565269458
> >>
> >>
> >> Matt Haberland has already suggested quite a few improvements to the
> >> PR. Additional comments would be appreciated.
> >>
> >> Warren
> >> _______________________________________________
> >> SciPy-Dev mailing list
> >> SciPy-Dev at python.org
> >> https://mail.python.org/mailman/listinfo/scipy-dev
> >>
> >
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/scipy-dev/attachments/20210327/9829c169/attachment.html>


More information about the SciPy-Dev mailing list