[SciPy-Dev] Truncated distributions

Mon Nov 5 04:20:51 EST 2018

On 11/5/18, Robert Kern <robert.kern at gmail.com> wrote:
> On Sun, Nov 4, 2018 at 10:36 PM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
>>
>> On Sun, Nov 4, 2018 at 7:01 AM Ali Cetin <ali.cetin at outlook.com> wrote:
>>>
>>> Hi all,
>>>
>>> I note that quite a few truncated distribution functions are available
> in SciPy - nice!
>>>
>>> However, I find the usefulness of these functions somewhat limited when
> it is desired to fit them to data; in most common scenarios the truncation
> point is known (or even determined by the user/experimenter), and therefore
> do not need to be treated as a free parameter. In the current scipy.stats
> framework, the truncation parameters are accepted as "shape" parameters.
> Therefore, it is only possible to lock the "normalized" truncation point
> during fitting. This is a catch-22, since the user is required to provide
> loc and scale parameters a priori, which are unknown.
>>
>> I'm not sure I follow. The fit() docstring says:  "Return MLEs for shape
> (if applicable), location, and scale parameters from data." So it should be
> fitting everything. Could you provide an example perhaps?
>
> The problem is with how we defined the truncation parameters of these
> distributions, not `fit()` per se. You are supposed to specify them
> *relative* to the `loc` and `scale`. So you could fit `truncnorm()` such
> that the bounds are fixed to, say, (-3*sigma, +3*sigma) where `sigma` is
> freely being adjusted during the fit but not to (-10, +10) regardless of
> the `scale`. Ali wants to do the latter. `fit()` doesn't work for this use
> case.
>
>>> I would like to propose a solution that only require minor adjustment of
> the current framework: allow the user to supply the truncation point as
> part of the data array -> for left and right truncation, the 0th and the
> Nth element of the array is poped out and used as truncation parameters,
> respectively.
>>
>> That doesn't look like a viable solution. Let's first establish if there
> really is an issue like you're describing, and then look for a cleaner
> addition to the API.
>
> I agree that this is not the right approach.
>
> Personally, I think that it's hard to get all of the corner cases right to
> make `fit()` Just Work(TM). The distributions API has a lot of them. Most
> of `fit()`'s implementation is just trying to work around as many of them
> as it can in order to be general. But the core of it is pretty
> straightforward: use a `scipy.optimize` minimizer on the `nnlf()` method.
>
> This is the way I approach it: if `fit()` does what I want it to do, great,
> I'll use it. But if it looks hard to shoehorn what I want into `fit()`,
> I'll just go ahead and call `scipy.optimize.minimize()` myself. `fit()` is
> a good convenience when it works in the easy cases, but the alternative is
> straightforward for the harder ones. I don't see much benefit to trying to
> make `fit()` scale to the harder, weirder cases. I'd rather document the
> path up to `scipy.optimize`. Ali's example would be a good recipe to
> demonstrate this.


I agree, and I recently did just this in an answer on stackoverflow:
https://stackoverflow.com/questions/53125437/fitting-data-using-scipy-truncnorm

Warren


>
> --
> Robert Kern
>