[SciPy-Dev] stats, distributions, design choices

Fri Jun 14 10:55:30 EDT 2013

On Thu, Jun 13, 2013 at 10:02 PM,  <josef.pktd at gmail.com> wrote:
> On Thu, Jun 13, 2013 at 4:46 PM, Evgeni Burovski
> <evgeny.burovskiy at gmail.com> wrote:
>> Looking into the source of stats.distributions, I'm a bit puzzled by the way
>> incorrect distribution parameters are handled. Most likely, I'm missing
>> something very simple, so I'd appreciate it if someone knowledgeable can
>> comment on these:
>>
>> 1. For incorrect arguments of a distribution, rvs() raises a ValueError, but
>> pdf(), pmf() and their relatives return a magic badarg value:
>>
>> >>> from scipy.stats import norm
>> >>> norm._argcheck(0, -1)
>> False
>> >>> norm.pdf(1, loc=0, scale=-1)
>> nan
>> >>> norm.rvs(loc=0, scale=-1)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/home/br/virtualenvs/scipy-dev/local/lib/python2.7/site-packages/scipy/stats/distributions.py", line 617, in rvs
>>     raise ValueError("Domain error in arguments.")
>> ValueError: Domain error in arguments.
>>
>> Is there a rationale behind this? I'd naively expect a pdf to raise an
>> error as well --- or is there a use case where the current behavior is
>> preferable?
>
> The same reason we also add nans instead of raising an exception in
> other places.
>
> When we calculate vectorized results, we still return the valid
> results, and nan at the invalid results.
> If there is only a scalar answer, then we raise an exception if inputs
> are invalid.

Surely the ideal solution in this case would be to unconditionally
respect the value of np.geterr()["invalid"]? That way the
distributions would handle invalid input in the same way as ufuncs
like np.log.
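
As a rough sketch of that idea (normal_pdf is a hypothetical stand-in,
not the scipy.stats implementation): invalid parameters are routed
through a real ufunc, so NumPy itself stays silent, warns or raises
according to the current "invalid" setting.

```python
import numpy as np


def normal_pdf(x, loc=0.0, scale=1.0):
    """Hypothetical normal pdf that defers to NumPy's 'invalid' error mode."""
    x, loc, scale = np.broadcast_arrays(
        np.asarray(x, dtype=float),
        np.asarray(loc, dtype=float),
        np.asarray(scale, dtype=float),
    )
    bad = ~(scale > 0)  # domain check: scale must be strictly positive

    # Use a harmless scale at the bad positions so the density itself
    # never triggers a spurious floating-point error.
    safe_scale = np.where(bad, 1.0, scale)
    z = (x - loc) / safe_scale
    density = np.exp(-0.5 * z ** 2) / (safe_scale * np.sqrt(2.0 * np.pi))

    # np.log(-1.0) is an "invalid" operation, so this line is silent,
    # warns, or raises FloatingPointError depending on np.geterr()["invalid"].
    # It yields nan at the bad positions and 0.0 everywhere else.
    flag = np.log(np.where(bad, -1.0, 1.0))
    return density + flag


with np.errstate(invalid="ignore"):
    # Valid entries are still computed; the invalid one becomes nan.
    print(normal_pdf([0.0, 1.0], loc=0.0, scale=[1.0, -1.0]))
```

With np.errstate(invalid="raise") the same call raises
FloatingPointError, which is what np.log does for a negative argument.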

(Of course this would be easier if numpy also exposed some simple API
for raising such errors.)
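
For illustration only, such a helper might look like the sketch below.
signal_invalid is a made-up name, not an existing NumPy function, and
the "call", "print" and "log" modes are not implemented here.

```python
import warnings

import numpy as np


def signal_invalid(message):
    """Hypothetical helper: report a domain error per np.geterr()['invalid']."""
    mode = np.geterr()["invalid"]
    if mode == "ignore":
        return
    if mode == "raise":
        raise FloatingPointError(message)
    # "warn" lands here; "call", "print" and "log" are not handled in
    # this sketch and simply fall back to a warning as well.
    warnings.warn(message, RuntimeWarning, stacklevel=2)


with np.errstate(invalid="ignore"):
    signal_invalid("Domain error in arguments.")  # silent
```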

-n