[SciPy-Dev] stats, distributions, design choices

Evgeni Burovski evgeny.burovskiy at gmail.com
Thu Jun 13 17:42:36 EDT 2013


On Thu, Jun 13, 2013 at 10:02 PM, <josef.pktd at gmail.com> wrote:

> On Thu, Jun 13, 2013 at 4:46 PM, Evgeni Burovski
> <evgeny.burovskiy at gmail.com> wrote:
> > Looking into the source of stats.distributions, I'm a bit puzzled by the
> way
> > incorrect distribution parameters are handled. Most likely, I'm missing
> > something very simple, so I'd appreciate if someone knowledgeable can
> > comment on these:
> >
> > 1. For incorrect arguments of a distribution, rvs() raises a ValueError,
> but
> > pdf(), pmf() and their relatives return a magic badarg value:
> >
> >>>> from scipy.stats import norm
> >>>> norm._argcheck(0, -1)
> > False
> >>>> norm.pdf(1, loc=0, scale=-1)
> > nan
> >>>> norm.rvs(loc=0, scale=-1)
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> >   File
> >
> "/home/br/virtualenvs/scipy-dev/local/lib/python2.7/site-packages/scipy/stats/distributions.py",
> > line 617, in rvs
> >     raise ValueError("Domain error in arguments.")
> > ValueError: Domain error in arguments.
> >
> > Is there the rationale behind this? I'd naively expect a pdf to raise an
> > error as well --- or is there a use case where the current behavior is
> > preferrable?
>
> The same reason we also add nans instead of raising an exception in
> other places.
>
> When we calculate vectorized results, we still return the valid
> results, and nan at the invalid results.
> If there is only a scalar answer, then we raise an exception if inputs
> are invalid.
>

When, say, trying to evaluate a pdf outside of a distribution support, yes,
I understand. But what about the case where there's no chance of getting a
meaningful answer: say, trying to use a normal distribution with sigma=-1,
like in an example I mentioned.

Vectorization... hmmm, what would be a use case for something like this:

>>> from scipy.stats import poisson
>>> p = poisson([1, -1])
>>> p.pmf(2)
array([ 0.18393972,         nan])

while rvs() throws anyway:
>>> p.rvs()
File
"/home/br/virtualenvs/scipy-dev/local/lib/python2.7/site-packages/scipy/stats/distributions.py",
line 7472, in _rvs
    Pk = k*log(mu)-gamln(k+1) - mu
  File "mtrand.pyx", line 3681, in mtrand.RandomState.poisson
(numpy/random/mtrand/mtrand.c:17130)
ValueError: lam < 0

The fix is trivial, and I can turn this into a pull request, if that's a
more appropriate medium for this discussion.

Zhenya
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20130613/1e705087/attachment.html>


More information about the SciPy-Dev mailing list