[SciPy-User] norm.logpdf fails with float128?

Fri Feb 1 14:03:19 EST 2013

On Fri, Feb 1, 2013 at 1:44 PM, Jacob Biesinger
<jake.biesinger at gmail.com> wrote:
> On Fri, Feb 1, 2013 at 7:16 AM, <josef.pktd at gmail.com> wrote:
>>
>> please reply at end of message or inline, this is a bottom-posting
>> mailinglist
>
>
> Sorry about that.  The post was from my mobile during my (walking) commute.
>
>>
>> >> I don't know what we should do about it, some of the special functions
>> >> can only handle float64, and float32 I think.
>> >>
>> >> 1) status quo: let users decide, and raise an exception as now if
>> >> results cannot be cast to float64.
>> >>
>> >> 2) status quo with explicit type check of arguments: we can raise an
>> >> exception before doing the calculations.
>> >> We could return output as float32 if input is also float32
>> >>
>> >> 3) match output type to input type (at least a floating point, no
>> >> integers): Will work in some cases, like norm.pdf, will not work
>> >> (exception or implicit casting) if the supporting functions
>> >> (scipy.special, scipy.integrate) don't support it.
>> >> I don't know if all of them raise exceptions with downcasting.
>>
>> 4) add dtype argument to methods
>>
>> implementation: the current *args, **kwds parsing would need to be
>> changed, since with two exceptions, we rely on loc scale being the
>> last arguments, Ralph is just cleaning up this a bit
>> https://github.com/scipy/scipy/pull/400
>>
>> usability: for the usage in statsmodels, automatic (input dependent)
>> casting would be more convenient. Since we cannot hardcode the dtype,
>> because we would like different dtypes at different calls to the
>> function, we would have to choose the dtype case dependent. Would  be
>> nicer if that is in the methods directly.
>> edit: not really a problem, we could just do (..., dtype=x.dtype)
>>
>
> It would be nice to have both options: use x's dtype by default but let the
> user specify what the output dtype should be.  Does the latter imply that x,
> loc, and scale would be cast before the pdf/cdf/et al operation?

Currently nothing is explicitely cast, except for the output array, as
far as I can see and remember.

One argument in favor of just using a dtype option is that we don't
have to worry about which dtype to pick for the output if there are
several different dtypes used in the arguments (which will be often
the case).

I thought for pdf and cdf we could cast according to x, but for the
other methods it's not obvious, isf, ppf, ....

>
>>
>>
>> ---
>> I think we should widen the supported dtypes, but I think that we are
>> too close to a release to change this now because the impact of any
>> change is not so easy to guess, and we might need more time to figure
>> out how to handle it.
>
>
> No problem for me.  Since this is research code, I can just hack the changes
> temporarily.
>
>>
>>
>> float128 would be useful to have higher precision in some of the
>> methods, especially when they are struggling numerically close to
>> corner or extreme cases. (Although it won't help me on Windows.)
>> But maybe we have the precision already in the actual calculations, we
>> just don't return it.
>
>
> It seems like that's the case at least for several of the norm functions.
>
>>
>>
>> Just to get a better background
>>
>> What's your use case for using float128, why don't you cast to float64?
>
>
> I have been using float64 in this code for a while.  A change in
> parameterization of a hefty graphical model to high-dimensional gaussian
> observed variables has led to underflow nightmares. I've tried several
> normalization schemes but they all eventually they fail. Upgrading to
> float128's lets the model refine quite a bit more and that's all I need in
> this code.

Ok, that's the gain in numerical precision I would expect in many of
the distributions.

As a cautionary note:
use loc and scale as kwds and not as positional arguments, because
there might still be strange hard coded behavior in the argument
parsing, especially if you add an extra keyword.

>
>>
>>
>> The test suite has a list of distributions and example parameters that
>> makes it easy to loop over all distributions. As a relatively quick
>> check, you could run them on the private methods, _pdf _cdf, and see
>> which return again float128.
>> I have no idea about the number of cases that would work correctly,
>> there might be exceptions and implicit casting in some of the helper
>> functions.
>
>
> Cool.  I'll have a look and post back here.

Thanks, I'm curious about the result

Josef

>
>>
>>
>> Josef
>>
>> >>
>> >> other versions like always cast to float64 goes against the trend to
>> >> be more explicit.
>> >>
>> >> ???
>> >>
>> >> Josef
>> >>
>> >>
>> >> >
>> >> > --
>> >> > Jake Biesinger
>> >> > Graduate Student
>> >> > Xie Lab, UC Irvine
>> >> >
>> >> > _______________________________________________
>> >> > SciPy-User mailing list
>> >> > SciPy-User at scipy.org
>> >> > http://mail.scipy.org/mailman/listinfo/scipy-user
>> >> >
>> >> _______________________________________________
>> >> SciPy-User mailing list
>> >> SciPy-User at scipy.org
>> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>> >
>> >
>> > _______________________________________________
>> > SciPy-User mailing list
>> > SciPy-User at scipy.org
>> > http://mail.scipy.org/mailman/listinfo/scipy-user
>> >
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>