[SciPy-User] deterministic random variable

josef.pktd at gmail.com josef.pktd at gmail.com
Fri May 28 16:48:16 EDT 2010


On Fri, May 28, 2010 at 4:28 PM, nicky van foreest <vanforeest at gmail.com> wrote:
> Hi,
>
> Nice to see the issue being taken up again.
>
>>> Discrete distributions on the real line don't *have* a pdf...
>>
>> Well, they *have* one; they just can't be implemented in floating point. :-)
>
> A distribution function can be decomposed into a part that can be
> represented by a pdf (absolutely continuous), a part that can be
> represented by a pmf (jumps), and some extra stuff (Cantor-like
> functions) that we can safely neglect from a numerical point of view.
> (The discussion above is resolved in any book on measure theory, and
> covered by the Lebesgue decomposition theorem, for the interested...)
>
> I don't know how to resolve the naming problem with pdf and pmf. I must
> admit I find it quite disturbing, since I also make these typos, but
> I don't know how to resolve this neatly.
>
>>>> snip
> pdf(x), cdf(x) with x float would need to know whether x is a support
> point, but x might not be exactly equal to the actual point because of
> floating point problems.
> So, the direct translation of rv_discrete doesn't work, and it looks
> like at least pdf needs to be accessible either pointwise for queries
> or using known support points for actual calculations.
>>>>
> About representing floats in a hashtable, this is indeed hard to
> resolve. However, for the particular purpose of defining a random
> variable with support on a finite set of reals, it might suffice to
> represent these reals by fractions, for instance, \pi \approx 22/7 (I
> realize better approximations exist), and then store 22 and 7
> separately. Then generalize rv_discrete such that it accepts tuples
> like (22, 7, 1.) with dtype (int, int, float).

What is the float in this? How do you find which fractions to use?
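
A minimal sketch of the fraction idea, assuming Python's standard
`fractions.Fraction` and its `limit_denominator` method as one possible way
to pick the fraction. The helper `as_key`, the bound of 10 on the
denominator, and the one-point pmf are made up for illustration; none of
this is part of rv_discrete.

```python
import math
from fractions import Fraction

# Assumption: a bound of 10 on the denominator, chosen only for illustration.
MAX_DENOMINATOR = 10


def as_key(x, max_denominator=MAX_DENOMINATOR):
    """Map a float to the closest fraction with a bounded denominator."""
    return Fraction(x).limit_denominator(max_denominator)


# Hypothetical pmf with a single support point near pi, stored as 22/7.
# Fractions compare and hash exactly, so they are safe dictionary keys.
pmf = {Fraction(22, 7): 1.0}

key = as_key(math.pi)
print(key, pmf.get(key, 0.0))
```

Note that all floats close enough to pi collapse onto the same key, so the
denominator bound acts as an implicit tolerance, and irrational support
points such as sqrt(2) are only ever approximated.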

I don't necessarily want to restrict this to a finite number of points, but
to countably many, e.g. what's the distribution of sqrt(x) where x is
Poisson (just made up).
I still need to think about this. I thought the cheapest option might be
approx_equal rounding, or searchsorted for the finite case.
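
A rough sketch of the sorted-search variant for the finite case, using only
the standard library: `bisect_left` stands in for NumPy's `searchsorted`,
and `math.isclose` plays the role of the approximate-equality check. The
function name `pmf_at`, the tolerances, and the example probabilities are
assumptions for illustration.

```python
import math
from bisect import bisect_left


def pmf_at(x, support, probs, rel_tol=1e-9, abs_tol=1e-12):
    """Return P(X == x), treating x as a support point if it is close to one.

    `support` must be sorted in increasing order.
    """
    i = bisect_left(support, x)  # first index with support[i] >= x
    # The only candidates are the neighbours just below and at/above x.
    for j in (i - 1, i):
        if 0 <= j < len(support) and math.isclose(
            x, support[j], rel_tol=rel_tol, abs_tol=abs_tol
        ):
            return probs[j]
    return 0.0


# Made-up example: support points sqrt(0), ..., sqrt(3) with arbitrary weights.
support = [math.sqrt(k) for k in range(4)]
probs = [0.1, 0.2, 0.3, 0.4]

print(pmf_at(2 ** 0.5, support, probs))  # a support point up to rounding
print(pmf_at(1.5, support, probs))       # not a support point
```

The tolerances have to be smaller than the spacing between neighbouring
support points, otherwise a query could match the wrong point.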

But I think direct access for a specific x won't be a big use case,
because the calculations for expectation, cdf and the like
can loop over the array of support points. That's why I was thinking
about dual access to pmf.
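
To illustrate the array side of that dual access, here is a sketch for the
made-up sqrt(Poisson) example, again with the standard library only. The
countable support is truncated at an arbitrary number of terms, and `mu`,
`n_terms` and `cdf` are names chosen for this example.

```python
import math
from bisect import bisect_right
from itertools import accumulate

mu = 2.0      # Poisson mean, arbitrary
n_terms = 60  # assumption: truncate the countable support here

# Parallel arrays: support points sqrt(k) and their Poisson probabilities.
ks = range(n_terms)
support = [math.sqrt(k) for k in ks]
probs = [math.exp(-mu) * mu ** k / math.factorial(k) for k in ks]

# The expectation only needs a loop over the arrays, no lookup by x.
mean = sum(s * p for s, p in zip(support, probs))

# Cumulative sums give the cdf at each support point.
cum = list(accumulate(probs))


def cdf(x):
    """P(X <= x): count the support points <= x, then read the running sum."""
    i = bisect_right(support, x)
    return cum[i - 1] if i > 0 else 0.0


print(mean, cdf(1.0))
```

Neither calculation ever needs to decide whether a given float is a support
point; only pointwise queries such as pmf(x) run into that problem.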

>
>>>>
> No fun, and EDA dropped.
>>>>
> EDA dropped? I don't know what EDA means. I hope it does not have
> severe consequences.

Today is my lucky day with typos. How about ETA?
http://en.wikipedia.org/wiki/Estimated_time_of_arrival

Josef
http://www.itl.nist.gov/div898/handbook/eda/section1/eda11.htm

>
> Nicky


