[SciPy-Dev] scipy.stats: some questions/points about distributions.py + reply on ticket 1493

josef.pktd at gmail.com
Wed Apr 25 15:04:49 EDT 2012


On Wed, Apr 25, 2012 at 2:12 PM, nicky van foreest <vanforeest at gmail.com> wrote:
> Hi Josef,
>
> Sorry for not responding earlier... too many obligations.
>
> Before I get back to your earlier mail, I have some naive questions
> about distributions.py. I hope you don't mind that I fire them at you.
>
> 1:
>
> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L436

I never looked at this. It's not used anywhere.

>
> Is this code "dead"? Within distributions.py it is not called. Nearly
> the same code is written here:
>
> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L1180

This is what is used for the generic ppf.

>
>
> 2:
>
> I have a similar point about:
>
> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L358
>
> What is the use of this code? It is not called anywhere. Besides this,
> from our discussion about ticket 1493, this function returns the
> central moments, while the "real" moment E(X^n) should be
> returned. Hence, the code is also not correct, i.e., not in line with
> the documentation.

I think this, together with skew and kurtosis, is an internal helper
for fit_start, i.e. for getting starting values for fit from the data,
even if it's not used.
In general: for the calculations it might sometimes be nicer to
calculate central moments and then convert them to non-central ones, or
the other way around. I have some helper functions for this in
statsmodels, and it is used similarly here:

https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L1745

(That's new code that I'm not so familiar with.)
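
To make the central vs. non-central point concrete, here is a minimal
sketch of the conversion via the binomial expansion of
E[((X - mean) + mean)^n]. The function name central_to_noncentral is
made up for this illustration; it is not the helper in statsmodels or
in distributions.py.

```python
from math import comb


def central_to_noncentral(central, mean):
    """Convert central moments to raw moments E[X^n].

    central[k] must hold E[(X - mean)^k] for k = 0..n, so that
    central[0] == 1 and central[1] == 0.
    (Hypothetical helper, for illustration only.)
    """
    raw = []
    for n in range(len(central)):
        # E[X^n] = sum_k C(n, k) * E[(X - mean)^k] * mean^(n - k)
        raw.append(sum(comb(n, k) * central[k] * mean ** (n - k)
                       for k in range(n + 1)))
    return raw


# Normal distribution with mean 2 and variance 9: the central moments
# of order 0..4 are 1, 0, 9, 0 and 3 * 9**2 = 243.
raw = central_to_noncentral([1, 0, 9, 0, 243], mean=2)
print(raw)  # [1, 2, 13, 62, 475]
```

The last entry agrees with the closed form for the normal distribution,
mean^4 + 6 mean^2 var + 3 var^2 = 16 + 216 + 243.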

>
> 3:
>
> Suppose we would turn xa and xb into private attributes _xa and _xb,
> then I suppose that
>
> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L883
>
> requires updating.

Yes, but no big loss I think, given that it won't be needed anymore.

>
>
> 4:
>
> I have a hard time understanding the working (and goal) of
>
> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L883

This?
xb : float, optional
Upper bound for fixed point calculation for generic ppf.
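
To illustrate what such bounds are for: a generic ppf has to solve
cdf(x) = q numerically, and a bracketing solver needs a starting
interval [xa, xb]. The sketch below is only an illustration under my
own assumptions, not the code in distributions.py: plain bisection
stands in for whatever root finder the library uses, the widening of
the bracket is my addition, and the names norm_cdf and generic_ppf are
made up.

```python
import math


def norm_cdf(x):
    """Standard normal cdf, used here only as a test distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def generic_ppf(cdf, q, xa=-10.0, xb=10.0, xtol=1e-12):
    """Solve cdf(x) = q for 0 < q < 1 by bisection on [xa, xb].

    Hypothetical helper: xa and xb play the role of the lower and
    upper starting bounds of the search interval.
    """
    lo, hi = xa, xb
    # Widen the bracket until cdf(lo) <= q <= cdf(hi).
    while cdf(lo) > q:
        lo -= hi - lo
    while cdf(hi) < q:
        hi += hi - lo
    # Halve the interval until it is shorter than xtol.
    while hi - lo > xtol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


median = generic_ppf(norm_cdf, 0.5)
upper = generic_ppf(norm_cdf, 0.975, xa=-1.0, xb=1.0)
print(median, upper)
```

With a bracket that is too narrow (the second call) the solver has to
widen it first; without that step a fixed xa and xb would silently
limit the range of quantiles that can be computed.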

>
>
> Where is the right place to ask for some clarification? Or should I
> just think harder?
>
> 5:
>
> The definition of arr in
>
> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L60
>
> does not add much (although it saves some characters at some points of
> the code), but makes it harder to read the code for novices like me.
> (I spent some time searching for a numpy function called arr, only to
> find out later that it was just a shorthand only used in the
> distributions.py module). Would it be a problem to replace such code by
> the proper numpy function?

But then these novices would just read some piece of code instead of
going through all 7000 lines looking for imports and redefinitions.
And I suffered the same way. :)

I don't have any problem with cleaning this up. I never checked if in
some cases with lots of generic loops the namespace lookup would
significantly increase the runtime.
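
One way to check would be a small timeit comparison like the sketch
below. It assumes the shorthand is a module-level alias for
numpy.asarray; the function names and the test data are made up, and
the timings depend on the machine and the numpy version, so I'm not
quoting any numbers.

```python
import timeit

import numpy as np

arr = np.asarray  # module-level shorthand of the kind discussed above


def with_alias(x):
    # One global lookup: arr
    return arr(x)


def with_attribute(x):
    # Global lookup of np, then attribute lookup of asarray
    return np.asarray(x)


data = [1.0, 2.0, 3.0]
t_alias = timeit.timeit(lambda: with_alias(data), number=100_000)
t_attr = timeit.timeit(lambda: with_attribute(data), number=100_000)
print(f"alias: {t_alias:.4f} s, np.asarray: {t_attr:.4f} s")
```

If the two timings turn out close, the alias buys nothing measurable
and readability can decide.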

>
> 6:
>
> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L538
>
> contains a typo. It should be Weisstein.

Should be fixed then.

>
> 7:
>
> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L625
>
>
> This code gives me an even harder time than _argsreduce. I have to
> admit that I simply don't know what this code is trying to
> prevent/check/repair. Would you mind giving a hint?

What's _argsreduce?

https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L625

This has been rewritten by Per Brodtkorb.
It is used in most methods to get the goodargs with which the
distribution specific method is called.

Example, ppf: https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L1524

First we build the conditions for valid, good arguments.
Boundaries are filled, invalid arguments get nans.
What's left over are the goodargs, the values of the method arguments
for which we need to calculate the actual results.
So we need to broadcast and select those arguments. -> argsreduce
The distribution specific or generic ._ppf is then called with 1d
arrays (of the same shape IIRC) of goodargs.

Then we can "place" the calculated values into the results arrays,
next to the nans and boundaries.
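
A stripped-down sketch of that pattern, written from scratch with
numpy.extract and numpy.place instead of the real argsreduce. The
exponential distribution with a single scale argument and the names
_ppf and ppf are just my choice for the illustration; the actual
method in distributions.py handles more arguments and more cases.

```python
import numpy as np


def _ppf(q, scale):
    # Distribution specific part: exponential ppf, called only with
    # valid 1d arguments.
    return -scale * np.log1p(-q)


def ppf(q, scale):
    q, scale = np.broadcast_arrays(np.asarray(q, dtype=float),
                                   np.asarray(scale, dtype=float))
    cond_args = scale > 0                  # valid distribution arguments
    cond_inside = (q > 0) & (q < 1)        # q strictly inside (0, 1)
    cond = cond_args & cond_inside         # where _ppf must be evaluated

    output = np.full(q.shape, np.nan)      # invalid arguments stay nan
    # Fill the boundaries of the support directly.
    np.place(output, cond_args & (q == 0), 0.0)
    np.place(output, cond_args & (q == 1), np.inf)

    if cond.any():
        # Select the good arguments as 1d arrays of equal length.
        good_q = np.extract(cond, q)
        good_scale = np.extract(cond, scale)
        # Put the results back next to the nans and boundaries.
        np.place(output, cond, _ppf(good_q, good_scale))
    return output


out = ppf([-0.1, 0.0, 0.5, 1.0, 0.5], [1.0, 1.0, 2.0, 1.0, -1.0])
print(out)
```

Here only the third entry reaches _ppf; the first and last are invalid
and stay nan, the second and fourth are boundary values.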

I hope that helps

Thanks,

Josef

>
> Nicky