[SciPy-Dev] scipy.stats: some questions/points about distributions.py + reply on ticket 1493

nicky van foreest vanforeest at gmail.com
Wed Apr 25 16:03:17 EDT 2012


>> 1:
>>
>> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L436
>
> I never looked at this. It's not used anywhere.
>
>>
>> Is this code "dead"? Within distributions.py it is not called. Nearly
>> the same code is written here:
>>
>> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L1180
>
> This is what is used for the generic ppf.

Yes, sure. Sorry for confusing you. L1180 makes good sense. But since
L1180 is there, there appears to be no good reason to include the code
at L436.


>> 2:
>>
>> I have a similar point about:
>>
>> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L358
>>
>> What is the use of this code? It is not called anywhere. Besides this,
>> from our  discussion about ticket 1493, this function returns the
>> centralized moments, while the "real" moment E(X^n) should be
>> returned. Hence, the code is also not correct, i.e., not in line with
>> the documentation.
>
> I think this and skew, kurtosis are internal functions for fit_start,
> getting starting values for fit from the data, even if it's not used.
> in general: For the calculations it might sometimes be nicer to
> calculate central moments, and then convert them to non-central or the
> other way around. I have some helper functions for this in statsmodels
> and it is similarly used
>
> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L1745
>
> (That's new code that I'm not so familiar with.)

I actually saw this code, and have my doubts about whether this is the
best way to compute the non-central moments. Suppose that the
computation of the central moment involves quad(). Then indeed the
computations at these lines don't require a new call to quad().
However, there is a (slow) python for loop involved, the power
function ** is called multiple times, and { n \choose k}  is computed.
(BTW, can I safely assume you use Latex?). Calling quad() on x**k to
compute E(X^k) might be just a fast, although I did not test this
hunch. Anyway quad( lamdba x: x**k *_pdf(x)) reads much easier.

>
>>
>> 3:
>>
>> Suppose we would turn xa and xb into private atrributes _xa and _xb,
>> then i suppose that
>>
>> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L883
>>
>> requires updating.
>
> Yes, but no big loss I think,  given that it won't be needed anymore

Oops. Your other mail convinced to do use _xa and _xb.... See my other mail.

>> 5:
>>
>> The definition of arr in
>>
>> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L60
>>
>> does not add much (although it saves some characters at some points of
>> the code), but makes it harder to read the code for novices like me.
>> (I spent some time searching for a numpy function called arr, only to
>> find out later that it was just a shorthand only used in the
>> distribution.py module). Would it be a problem to replace such code by
>> the proper numpy function?
>
> But then these novices would just read some piece code instead of
> going through all 7000 lines looking for imports and redefinitions.
> And I suffered the same way. :)

I suppose you did :-)

>
> I don't have any problem with cleaning this up. I never checked if in
> some cases with lot's of generic loops the namespace lookup would
> significantly increase the runtime.

Should it? I am not an expert on this, but I read in Langtangen's book
that importing functions like so: from numpy import array, and so on,
does not add much to the calling time of functions. However, if I am
mistaken, please forget this point.

>
>>
>> 6:
>>
>> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L538
>>
>> contains a typo. It should be Weisstein.
>
> should be fixed then

Should this become a ticket, or is it too minor?

>
>>
>> 7:
>>
>> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L625
>>
>>
>> This code gives me even a harder time than _argsreduce. I have to
>> admit that I simply don't know what this code is trying to
>> prevent/check/repair. Would you mind giving a hint?
>
> whats _argsreduce?

Sorry, I meant __argcheck(). The code at

https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L1195

is not very simple to understand, at least not for me.

>
> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L625
>
> This has been rewritten by Per Brodtkorb.
> It is used in most methods to get the goodargs with which the
> distribution specific method is called.
>
> example ppf https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L1524
>
> first we are building the conditions for valid, good arguments.
> boundaries are filled, invalid arguments get nans.
> What's left over are the goodargs, the values of the method arguments
> for which we need to calculate the actual results.
> So we need to broadcast and select those arguments. -> argsreduce
> The distribution specific or generic ._ppf is then called with 1d
> arrays (of the same shape IIRC) of goodargs.
>
> then we can "place" the calculated values into the results arrays,
> next to the nans and boundaries.
>
> I hope that helps

I'll try to understand it again.

Thanks for your hints.

>
> Thanks,
>
> Josef
>
>>
>> Nicky
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-dev
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev



More information about the SciPy-Dev mailing list