[SciPy-Dev] two new scipy.stats requests (code included)

josef.pktd at gmail.com
Mon Oct 15 13:10:07 EDT 2018


On Mon, Oct 15, 2018 at 12:45 PM Paul Hobson <pmhobson at gmail.com> wrote:

> Hey Jon,
>
> To incorporate this into scipy, you'll need to open a pull request on
> GitHub:
> https://github.com/scipy/scipy
>
> I'm not a scipy contributor, but I can tell you that you'll also need to
> include tests that preferably use a (small) published dataset and confirm
> that your functions reproduce the published results.
>
> Also, I don't think your return statements are behaving the way you think
> they are. I believe that the preference is now to return a NamedTuple.
>
> Hope that helps,
> -Paul
>
>
>
> On Mon, Oct 15, 2018 at 2:54 AM Jon Stein <oneday2one at icloud.com> wrote:
>
>> Scipy-dev,
>>
>> Two additions to the scipy.stats module are missing and needed:
>>
>> One addition is needed for a one-sample z-test, including a confidence
>> interval, when the population mean and standard deviation are known:
>>
>> import math
>> import scipy.stats as st
>>
>> def ztest(array_A, population_mean, population_stdv, level_of_confidence=0.95):
>>     standard_error = population_stdv / math.sqrt(len(array_A))
>>     # z statistic for the sample mean against the known population mean
>>     z_statistic = (array_A.mean() - population_mean) / standard_error
>>     # lower-tail (one-sided) p-value
>>     p_value = st.norm.cdf(z_statistic)
>>     # margin of error for the confidence interval
>>     margin_of_error = st.norm.ppf(level_of_confidence) * standard_error
>>     return ('z statistic =', z_statistic, 'p-value =', p_value,
>>             array_A.mean() - margin_of_error, array_A.mean() + margin_of_error)
>>
>> And one addition is needed for a one-sample z-test for a categorical
>> sample (*not quantitative*):
>>
>> def ztest_1sample_categorical(sample_proportion, population_proportion, sample_size):
>>     sp, pp = sample_proportion, population_proportion
>>     # standard error evaluated at the null (population) proportion
>>     z = (sp - pp) / math.sqrt((pp * (1 - pp)) / sample_size)
>>     # lower-tail (one-sided) p-value
>>     p = st.norm.cdf(z)
>>     return ('z statistic =', z, 'p value =', p)
>>
>> Let me know what you think.
>> Jon Stein
>>
>
I think some discussion and decisions are needed on whether and how to add
this.

None of the hypothesis tests currently returns a confidence interval.
Tuples are a pain because we cannot just return additional results without
breaking backwards compatibility.
Both z-tests are based on summary statistics, for which scipy.stats already
has some cases.
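
For illustration only, here is a minimal sketch of the kind of named result
Paul mentions; the class name ZtestResult and the helper ztest_named are
hypothetical, not an existing scipy.stats API:

from collections import namedtuple
import math
import scipy.stats as st

# hypothetical result type, not part of scipy.stats
ZtestResult = namedtuple('ZtestResult', ['statistic', 'pvalue'])

def ztest_named(array_A, population_mean, population_stdv):
    standard_error = population_stdv / math.sqrt(len(array_A))
    z = (array_A.mean() - population_mean) / standard_error
    p = 2 * st.norm.sf(abs(z))   # two-sided p-value
    return ZtestResult(statistic=z, pvalue=p)

Fields can be read by name (result.statistic, result.pvalue) while plain
tuple unpacking still works, but appending a confidence interval later would
still change the tuple length, which is exactly the compatibility concern
above.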

Adding special cases like ztest_1sample_categorical opens up a large set of
statistical functions that could similarly be added, e.g. for Poisson rates.
Additionally, some tests have a choice of methods across stats packages,
e.g. using pp corresponds to a score test (variance evaluated under the
null). An alternative is to use the variance based on sp, which corresponds
to a Wald test (a short sketch of the two variants follows below).
In the statsmodels version there is an extra option, but it doesn't have
the correct default.
For a two-sample version comparing proportions, the number of options and
available methods becomes much larger.
(Development for this in statsmodels is slow because I only find time every
once in a while to review or prepare PRs
https://github.com/statsmodels/statsmodels/pull/4829 )
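
To make the score-versus-Wald distinction concrete, here is a small
illustrative sketch (not the scipy or statsmodels implementation; the
function name, signature, and 'method' keyword are made up for this
example):

import math
import scipy.stats as st

def proportion_ztest_sketch(sample_proportion, population_proportion,
                            sample_size, method='score'):
    sp, pp = sample_proportion, population_proportion
    if method == 'score':
        # score test: variance evaluated at the null proportion pp
        se = math.sqrt(pp * (1 - pp) / sample_size)
    else:
        # 'wald': variance evaluated at the observed proportion sp
        se = math.sqrt(sp * (1 - sp) / sample_size)
    z = (sp - pp) / se
    p = 2 * st.norm.sf(abs(z))   # two-sided p-value
    return z, p

The two variants agree asymptotically but can give noticeably different
p-values in small samples, which is why the choice of default matters.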

I think some overlap in basic statistics functions between scipy.stats and
statsmodels is useful. However, the question of where to draw the boundary
is always open.

Josef




