[SciPy-Dev] scipy.stats.ks_2samp with weighted data

Corin Hoad corinhoad at gmail.com
Sun Apr 22 10:39:04 EDT 2018


> What's the definition or interpretation of the weights?

> Just out of curiosity: What is the significance of the weights?  If you are
> trying to represent the fact that distributional differences are more
> important in some regime than in another, e.g., you care more about the
> tails, then using weights is probably not the right approach.

> I briefly skimmed parts of the Monahan chapter and that is specific for
> importance sampling weights. In that case choosing the weights should just
> compensate for the unequal sampling of points. So maybe in that case the
> distribution of the KS (or AD) test statistic might not change much.

The weights are sample weights, indicating the relative frequency of
observations. My specific use case is in particle physics where studies
often involve simulated data. Certain particle physics processes may have a
large amount of simulated data available, but in reality we expect them to
be very rare so sample weights are used to compensate.
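
For concreteness, here is a minimal sketch of how sample weights can enter the
two-sample KS statistic: each empirical CDF steps by an observation's share of
the total weight instead of by 1/n. This is not the tact implementation linked
below and not an existing scipy API; the function names are hypothetical, and
plain Python is used only to keep the steps explicit. The effective sample
size (Kish's approximation) is an assumption about what one might plug into
the usual asymptotic p-value, which, as Josef points out, may not be valid for
every interpretation of the weights.

```python
from bisect import bisect_right
from itertools import accumulate


def weighted_ecdf(values, weights):
    """Return the sorted values and the cumulative weight fractions.

    The returned list of fractions has a leading 0.0, so entry k is the
    weighted ECDF after the first k sorted observations.
    """
    pairs = sorted(zip(values, weights))
    xs = [v for v, _ in pairs]
    total = sum(w for _, w in pairs)
    cdf = [0.0] + [c / total for c in accumulate(w for _, w in pairs)]
    return xs, cdf


def weighted_ks_statistic(x1, w1, x2, w2):
    """Largest absolute gap between the two weighted ECDFs (hypothetical helper)."""
    xs1, cdf1 = weighted_ecdf(x1, w1)
    xs2, cdf2 = weighted_ecdf(x2, w2)
    d = 0.0
    for point in xs1 + xs2:
        # bisect_right counts the sorted values <= point, which is the
        # index of the cumulative weight fraction at that point.
        f1 = cdf1[bisect_right(xs1, point)]
        f2 = cdf2[bisect_right(xs2, point)]
        d = max(d, abs(f1 - f2))
    return d


def effective_sample_size(weights):
    """Kish's approximation (sum w)^2 / sum(w^2); an assumption, see above."""
    return sum(weights) ** 2 / sum(w * w for w in weights)
```

With all weights equal this reduces to the ordinary KS statistic, and the
effective sample size reduces to the number of observations.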

Corin

On 22 April 2018 at 14:29, <josef.pktd at gmail.com> wrote:

>
>
> On Sat, Apr 21, 2018 at 11:15 PM, Phillip Feldman <
> phillip.m.feldman at gmail.com> wrote:
>
>> Just out of curiosity: What is the significance of the weights?  If you
>> are trying to represent the fact that distributional differences are more
>> important in some regime than in another, e.g., you care more about the
>> tails, then using weights is probably not the right approach.
>>
>
> I don't remember for sup tests like KS, but for integral tests like
> Anderson-Darling there are variations of the test that use different
> weights to emphasize different regions of the distribution, e.g. Cramér-von
> Mises uses different weights than AD:
> https://en.wikipedia.org/wiki/Anderson%E2%80%93Darling_test
>
>
> I briefly skimmed parts of the Monahan chapter and that is specific for
> importance sampling weights. In that case choosing the weights should just
> compensate for the unequal sampling of points. So maybe in that case the
> distribution of the KS (or AD) test statistic might not change much.
>
> In either case, I think the distribution of the test statistic depends on
> the meaning or interpretation of the weights.
>
> Josef
>
>
>
>>
>> On Sat, Apr 21, 2018 at 4:42 PM, Corin Hoad <corinhoad at gmail.com> wrote:
>>
>>> Hello developers,
>>>
>>> I recently needed an implementation of the Kolmogorov-Smirnov 2 sample
>>> test which required the incorporation of a weight associated with each
>>> element of the data.  This led me to this stackexchange answer
>>> https://stats.stackexchange.com/questions/193439/two-sample-kolmogorov-smirnov-test-with-weights
>>> where a procedure for a weighted 2-sample KS test is taken from
>>> Numerical Methods of Statistics by Monahan.
>>>
>>> My current implementation of this can be found here:
>>>
>>> https://github.com/brunel-physics/tact/blob/2b0ee2a28a30f014b103319118b64be52070f001/tact/metrics.py#L198
>>>
>>> Would there be any interest in incorporating this functionality into
>>> scipy?
>>>
>>> Yours,
>>>
>>> Corin Hoad
>>>
>>>
>>> _______________________________________________
>>> SciPy-Dev mailing list
>>> SciPy-Dev at python.org
>>> https://mail.python.org/mailman/listinfo/scipy-dev
>>>
>>>