[SciPy-Dev] warnings in scipy.stats.entropy
josef.pktd at gmail.com
Mon May 21 19:34:54 EDT 2012
On Mon, May 21, 2012 at 7:23 PM, Skipper Seabold <jsseabold at gmail.com> wrote:
> On Mon, May 21, 2012 at 7:11 PM, <josef.pktd at gmail.com> wrote:
>> On Mon, May 21, 2012 at 6:43 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>> On Mon, May 21, 2012 at 11:39 PM, Skipper Seabold <jsseabold at gmail.com> wrote:
>>>> Currently, scipy.stats.entropy emits warnings (if you are not ignoring
>>>> them) when given a probability of zero, even though the zero case is
>>>> specifically handled in the function. Rightly or wrongly, this makes me
>>>> cringe. What do people think about fixing this by using seterr
>>>> explicitly in the function, or by masking the zeros? E.g.,
>>>>
>>>> import numpy as np
>>>> from scipy.stats import entropy
>>>>
>>>> prob = np.random.uniform(0,20, size=10)
>>>> prob[5] = 0
>>>> prob = prob/prob.sum()
>>>>
>>>> np.seterr(all = 'warn')
>>>> entropy(prob) # too loud
>>>>
>>>> Instead we could do (within entropy)
>>>>
>>>> oldstate = np.geterr()
>>>> np.seterr(divide='ignore', invalid='ignore')
>>>> entropy(prob)
>>>> np.seterr(**oldstate)
>>>>
>>>> or just mask the zeros in the first place if this is too much
>>>>
>>>> idx = prob > 0
>>>> -np.sum(prob[idx] * np.log(prob[idx]))
>>>>
>>>> Thoughts?
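[As a side note on the first option: the seterr save/restore dance above can also be written with the np.errstate context manager, which restores the previous error state automatically even if an exception is raised. A minimal sketch, not part of the patch proposed in this thread:]

```python
import numpy as np

prob = np.array([0.25, 0.0, 0.5, 0.25])

# errstate is a context manager: the previous floating-point error
# state is restored on exit, even when an exception is raised inside.
with np.errstate(divide='ignore', invalid='ignore'):
    # 0 * log(0) produces nan under 'ignore'; np.where replaces it with 0.
    terms = np.where(prob > 0, prob * np.log(prob), 0.0)

entropy = -terms.sum()
```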
>>>
>>> I like the mask version better.
>>
>> +1,
>
> https://github.com/scipy/scipy/pull/226
The mask won't work as a drop-in replacement: if qk is None, the
function is vectorized along axis=0.
>>> rr
array([[ 0.13878479, 0.03527334, 0.12000785, 0.14706888],
[ 0.07682377, 0.12749588, 0.15172758, 0.19499206],
[ 0.10462715, 0.1766166 , 0. , 0.09346067],
[ 0.02208519, 0.14443609, 0.11331574, 0.15090141],
[ 0.00830154, 0.06009464, 0.05424912, 0.11603281],
[ 0.05205531, 0.0792505 , 0.02387006, 0.0061777 ],
[ 0.00526626, 0.08439299, 0.17298407, 0.09992403],
[ 0.16510456, 0.07008839, 0.01962196, 0.07101189],
[ 0.23265325, 0.15908956, 0.2072021 , 0.08105922],
[ 0.19429818, 0.06326201, 0.13702153, 0.03937134]])
>>> stats.entropy(rr)
array([ 1.9678332 , 2.19817097, 2.0136922 , 2.1379255 ])
>>> -(rr[idx]*np.log(rr[idx])).sum(0)
8.3176218626994789
>>> stats.entropy(rr).sum()
8.3176218626994789
Josef
>
>>
>> buggy: if qk is given, then the function isn't vectorized.
>>
>> Josef
>>
>>>
>>> - N
>>> _______________________________________________
>>> SciPy-Dev mailing list
>>> SciPy-Dev at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-dev