[SciPy-Dev] warnings in scipy.stats.entropy

Mon May 21 19:34:54 EDT 2012

On Mon, May 21, 2012 at 7:23 PM, Skipper Seabold <jsseabold at gmail.com> wrote:
> On Mon, May 21, 2012 at 7:11 PM,  <josef.pktd at gmail.com> wrote:
>> On Mon, May 21, 2012 at 6:43 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>> On Mon, May 21, 2012 at 11:39 PM, Skipper Seabold <jsseabold at gmail.com> wrote:
>>>> Currently, unless you are ignoring floating-point warnings,
>>>> scipy.stats.entropy emits warnings when it is given a probability of
>>>> zero, even though the zero case is specifically handled in the function.
>>>> Rightly or wrongly, this makes me cringe. What do people think about
>>>> fixing this either by using seterr explicitly in the function or by
>>>> masking the zeros? E.g.,
>>>>
>>>> import numpy as np
>>>> from scipy.stats import entropy
>>>>
>>>> prob = np.random.uniform(0,20, size=10)
>>>> prob[5] = 0
>>>> prob = prob/prob.sum()
>>>>
>>>> np.seterr(all = 'warn')
>>>> entropy(prob) # too loud
>>>>
>>>> Instead we could do (within entropy)
>>>>
>>>> oldstate = np.geterr()
>>>> np.seterr(divide='ignore', invalid='ignore')
>>>> entropy(prob)
>>>> np.seterr(**oldstate)
>>>>
>>>> or just mask the zeros in the first place if this is too much
>>>>
>>>> idx = prob > 0
>>>> -np.sum(prob[idx] * np.log(prob[idx]))
>>>>
>>>> Thoughts?
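A minimal sketch of the two proposals, assuming a 1-D `pk` and no `qk`. The helper names `entropy_errstate` and `entropy_masked` are hypothetical, and the `np.where` line only stands in for the function's existing zero handling, which may differ from SciPy's actual code. `np.errstate` is NumPy's context-manager form of the `geterr`/`seterr` save-and-restore pattern, and it restores the previous error state even if the body raises.

```python
import numpy as np


def entropy_errstate(pk):
    """Shannon entropy (in nats) of a 1-D distribution; log(0) warnings silenced."""
    pk = np.asarray(pk, dtype=float)
    pk = pk / pk.sum()
    # errstate saves the current floating-point error settings and restores
    # them on exit, even if an exception is raised inside the block.
    with np.errstate(divide='ignore', invalid='ignore'):
        # log(0) -> -inf (divide) and 0 * -inf -> nan (invalid); both are
        # replaced by 0.0 here, standing in for the existing zero handling.
        terms = np.where(pk == 0, 0.0, pk * np.log(pk))
    return -terms.sum()


def entropy_masked(pk):
    """Same quantity; log is only evaluated where pk > 0, so nothing warns."""
    pk = np.asarray(pk, dtype=float)
    pk = pk / pk.sum()
    idx = pk > 0
    return -np.sum(pk[idx] * np.log(pk[idx]))


prob = np.array([0.2, 0.3, 0.0, 0.5])
print(entropy_errstate(prob), entropy_masked(prob))
```

The masked version never evaluates `log(0)`, so it stays quiet regardless of the global `seterr` setting; the `errstate` version still produces `-inf` and `nan` internally and only suppresses the warnings.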
>>>
>>> I like the mask version better.
>>
>> +1,
>
> https://github.com/scipy/scipy/pull/226

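This won't work as a drop-in replacement: if qk is None, the function is
vectorized along axis=0, and boolean indexing flattens the array, so the
masked version returns only the total over all columns: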

>>> rr
array([[ 0.13878479,  0.03527334,  0.12000785,  0.14706888],
       [ 0.07682377,  0.12749588,  0.15172758,  0.19499206],
       [ 0.10462715,  0.1766166 ,  0.        ,  0.09346067],
       [ 0.02208519,  0.14443609,  0.11331574,  0.15090141],
       [ 0.00830154,  0.06009464,  0.05424912,  0.11603281],
       [ 0.05205531,  0.0792505 ,  0.02387006,  0.0061777 ],
       [ 0.00526626,  0.08439299,  0.17298407,  0.09992403],
       [ 0.16510456,  0.07008839,  0.01962196,  0.07101189],
       [ 0.23265325,  0.15908956,  0.2072021 ,  0.08105922],
       [ 0.19429818,  0.06326201,  0.13702153,  0.03937134]])

>>> stats.entropy(rr)
array([ 1.9678332 ,  2.19817097,  2.0136922 ,  2.1379255 ])

>>> -(rr[idx]*np.log(rr[idx])).sum(0)
8.3176218626994789
>>> stats.entropy(rr).sum()
8.3176218626994789

Josef
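A sketch of a masked version that keeps one result per column, assuming, as in the session above, that each column of the 2-D input is a distribution (the name `entropy_columns` is hypothetical, not SciPy API). Instead of boolean indexing, zeros are replaced by 1 inside the log, since `0 * log(1) == 0`; `log` is then never evaluated at 0, so no warning is raised.

```python
import numpy as np


def entropy_columns(pk):
    """Entropy (in nats) of each column of pk, treating 0 * log(0) as 0."""
    pk = np.asarray(pk, dtype=float)
    pk = pk / pk.sum(axis=0)  # normalize each column to sum to 1
    # Substitute 1 for zero entries inside the log: log(1) == 0, so those
    # entries contribute nothing and the array shape is preserved.
    safe = np.where(pk > 0, pk, 1.0)
    return -np.sum(pk * np.log(safe), axis=0)


rr = np.array([[0.5, 0.25],
               [0.5, 0.75],
               [0.0, 0.0]])
print(entropy_columns(rr))  # one value per column
```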

>
>>
>> buggy: if qk is given, then the function isn't vectorized.
>>
>> Josef
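For the `qk` case, a sketch of relative entropy (Kullback-Leibler divergence, `sum(pk * log(pk / qk))`) vectorized along axis 0 in the same way, assuming `pk` and `qk` have the same shape (the name `relative_entropy_columns` is hypothetical, not SciPy API). Terms with `pk == 0` are taken as 0, and a term with `pk > 0` but `qk == 0` makes that column's divergence `inf`.

```python
import numpy as np


def relative_entropy_columns(pk, qk):
    """Relative entropy sum(pk * log(pk / qk)) along axis 0, in nats."""
    pk = np.asarray(pk, dtype=float)
    qk = np.asarray(qk, dtype=float)
    if pk.shape != qk.shape:
        raise ValueError("pk and qk must have the same shape")
    pk = pk / pk.sum(axis=0)  # normalize each column to sum to 1
    qk = qk / qk.sum(axis=0)
    pos = pk > 0
    with np.errstate(divide='ignore'):
        # Where pk > 0 and qk == 0 the ratio is inf, so the divergence is inf;
        # the divide-by-zero warning for that expected case is silenced.
        ratio = pk / np.where(pos, qk, 1.0)
    # Where pk == 0 the term is defined as 0; log(1) == 0 gives exactly that.
    ratio = np.where(pos, ratio, 1.0)
    return np.sum(pk * np.log(ratio), axis=0)


pk = np.array([[0.5, 0.2],
               [0.5, 0.8],
               [0.0, 0.0]])
qk = np.array([[0.25, 0.5],
               [0.25, 0.5],
               [0.5, 0.0]])
print(relative_entropy_columns(pk, qk))  # one divergence per column
```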
>>
>>>
>>> - N
>>> _______________________________________________
>>> SciPy-Dev mailing list
>>> SciPy-Dev at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-dev