[SciPy-Dev] warnings in scipy.stats.entropy

Skipper Seabold jsseabold at gmail.com
Mon May 21 22:43:00 EDT 2012


On Mon, May 21, 2012 at 7:34 PM,  <josef.pktd at gmail.com> wrote:
> On Mon, May 21, 2012 at 7:23 PM, Skipper Seabold <jsseabold at gmail.com> wrote:
>> On Mon, May 21, 2012 at 7:11 PM,  <josef.pktd at gmail.com> wrote:
>>> On Mon, May 21, 2012 at 6:43 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>>> On Mon, May 21, 2012 at 11:39 PM, Skipper Seabold <jsseabold at gmail.com> wrote:
>>>>> Currently, unless you are ignoring them, you will see warnings from
>>>>> scipy.stats.entropy when the function is given a probability of zero,
>>>>> even though the zero case is specifically handled in the function.
>>>>> Rightly or wrongly, this makes me cringe. What do people think about
>>>>> fixing this by using seterr explicitly in the function or by masking
>>>>> the zeros? E.g.,
>>>>>
>>>>> import numpy as np
>>>>> from scipy.stats import entropy
>>>>>
>>>>> prob = np.random.uniform(0,20, size=10)
>>>>> prob[5] = 0
>>>>> prob = prob/prob.sum()
>>>>>
>>>>> np.seterr(all='warn')
>>>>> entropy(prob)  # too loud
>>>>>
>>>>> Instead we could do (within entropy)
>>>>>
>>>>> oldstate = np.geterr()
>>>>> np.seterr(divide='ignore', invalid='ignore')
>>>>> entropy(prob)
>>>>> np.seterr(**oldstate)
>>>>>
>>>>> or just mask the zeros in the first place if this is too much
>>>>>
>>>>> idx = prob > 0
>>>>> -np.sum(prob[idx] * np.log(prob[idx]))
>>>>>
>>>>> Thoughts?
>>>>
>>>> I like the mask version better.
>>>
>>> +1,
>>
>> https://github.com/scipy/scipy/pull/226
>
> won't work as a replacement: if qk is None, then the function is
> vectorized along axis=0
>

Hmm, I didn't think it was intended for 2d cases, since there is no
axis keyword and there are no tests for it. The docstring is unclear,
but I've only used it for 1d and...

import numpy as np
from scipy import stats

p = np.random.random((10, 4))
p[2, 3] = 0
q = np.random.random((10, 4))
q[2, 3] = 0

p /= p.sum(0)
q /= q.sum(0)

# the logic is wrong for > 1d when qk is given,
# plus it would return inf, not a 1d array
stats.entropy(p, q)

stats.entropy(p.flatten(), q.flatten())

# only the length is checked, not the shape
q = np.random.random((10, 3))

stats.entropy(p, q)
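
Something along these lines might be closer to the intent -- just a
rough sketch for discussion, not the code in the PR. The mask-based
handling of zeros, the shape check, and summing along axis=0 when qk
is given are my guesses at what the fix should look like:

import numpy as np

def entropy_masked(pk, qk=None):
    # sketch: entropy / relative entropy vectorized along axis=0,
    # with zeros handled by masking instead of seterr
    pk = np.asarray(pk, dtype=float)
    pk = pk / pk.sum(axis=0)
    if qk is None:
        vec = np.zeros_like(pk)
        mask = pk > 0                       # 0 * log(0) contributes nothing
        vec[mask] = -pk[mask] * np.log(pk[mask])
    else:
        qk = np.asarray(qk, dtype=float)
        if qk.shape != pk.shape:            # check the shape, not just len()
            raise ValueError("pk and qk must have the same shape")
        qk = qk / qk.sum(axis=0)
        vec = np.zeros_like(pk)             # pk == 0 terms contribute 0
        vec[(pk > 0) & (qk == 0)] = np.inf  # pk > 0 where qk == 0 -> inf
        ok = (pk > 0) & (qk > 0)
        vec[ok] = pk[ok] * np.log(pk[ok] / qk[ok])
    return vec.sum(axis=0)

For 2d input this returns one value per column and for 1d input a
scalar, with no warnings either way. If the seterr route is preferred
instead, np.errstate(divide='ignore', invalid='ignore') used as a
context manager would save and restore the error state automatically,
rather than the explicit geterr/seterr pair.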

>>>> rr
> array([[ 0.13878479,  0.03527334,  0.12000785,  0.14706888],
>       [ 0.07682377,  0.12749588,  0.15172758,  0.19499206],
>       [ 0.10462715,  0.1766166 ,  0.        ,  0.09346067],
>       [ 0.02208519,  0.14443609,  0.11331574,  0.15090141],
>       [ 0.00830154,  0.06009464,  0.05424912,  0.11603281],
>       [ 0.05205531,  0.0792505 ,  0.02387006,  0.0061777 ],
>       [ 0.00526626,  0.08439299,  0.17298407,  0.09992403],
>       [ 0.16510456,  0.07008839,  0.01962196,  0.07101189],
>       [ 0.23265325,  0.15908956,  0.2072021 ,  0.08105922],
>       [ 0.19429818,  0.06326201,  0.13702153,  0.03937134]])
>
>>>> stats.entropy(rr)
> array([ 1.9678332 ,  2.19817097,  2.0136922 ,  2.1379255 ])
>
>>>> idx = rr > 0
>>>> -(rr[idx]*np.log(rr[idx])).sum(0)
> 8.3176218626994789
>>>> stats.entropy(rr).sum()
> 8.3176218626994789
>
> Josef
>
>>
>>>
>>> buggy: if qk is given, then the function isn't vectorized.
>>>
>>> Josef
>>>
>>>>
>>>> - N


