[SciPy-Dev] warnings in scipy.stats.entropy

josef.pktd at gmail.com josef.pktd at gmail.com
Mon May 21 23:29:19 EDT 2012


On Mon, May 21, 2012 at 10:43 PM, Skipper Seabold <jsseabold at gmail.com> wrote:
> On Mon, May 21, 2012 at 7:34 PM,  <josef.pktd at gmail.com> wrote:
>> On Mon, May 21, 2012 at 7:23 PM, Skipper Seabold <jsseabold at gmail.com> wrote:
>>> On Mon, May 21, 2012 at 7:11 PM,  <josef.pktd at gmail.com> wrote:
>>>> On Mon, May 21, 2012 at 6:43 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>>>> On Mon, May 21, 2012 at 11:39 PM, Skipper Seabold <jsseabold at gmail.com> wrote:
>>>>>> Currently in scipy.stats.entropy if you are not ignoring them you will
>>>>>> see warnings when the function is given a probability of zero even
>>>>>> though the case of zero is specifically handled in the function.
>>>>>> Rightly or wrongly this makes me cringe. What do people think about
>>>>>> fixing this by using seterr explicitly in the function or masking the
>>>>>> zeros. Eg.,
>>>>>>
>>>>>> import numpy as np
>>>>>> from scipy.stats import entropy
>>>>>>
>>>>>> prob = np.random.uniform(0,20, size=10)
>>>>>> prob[5] = 0
>>>>>> prob = prob/prob.sum()
>>>>>>
>>>>>> np.seterr(all = 'warn')
>>>>>> entropy(prob) # too loud
>>>>>>
>>>>>> Instead we could do (within entropy)
>>>>>>
>>>>>> oldstate = np.geterr()
>>>>>> np.seterr(divide='ignore', invalid='ignore')
>>>>>> entropy(prob)
>>>>>> np.seterr(**oldstate)
>>>>>>
>>>>>> or just mask the zeros in the first place if this is too much
>>>>>>
>>>>>> idx = prob > 0
>>>>>> -np.sum(prob[idx] * np.log(prob[idx]))
>>>>>>
>>>>>> Thoughts?
>>>>>
>>>>> I like the mask version better.
>>>>
>>>> +1,
>>>
>>> https://github.com/scipy/scipy/pull/226
>>
>> won't work as replacement, if qk is None then the function is
>> vectorized for axis=0
>>
>
> Hmm, I didn't think it was intended for 2d cases since there is no
> axis keyword and no tests for this. Docstring is unclear, but I've
> only used it for 1d and...

It works for 2d or nd if qk=None, and uses the (sometimes hidden) default axis=0.

If qk is given, it doesn't work but still uses axis=0 in the sum.
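
To make the vectorization point concrete, here is a small NumPy-only sketch. It does not call scipy.stats.entropy; the array `p` and the names `masked_scalar` and `per_column` are made up for illustration. Boolean indexing flattens a 2d array, so the masked one-liner returns a single scalar, while reducing over axis=0 gives one entropy per column.

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((10, 4))
p[2, 3] = 0.0
p /= p.sum(axis=0)  # normalize so that each column sums to 1

# Masked one-liner: p[idx] is a flattened 1d array, so the result is a scalar.
idx = p > 0
masked_scalar = -np.sum(p[idx] * np.log(p[idx]))

# Column-wise entropy (the axis=0 behaviour), computed one column at a time.
per_column = np.array([-np.sum(c[c > 0] * np.log(c[c > 0])) for c in p.T])

print(masked_scalar, per_column, per_column.sum())
```

The scalar equals the sum of the per-column entropies, which is why the masked version is not a drop-in replacement for 2d input.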

I would say this is the typical state of a stats function that hasn't
been cleaned up. For the ones that I did clean up, I usually added the
axis keyword in cases like this.
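
A minimal sketch of what such a cleanup might look like, assuming an explicit axis keyword and masking of the zeros instead of seterr. The function name `entropy_axis` is hypothetical and this is not the scipy implementation; it only illustrates the idea with plain NumPy calls (`np.log` and `np.divide` with their `where`/`out` arguments).

```python
import numpy as np

def entropy_axis(pk, qk=None, axis=0):
    """Hypothetical sketch: entropy (or KL divergence if qk is given) along axis."""
    pk = np.asarray(pk, dtype=float)
    pk = pk / pk.sum(axis=axis, keepdims=True)
    nonzero = pk > 0
    if qk is None:
        # log is evaluated only where pk > 0; the other entries keep the 0 from `out`
        logp = np.log(pk, out=np.zeros_like(pk), where=nonzero)
        return -(pk * logp).sum(axis=axis)

    qk = np.asarray(qk, dtype=float)
    if qk.shape != pk.shape:  # compare shapes, not just len()
        raise ValueError("pk and qk must have the same shape")
    qk = qk / qk.sum(axis=axis, keepdims=True)
    safe = nonzero & (qk > 0)
    # ratio is left at 1 (so log is 0) wherever the division is not safe
    ratio = np.divide(pk, qk, out=np.ones_like(pk), where=safe)
    terms = pk * np.log(ratio)
    terms[nonzero & (qk == 0)] = np.inf  # pk > 0 but qk == 0: divergence is infinite
    return terms.sum(axis=axis)

# Usage: no floating point warnings are triggered, even with everything set to raise.
rng = np.random.default_rng(0)
p = rng.random((10, 4))
p[2, 3] = 0
q = rng.random((10, 4))
q[2, 3] = 0
with np.errstate(all="raise"):
    h = entropy_axis(p)       # one entropy per column
    kl = entropy_axis(p, q)   # one KL divergence per column
print(h.shape, kl.shape)
```

With the shape check, mismatched inputs like (10, 4) against (10, 3) raise an error instead of passing a len() comparison.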

Josef

>
> import numpy as np
>
> p = np.random.random((10,4))
> p[2,3] = 0
> q = np.random.random((10,4))
> q[2,3] = 0
>
> p /= p.sum(0)
> q /= q.sum(0)
>
> from scipy import stats
>
> # bad logic for > 1d
> # plus it would return inf, not a 1d array
> stats.entropy(p,q)
>
> stats.entropy(p.flatten(), q.flatten())
>
> # len check not shape
> q = np.random.random((10,3))
>
> stats.entropy(p, q)
>
>> >>> rr
>> array([[ 0.13878479,  0.03527334,  0.12000785,  0.14706888],
>>       [ 0.07682377,  0.12749588,  0.15172758,  0.19499206],
>>       [ 0.10462715,  0.1766166 ,  0.        ,  0.09346067],
>>       [ 0.02208519,  0.14443609,  0.11331574,  0.15090141],
>>       [ 0.00830154,  0.06009464,  0.05424912,  0.11603281],
>>       [ 0.05205531,  0.0792505 ,  0.02387006,  0.0061777 ],
>>       [ 0.00526626,  0.08439299,  0.17298407,  0.09992403],
>>       [ 0.16510456,  0.07008839,  0.01962196,  0.07101189],
>>       [ 0.23265325,  0.15908956,  0.2072021 ,  0.08105922],
>>       [ 0.19429818,  0.06326201,  0.13702153,  0.03937134]])
>>
>> >>> stats.entropy(rr)
>> array([ 1.9678332 ,  2.19817097,  2.0136922 ,  2.1379255 ])
>>
>> >>> -(rr[idx]*np.log(rr[idx])).sum(0)
>> 8.3176218626994789
>> >>> stats.entropy(rr).sum()
>> 8.3176218626994789
>>
>> Josef
>>
>>>
>>>>
>>>> buggy: if qk is given, then the function isn't vectorized.
>>>>
>>>> Josef
>>>>
>>>>>
>>>>> - N
>>>>> _______________________________________________
>>>>> SciPy-Dev mailing list
>>>>> SciPy-Dev at scipy.org
>>>>> http://mail.scipy.org/mailman/listinfo/scipy-dev


