scipy.stats.itemfreq: overflow with add.reduce

Hans Georg Krauthaeuser hgk at et.uni-magdeburg.de
Thu Dec 22 02:45:33 EST 2005


Hans Georg Krauthaeuser wrote:
> Hans Georg Krauthaeuser wrote:
> 
>> Hi All,
>>
>> I was playing with scipy.stats.itemfreq when I observed the following 
>> overflow:
>>
>> In [119]:for i in [254,255,256,257,258]:
>>    .....:    l=[0]*i
>>    .....:    print i, stats.itemfreq(l), l.count(0)
>>    .....:
>> 254 [ [  0 254]] 254
>> 255 [ [  0 255]] 255
>> 256 [ [0 0]] 256
>> 257 [ [0 1]] 257
>> 258 [ [0 2]] 258
>>
>> itemfreq is pretty small (in stats.py):
>>
>> ----------------------------------------------------------------------
>> def itemfreq(a):
>>     """
>> Returns a 2D array of item frequencies.  Column 1 contains item values,
>> column 2 contains their respective counts.  Assumes a 1D array is passed.
>>
>> Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
>> """
>>     scores = _support.unique(a)
>>     scores = sort(scores)
>>     freq = zeros(len(scores))
>>     for i in range(len(scores)):
>>         freq[i] = add.reduce(equal(a,scores[i]))
>>     return array(_support.abut(scores, freq))
>> ----------------------------------------------------------------------
>>
>> It seems that add.reduce is the source for the overflow:
>>
>> In [116]:from scipy import *
>>
>> In [117]:for i in [254,255,256,257,258]:
>>    .....:    l=[0]*i
>>    .....:    print i, add.reduce(equal(l,0))
>>    .....:
>> 254 254
>> 255 255
>> 256 0
>> 257 1
>> 258 2
>>
>> Is there any way to avoid the overflow?
>>
>> BTW:
>> Python 2.3.5 (#2, Aug 30 2005, 15:50:26)
>> [GCC 4.0.2 20050821 (prerelease) (Debian 4.0.1-6)] on linux2
>>
>> scipy_version.scipy_version  --> '0.3.2'
>>
>>
>> Thanks and best regards
>> Hans Georg Krauthäuser
> 
> After some further investigation:
> 
> In [150]:add.reduce(array(equal([0]*256,0),typecode='l'))
> Out[150]:256
> 
> In [151]:add.reduce(equal([0]*256,0))
> Out[151]:0
> 
> The problem occurs with arrays with typecode 'b' (as returned by equal).
> 
> A workaround patch for itemfreq is obvious, but ... is it a bug or a feature?
> 
> regards
> Hans Georg
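
For reference, a minimal sketch of the kind of workaround meant above, written against present-day NumPy rather than the scipy 0.3.2 used in this post (that substitution is an assumption, as is the name itemfreq_sketch). The only idea taken from the investigation above is to accumulate the comparison result in a wide integer type instead of the 8-bit type returned by equal:

```python
import numpy as np

def itemfreq_sketch(a):
    """Return a 2D table: column 0 holds the sorted unique values,
    column 1 holds how often each value occurs in the 1D input."""
    a = np.asarray(a)
    scores = np.sort(np.unique(a))
    freq = np.zeros(len(scores), dtype=np.int64)
    for i, score in enumerate(scores):
        # Force a 64-bit accumulator so the count cannot wrap at 256.
        freq[i] = np.add.reduce(np.equal(a, score), dtype=np.int64)
    return np.column_stack((scores, freq))

print(itemfreq_sketch([0] * 256))
```

With the wide accumulator, 256 zeros should be reported as a count of 256 rather than 0.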

I feel a bit lonely here, but nevertheless, a further remark:

The problem comes directly from the ufunc 'add' for typecode 'b'. In 
contrast to 'multiply', the typecode is not upcast:

In [178]:array(array([1],'b')*2)
Out[178]:array([2],'i')

In [179]:array(array([1],'b')+array([1],'b'))
Out[179]:array([2],'b')

So, for an array a with typecode 'b', it follows that

a+a != a*2
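
The wraparound itself can be reproduced in present-day NumPy, as a rough sketch only. The assumptions here are that the old typecode 'b' behaves like an unsigned 8-bit integer (np.uint8), which is what the wrap at 256 suggests, and that forcing the accumulator type with the dtype argument mimics the missing upcast. The helper name count_true is made up for illustration:

```python
import numpy as np

def count_true(flags, acc_dtype):
    """Sum a 0/1 array of 8-bit values using the given accumulator dtype."""
    bytes_ = np.asarray(flags, dtype=np.uint8)
    return int(np.add.reduce(bytes_, dtype=acc_dtype))

for n in (254, 255, 256, 257, 258):
    hits = np.equal([0] * n, 0)
    # 8-bit accumulator (wraps modulo 256) versus 64-bit accumulator
    print(n, count_true(hits, np.uint8), count_true(hits, np.int64))
# 254 254 254
# 255 255 255
# 256 0 256
# 257 1 257
# 258 2 258
```

The middle column matches the overflow shown earlier; the right column is what an upcast accumulator gives.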

At the moment, I don't have the time to try the new scipy_core. It would 
be nice to hear whether the problem is known or even already fixed.

Regards
Hans Georg Krauthäuser


