[SciPy-User] What "Array" means

Bruce Southey bsouthey at gmail.com
Tue Apr 12 13:12:55 EDT 2011


On Mon, Apr 11, 2011 at 3:08 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> On Fri, Apr 8, 2011 at 5:45 AM,  <josef.pktd at gmail.com> wrote:
>> On Fri, Apr 8, 2011 at 6:14 AM, Timothy Wu <2huggie at gmail.com> wrote:
>>> Hi, I am trying to run SciPy's D'Agostino normality test as documented here:
>>> http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.normaltest.html
>>>
>>> For the array argument I tried something like this
>>> scipy.array([1,2,3])
>>> or
>>> numpy.array([1,2,3])
>>>
>>> and did not pass the axis argument.
>>>
>>> But with both methods the test fails:
>>>
>>> File "/usr/lib/python2.6/dist-packages/scipy/stats/mstats_basic.py", line
>>> 1546, in kurtosistest
>>>     n = a.count(axis=axis).astype(float)
>>> AttributeError: 'int' object has no attribute 'astype'
>>>
>>> I'm not familiar with either numpy or scipy. What exactly should I put in there?
>>
>>
>> It looks like mstats.normaltest only works with 2-dimensional arrays;
>> stats.normaltest works with 1-dimensional arrays.
>>
>> rvs[:,None] in the example below adds an additional axis, so that it
>> is a column array with shape (20,1).
>> If you don't need the masked array version, then you can use stats.normaltest.
>>
>> I haven't looked at the source yet, but this looks like a bug to me.
>>
>>>>> rvs = np.random.randn(20)
>>>>> rvs
>> array([ 0.02724005, -0.17836266,  0.40530377,  1.313246  ,  0.74069068,
>>       -0.69010129, -0.24958557, -2.28311759,  0.10525733,  0.07986322,
>>       -0.87282545, -1.41364294,  1.16027037,  0.23541801, -0.06663458,
>>        0.39173207,  0.06979893,  0.4400277 , -1.29361117, -1.71524228])
>>>>> stats.normaltest(rvs)
>> (1.7052869564079727, 0.42628656195988301)
>>>>> stats.mstats.normaltest(rvs[:,None])
>> (masked_array(data = [1.70528695641],
>>             mask = [False],
>>       fill_value = 1e+20)
>> , masked_array(data = [ 0.42628656],
>>             mask = False,
>>       fill_value = 1e+20)
>> )
>>>>> stats.mstats.normaltest(rvs)
>>
>> Traceback (most recent call last):
>>  File "<pyshell#58>", line 1, in <module>
>>    stats.mstats.normaltest(rvs)
>>  File "C:\Programs\Python27\lib\site-packages\scipy\stats\mstats_basic.py",
>> line 1642, in normaltest
>>    k,_ = kurtosistest(a,axis)
>>  File "C:\Programs\Python27\lib\site-packages\scipy\stats\mstats_basic.py",
>> line 1618, in kurtosistest
>>    n = a.count(axis=axis).astype(float)
>> AttributeError: 'int' object has no attribute 'astype'
>>
>> Josef
>>
>
> Yes, that is a bug, so can someone create a ticket? (I don't have time today.)
> That occurs because ma.count() returns either an int (which causes the
> bug) or an ndarray. Actually, that '.astype(float)' is probably not
> needed because, as far as I can determine, every usage of 'n' as
> an integer should still result in a float.

This is now ticket 1424 with a patch:
http://projects.scipy.org/scipy/ticket/1424

It also required a second change, which I have commented on, because the
code needs to index an array.
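For reference, a minimal sketch of the kind of fix this ticket is about: convert whatever MaskedArray.count() returns into a float array with np.asarray, which accepts both the plain int and the ndarray case. This is not the actual patch attached to ticket 1424; the helper name count_as_float is hypothetical.

```python
import numpy as np
import numpy.ma as ma

def count_as_float(a, axis=None):
    """Hypothetical helper: number of unmasked elements of `a` along `axis`,
    always returned as a float ndarray (0-d when axis=None).

    MaskedArray.count() returns an ndarray for an integer axis but can return
    a plain Python int for axis=None, as shown in the thread, and an int has
    no .astype(). np.asarray(..., dtype=float) accepts either form.
    """
    return np.asarray(ma.asarray(a).count(axis=axis), dtype=float)

rvs = ma.asarray(np.random.randn(20, 10))
n_all = count_as_float(rvs, axis=None)   # 0-d float array, value 200.0
n_cols = count_as_float(rvs, axis=0)     # shape (10,), every entry 20.0
n_1d = count_as_float([1, 2, 3])         # the 1-D case from the question: 3.0
```

Because the result is always an ndarray, a later `.astype(float)` or array indexing on `n` no longer depends on which axis was requested.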

>
> There is also a second 'bug' because n must be greater than 3. I was
> looking for that because estimating kurtosis needs more than 3
> observations:
> "This bias-corrected formula requires that X contain at least four elements."
> http://www.mathworks.com/help/toolbox/stats/kurtosis.html
>
> This is a different ticket because we need to catch the cases where only
> one particular 'column' has fewer than 4 observations but the others are fine.
>
>
>>>> rvs = np.random.randn(20,10)
>>>> stats.mstats.normaltest(rvs, axis=0)
> (masked_array(data = [0.713606808604 0.132722315345 7.78660833457
> 5.38597554393 0.725711290319
>  0.172342343314 4.02320908322 1.46363950653 3.79550214574 0.293759931912],
>             mask = [False False False False False False False False
> False False],
>       fill_value = 1e+20)
> , masked_array(data = [ 0.69991008  0.93579283  0.0203779   0.06767843
>  0.69568685  0.91743718
>  0.13377386  0.48103283  0.14990537  0.86339761],
>             mask = False,
>       fill_value = 1e+20)
> )
>>>> stats.mstats.normaltest(rvs, axis=1)
> (masked_array(data = [0.314582042621 0.4436261479 2.98149400163
> 2.02242070422 3.46138431999
>  9.94304440942 0.026055683609 5.7060731383 1.03808026381 0.169589515995
>  10.5681767508 1.28212296678 3.7013014714 0.43713740004 3.62659584833
>  0.289410600885 1.46353531025 0.745198884215 1.51022347547 0.00707268228071],
>             mask = [False False False False False False False False
> False False False False
>  False False False False False False False False],
>       fill_value = 1e+20)
> , masked_array(data = [ 0.85445536  0.80106509  0.22520436  0.36377841
>  0.17716174  0.00693259
>  0.98705665  0.05766894  0.59509148  0.91870082  0.00507165  0.52673301
>  0.15713488  0.80366827  0.16311531  0.86527725  0.48105789  0.68894114
>  0.4699581   0.9964699 ],
>             mask = False,
>       fill_value = 1e+20)
> )
>>>> stats.mstats.normaltest(rvs, axis=None)
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "/usr/lib64/python2.7/site-packages/scipy/stats/mstats_basic.py",
> line 1649, in normaltest
>    k,_ = kurtosistest(a,axis)
>  File "/usr/lib64/python2.7/site-packages/scipy/stats/mstats_basic.py",
> line 1625, in kurtosistest
>    n = a.count(axis=axis).astype(float)
> AttributeError: 'int' object has no attribute 'astype'
>>>>
>
> That is because:
>>>> mrvs=rvs.view(ma.MaskedArray)
>>>> type(mrvs)
> <class 'numpy.ma.core.MaskedArray'>
>>>> type(mrvs.count(axis=0))
> <type 'numpy.ndarray'>
>>>> type(mrvs.count(axis=1))
> <type 'numpy.ndarray'>
>>>> type(mrvs.count(axis=None))
> <type 'int'>
>
>
> Bruce
>
This is now ticket 1425 with patches:
http://projects.scipy.org/scipy/ticket/1425

However, the patch for mstats_basic.py still needs some work. Basically,
only those specific cases with fewer than 4 observations should be 0,
not all cases.
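To make the intended per-column behaviour concrete, here is a minimal sketch, not the actual ticket 1425 patch. It computes the bias-corrected sample excess kurtosis (the estimator behind the "at least four elements" requirement quoted above, whose denominator contains (n - 2)(n - 3)) slice by slice on a masked array, and sets only the slices with fewer than 4 unmasked observations to 0. The helper name corrected_kurtosis, the MIN_OBS constant and the choice of 0 as the fill value are assumptions for illustration; it works on the kurtosis estimate rather than on the kurtosistest statistic discussed in the thread.

```python
import numpy as np
import numpy.ma as ma

MIN_OBS = 4  # the bias-corrected formula divides by (n - 2) * (n - 3)

def corrected_kurtosis(a, axis=0):
    """Hypothetical helper: bias-corrected excess kurtosis of a masked
    array along an integer `axis`.

    Only slices with fewer than MIN_OBS unmasked values are set to 0;
    every other slice gets its normal estimate.
    """
    a = ma.asarray(a, dtype=float)
    n = np.asarray(a.count(axis=axis), dtype=float)
    dev = a - a.mean(axis=axis, keepdims=True)
    m2 = (dev ** 2).mean(axis=axis)   # biased second central moment
    m4 = (dev ** 4).mean(axis=axis)   # biased fourth central moment

    ok = n >= MIN_OBS
    # Give the too-small slices a harmless n so nothing divides by zero;
    # their values are replaced by 0 in the final np.where.
    n_safe = np.where(ok, n, MIN_OBS)
    g = ((n_safe ** 2 - 1) * m4 / m2 ** 2 - 3 * (n_safe - 1) ** 2) / (
        (n_safe - 2) * (n_safe - 3))
    # ma division masks zero-variance slices; fill those with 0 as well.
    return np.where(ok, ma.filled(g, 0.0), 0.0)

# Column 2 keeps only 3 unmasked observations; columns 0 and 1 keep all 20.
data = np.random.randn(20, 3)
mask = np.zeros(data.shape, dtype=bool)
mask[3:, 2] = True
print(corrected_kurtosis(ma.array(data, mask=mask), axis=0))
```

The point of the design is that the "too few observations" check is an element-wise boolean array, so one short column cannot zero out the results for the columns that have enough data.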

Bruce


