[SciPy-User] zscore axis functionality is borked

josef.pktd at gmail.com josef.pktd at gmail.com
Sat Dec 17 12:08:22 EST 2011


On Sat, Dec 17, 2011 at 11:20 AM, Warren Weckesser
<warren.weckesser at enthought.com> wrote:
>
>
> On Wed, Nov 30, 2011 at 3:25 PM, <josef.pktd at gmail.com> wrote:
>>
>> On Wed, Nov 30, 2011 at 4:10 PM, Warren Weckesser
>> <warren.weckesser at enthought.com> wrote:
>> >
>> >
>> > On Wed, Nov 30, 2011 at 3:05 PM, <josef.pktd at gmail.com> wrote:
>> >>
>> >> On Wed, Nov 30, 2011 at 4:02 PM, Warren Weckesser
>> >> <warren.weckesser at enthought.com> wrote:
>> >> >
>> >> >
>> >> > On Wed, Nov 30, 2011 at 2:54 PM, <josef.pktd at gmail.com> wrote:
>> >> >>
>> >> >> On Wed, Nov 30, 2011 at 3:45 PM,  <josef.pktd at gmail.com> wrote:
>> >> >> > On Wed, Nov 30, 2011 at 3:25 PM, Alacast <alacast at gmail.com>
>> >> >> > wrote:
>> >> >> >> axis=0 (the default) works fine. axis=1, etc., is clearly wrong. Am I
>> >> >> >> misunderstanding how to use this, or is this a bug?
>> >> >> >>
>> >> >> >> In [16]: i = rand(4,4)
>> >> >> >>
>> >> >> >> In [17]: i
>> >> >> >> Out[17]:
>> >> >> >> array([[ 0.85367762,  0.25348857,  0.23572615,  0.50403358],
>> >> >> >>        [ 0.70199066,  0.81872151,  0.47357357,  0.20425537],
>> >> >> >>        [ 0.31042673,  0.25837984,  0.73550134,  0.57970176],
>> >> >> >>        [ 0.42828877,  0.60988596,  0.04059321,  0.73944219]])
>> >> >> >>
>> >> >> >> In [18]: zscore(i, axis=0)
>> >> >> >> Out[18]:
>> >> >> >> array([[ 1.30128758, -0.96195723, -0.52119142, -0.01453907],
>> >> >> >>        [ 0.59653471,  1.38544585,  0.39284654, -1.55756529],
>> >> >> >>        [-1.22271057, -0.94164388,  1.39942427,  0.37494213],
>> >> >> >>        [-0.67511172,  0.51815526, -1.27107939,  1.19716222]])
>> >> >> >>
>> >> >> >> In [19]: zscore(i[:,0])
>> >> >> >> Out[19]: array([ 1.30128758,  0.59653471, -1.22271057, -0.67511172])
>> >> >> >>
>> >> >> >> In [20]: zscore(i[:,0])==zscore(i,axis=0)[:,0]
>> >> >> >> Out[20]: array([ True,  True,  True,  True], dtype=bool)
>> >> >> >>
>> >> >> >> In [21]: zscore(i, axis=1)
>> >> >> >> Out[21]:
>> >> >> >> array([[-0.99378502, -1.59397407, -1.61173649, -1.34342906],
>> >> >> >>        [-1.6379836 , -1.52125275, -1.86640069, -2.13571889],
>> >> >> >>        [-2.09968257, -2.15172946, -1.67460796, -1.83040754],
>> >> >> >>        [-1.29796925, -1.11637205, -1.68566481, -0.98681582]])
>> >> >> >> #The above is obviously wrong, as everything has a negative z score
>> >> >> >>
>> >> >> >> In [22]: zscore(i[0,:])
>> >> >> >> Out[22]: array([ 1.56824016, -0.83321371, -0.90428403,  0.16925757])
>> >> >> >>
>> >> >> >> In [23]: zscore(i[0,:])==zscore(i,axis=1)[0,:]
>> >> >> >> Out[23]: array([False, False, False, False], dtype=bool)
>> >> >> >> #Using axis=1 produces different results from taking a row directly.
>> >> >> >>
>> >> >> >> In [24]: zscore(i, axis=-1)
>> >> >> >> Out[24]:
>> >> >> >> array([[-0.99378502, -1.59397407, -1.61173649, -1.34342906],
>> >> >> >>        [-1.6379836 , -1.52125275, -1.86640069, -2.13571889],
>> >> >> >>        [-2.09968257, -2.15172946, -1.67460796, -1.83040754],
>> >> >> >>        [-1.29796925, -1.11637205, -1.68566481, -0.98681582]])
>> >> >> >> #Getting rows by using axis=-1 is no better (this is the same result
>> >> >> >> #as axis=1)
>> >> >> >
>> >> >> > This looks like a serious bug to me. I don't know what happened here.
>> >> >> >
>> >> >> > The docstring example also has negative numbers only.
>> >> >> >
>> >> >> > ???
>> >> >> >
>> >> >> > I'm looking into it
>> >> >> >
>> >> >> > Thanks for reporting
>> >> >>
>> >> >> a misplaced parenthesis: with a nonzero axis (axis=1 or axis=-1 above)
>> >> >> it calculates   x - mean/std   instead of   (x - mean) / std
>> >> >>
>> >> >> now, how did this get through testing?
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > There is only one test for zscore, on a 1-d sample without the axis
>> >> > keyword.
>> >>
>> >> which just shows that we shouldn't trust changesets that say
>> >>
>> >> "stats: rewrite of zscore functions, ticket:1083 regression tests
>> >> pass, still need tests for enhancements"
>> >>
>> >> http://projects.scipy.org/scipy/changeset/6169
>> >>
>> >> my mistake (maybe January 2nd wasn't a good day).
>> >>
>> >> Josef
>> >>
>> >
>> >
>> > Thanks for the link.  Looks like zmap has the same bug. :(
>>
>> copy paste errors?
>>
>> I just don't know why I didn't do basic checks like this in the final version
>>
>> >>> assert_equal(zscore(x.T, axis=0).T, zscore(x, axis=1))
>> >>> a = zscore(x, axis=1)
>> >>> a.var(1)
>> array([ 1.,  1.,  1.,  1.])
>> >>> a.mean(1)
>> array([  0.00000000e+00,  -1.11022302e-16,   0.00000000e+00,
>>         1.94289029e-16])
>>
>> Josef
>>
>
>
> Ticket: http://projects.scipy.org/scipy/ticket/1575
> Pull request: https://github.com/scipy/scipy/pull/116

Thanks Warren, good to see you and Ralf taking care of stats.

Josef

>
> Warren
>
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
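
For readers who want to see the x - mean/std versus (x - mean) / std mix-up from this thread in isolation, here is a minimal sketch that needs only NumPy. It is an illustration under stated assumptions, not the scipy.stats source: zscore_misplaced and zscore_intended are hypothetical helper names, and the array is the sample printed in Out[17] of the original report.

```python
import numpy as np

# Sample array copied from Out[17] in the original report.
x = np.array([[0.85367762, 0.25348857, 0.23572615, 0.50403358],
              [0.70199066, 0.81872151, 0.47357357, 0.20425537],
              [0.31042673, 0.25837984, 0.73550134, 0.57970176],
              [0.42828877, 0.60988596, 0.04059321, 0.73944219]])


def zscore_misplaced(a, axis):
    """Hypothetical reconstruction of the bug: computes a - (mean / std)."""
    mean = a.mean(axis=axis, keepdims=True)
    std = a.std(axis=axis, keepdims=True)
    # Division binds tighter than subtraction, so only the mean is scaled.
    return a - mean / std


def zscore_intended(a, axis):
    """Intended formula: (a - mean) / std along the given axis."""
    mean = a.mean(axis=axis, keepdims=True)
    std = a.std(axis=axis, keepdims=True)
    return (a - mean) / std


wrong = zscore_misplaced(x, axis=1)
right = zscore_intended(x, axis=1)

print(wrong[0])  # compare with the first row of Out[21]
print(right[0])  # compare with Out[22], i.e. zscore(i[0,:])
```

With this sample, the first row of wrong should agree with the first row of Out[21] and the first row of right with Out[22], up to rounding of the printed input values.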


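The basic checks quoted in this message (transpose consistency, unit variance, and zero mean along the normalized axis) can be collected into a small regression test. The following is only a sketch: zscore_ref is a hypothetical NumPy stand-in used so the example runs without SciPy, and the seed and array shape are arbitrary choices.

```python
import numpy as np


def zscore_ref(a, axis=0):
    """Hypothetical reference z-score: (a - mean) / std along `axis`."""
    a = np.asarray(a, dtype=float)
    mean = a.mean(axis=axis, keepdims=True)
    std = a.std(axis=axis, keepdims=True)
    return (a - mean) / std


rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
x = rng.random((4, 4))

z = zscore_ref(x, axis=1)

# Normalizing rows must equal normalizing the columns of the transpose.
np.testing.assert_allclose(zscore_ref(x.T, axis=0).T, z)

# Each row of the result should have mean 0 and variance 1.
np.testing.assert_allclose(z.mean(axis=1), 0.0, atol=1e-12)
np.testing.assert_allclose(z.var(axis=1), 1.0)

# A single row taken directly must match the same row of the axis=1 result.
np.testing.assert_allclose(zscore_ref(x[0]), z[0])
```

Replacing zscore_ref with scipy.stats.zscore would run the same checks against the library function. assert_allclose is used instead of an exact equality test because the two code paths can differ in the last few floating-point bits.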
