[SciPy-dev] RFR: Proposed fixes in scipy.stats functions for calculation of variance/error/etc.

josef.pktd at gmail.com josef.pktd at gmail.com
Mon Oct 26 01:13:04 EDT 2009


On Mon, Oct 26, 2009 at 12:59 AM,  <josef.pktd at gmail.com> wrote:
> On Mon, Oct 26, 2009 at 12:19 AM,  <josef.pktd at gmail.com> wrote:
>> On Sun, Oct 25, 2009 at 11:49 PM, Ariel Rokem <arokem at berkeley.edu> wrote:
>>> Hi Josef and all,
>>>
>>> thanks for looking. Concerning the z-score functions, I am also
>>> confused by those and I would suggest unifying them under one
>>> function. In particular, I can't imagine what the function 'z' is for.
>>> However, I don't want to just remove these without discussion. What do
>>> you think about this?
>>>
>>> Another, more general thing concerning the axis: I am wondering why
>>> the default axis in scipy is 0, while the default in numpy (in
>>> np.mean, for example) is None. I think that it would be good to have
>>> one convention for both libraries, and that the more parsimonious
>>> one is the one using "None" as the default value, since it doesn't favor
>>> any of the dimensions of an array over the others by default. I don't
>>> know: how widespread is this convention within scipy?
>>
>> I had to run after the last message. My impression was that maybe in
>> one of the changes the ddof=1 got lost, i.e. the distinction that was
>> in scipy stats for population versus sample statistics.
>> z and zmap look the same to me in their intended (?) calculation,
>> but zmap mixes up the axis arguments (mean with "axis", std with
>> hardcoded axis=0). Maybe the intention will be clearer when I look
>> at the trac history or the original stats package.
>>
>> From looking at the three functions, I would assume that the combined
>> function would have a signature like
>>
>> def zscore(a, compare=None, axis=0, ddof=0)
>>
>> or two functions, one with compare and one without?
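A minimal sketch of such a combined function, written against current NumPy rather than the 2009 code. The name `zscore` and the `compare` semantics (score `a` against the mean and std of `compare`, defaulting to `a` itself) are assumptions read off the signature above, not the scipy implementation. It relies on `keepdims=True` (NumPy 1.7 and later) so the reduced statistics broadcast against `a` for any axis.

```python
import numpy as np


def zscore(a, compare=None, axis=0, ddof=0):
    """Hypothetical combined z-score: standardize `a` with the mean and
    std of `compare` along `axis` (zs-style if compare is None,
    zmap-style otherwise). axis=None uses all values (raveled)."""
    a = np.asarray(a, dtype=float)
    compare = a if compare is None else np.asarray(compare, dtype=float)
    if axis is None:
        # One mean and one std over every value in `compare`
        mu = compare.mean()
        sigma = compare.std(ddof=ddof)
    else:
        # keepdims=True leaves a length-1 axis in place, so the
        # statistics broadcast against `a` for axis=0, 1, ... alike
        mu = compare.mean(axis=axis, keepdims=True)
        sigma = compare.std(axis=axis, ddof=ddof, keepdims=True)
    return (a - mu) / sigma
```

For example, `zscore(new_obs, compare=reference)` would score new observations relative to a reference sample, while `zscore(x, axis=1)` standardizes each row of a 2-d array.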
>
> see:
> http://projects.scipy.org/scipy/browser/trunk/Lib/stats/stats.py?rev=2028#L1174
>
> zs was the list version of the z-score, using z for the calculation; the
> translation in the next changeset is correct only for 1d or raveled arrays,
> and it is missing an axis argument. It looks like z was a helper function for
> a scalar score. zmap got imported in this form in revision 71.
>
> stats.mstats has the same functions, but they look like literal translations
> since they have the same (ambiguous) treatment of axis if it's not 1d.
> stats.mstats.z has ddof=1, the others ddof=0
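To make the ddof difference concrete: with ddof=0 the variance divides by n, with ddof=1 by n - 1, so z-scores computed with ddof=1 are smaller by the constant factor sqrt((n - 1)/n). A small NumPy check (the data values are arbitrary):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.size

pop_var = np.var(x)             # ddof=0 (NumPy default): divides by n -> 4.0
sample_var = np.var(x, ddof=1)  # ddof=1: divides by n - 1 -> 32/7

# z-scores under each convention
z_pop = (x - x.mean()) / np.std(x)
z_sample = (x - x.mean()) / np.std(x, ddof=1)

# The ddof=1 scores are the ddof=0 scores scaled by sqrt((n - 1) / n)
assert np.allclose(z_sample, z_pop * np.sqrt((n - 1) / n))
```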
>
> With broadcasting and adjustment of the dimensions of mean and std, only
> a single score function seems necessary; the current functions look a bit
> like historical relics.
>
> Josef
>
>>
>>
>> About default axis=0:
>>
>> I think this is scipy.stats specific. We had a brief discussion a year
>> ago, where Jarrod agreed that default for stats should remain axis=0.
>>
>> In statistics, you almost never want to ravel data and mix apples
>> and cars, or prices and quantities. So the default should be to reduce
>> along an axis, e.g. the mean over all observations for each variable.
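A short illustration of the point, using a made-up array whose rows are observations and whose columns are two different variables: reducing along axis=0 gives one mean per variable, while NumPy's default axis=None averages prices and quantities together.

```python
import numpy as np

# Rows are observations; columns are variables (price, quantity)
data = np.array([[10.0, 200.0],
                 [12.0, 180.0],
                 [11.0, 220.0]])

# axis=0: reduce over observations, one mean per variable -> [11., 200.]
per_variable = np.mean(data, axis=0)

# axis=None (NumPy's default): ravels the data and mixes both variables -> 105.5
raveled = np.mean(data)
```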
>>
>> axis=0 rather than axis=-1 is traditional in statistics/econometrics,
>> both in other matrix packages (gauss, matlab) and in the textbook
>> treatment (of the books that I know). Switching to -1 for the data would
>> be a big mental break and would require axis translation of the
>> textbook formulas, e.g. solve X'X beta = X'Y
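For reference, the textbook formula carries over directly when observations are in rows (axis 0). This sketch uses simulated, noise-free data, so the estimate should recover the coefficients used to build `y`; in practice `np.linalg.lstsq` is numerically safer than forming X'X explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 50

# Design matrix with observations in rows: a constant plus one regressor
X = np.column_stack([np.ones(n_obs), rng.normal(size=n_obs)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true  # noise-free, so beta_true is recoverable exactly

# Normal equations X'X beta = X'y, written as in the textbook
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```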
>>
>> From my perspective, losing axis=0 as the default is the main disadvantage
>> of removing mean, var, and so on from scipy.stats, e.g. I need to create
>> a lambda function if I want mean(x, axis=0) as a callback function.
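The lambda workaround mentioned above, shown next to `functools.partial` from the standard library, which binds `axis=0` without writing a lambda. Both give a one-argument callable that reduces over observations.

```python
import functools

import numpy as np

data = np.arange(6.0).reshape(3, 2)  # [[0, 1], [2, 3], [4, 5]]

# The lambda workaround: a one-argument callable with axis fixed to 0
col_mean = lambda x: np.mean(x, axis=0)

# Equivalent callable built with functools.partial
col_mean_partial = functools.partial(np.mean, axis=0)
```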

Digging a bit in the history, here is the switch from axis=-1 to axis=0:

"Fixed functions in stats.py to have default axis 0"

http://projects.scipy.org/scipy/changeset/1465

Josef


>>
>> Cheers,
>>
>> Josef
>>
>>>
>>> Cheers,
>>>
>>> Ariel
>>>
>>> On Sun, Oct 25, 2009 at 8:16 PM,  <josef.pktd at gmail.com> wrote:
>>>> On Sun, Oct 25, 2009 at 10:50 PM, Ariel Rokem <arokem at berkeley.edu> wrote:
>>>>> Hi everyone,
>>>>>
>>>>> I have been working on some fixes to the functions in scipy.stats
>>>>> which calculate variance/error and related quantities. In particular,
>>>>> in order to comply with the deprecation warnings that appear in use of
>>>>> scipy.stats.samplevar/scipy.stats.samplestd, I have replaced use of
>>>>> these functions with calls to np.std/np.var. I have also cleaned up
>>>>> the documentation a bit.
>>>>>
>>>>> This can all be found here: http://codereview.appspot.com/141051
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Ariel
>>>>
>>>> I just gave it a quick look, looks good so far
>>>>
>>>> In def zs, there seems to be a shape error for axis>0 in
>>>> "return (a-mu)/sigma"
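A minimal reproduction of the shape problem, assuming `mu` and `sigma` come from `np.mean`/`np.std` with `axis=1` on a 2-d array, as in the patch quoted below. The reduction drops the axis, and broadcasting aligns trailing axes, so the subtraction fails for a (2, 3) array; for a square array it would not fail but would silently subtract the row means from the wrong entries. Re-inserting the reduced axis with `np.expand_dims` fixes the alignment.

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]], shape (2, 3)
axis = 1

mu = np.mean(a, axis=axis)        # shape (2,): the reduced axis is dropped
sigma = np.std(a, axis=axis)

# a - mu fails: broadcasting aligns trailing axes, and 3 != 2
try:
    a - mu
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Re-inserting the reduced axis gives shape (2, 1), which broadcasts row-wise
z = (a - np.expand_dims(mu, axis)) / np.expand_dims(sigma, axis)
```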
>>>>
>>>>
>>>> def zs changes the definition: before, it normalized with the raveled
>>>> mean and std, not by axis
>>>>
>>>> - mu = np.mean(a,None)
>>>> - sigma = samplestd(a)
>>>> - return (array(a)-mu)/sigma
>>>>
>>>> + a,axis = _chk_asarray(a,axis)
>>>> + mu = np.mean(a,axis)
>>>> + sigma = np.std(a,axis)
>>>> + return (a-mu)/sigma
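To see how much the change matters, a small comparison on a 2-d array whose columns have very different scales. This assumes, for illustration, that the old version used the raveled mean and std of all values and the new one uses per-column statistics (axis=0). For 1-d input the two definitions coincide.

```python
import numpy as np

a = np.array([[1.0, 10.0],
              [3.0, 30.0]])

# Old-style: a single mean and std over all values (raveled)
old = (a - np.mean(a)) / np.std(a)

# New-style: one mean and std per column (axis=0)
new = (a - np.mean(a, axis=0)) / np.std(a, axis=0)

# On 1-d data both definitions give the same scores
v = np.array([1.0, 3.0, 8.0])
same_1d = np.allclose((v - np.mean(v)) / np.std(v),
                      (v - np.mean(v, axis=0)) / np.std(v, axis=0))
```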
>>>>
>>>> I never looked closely at these;
>>>> zmap has a description I don't understand.
>>>>
>>>> z, zs, zm  ???
>>>>
>>>> Which is which? They look a bit inconsistent; "population" might refer
>>>> to the dof correction in z?
>>>> Is there a standard terminology for z scores?
>>>>
>>>> I think for axis I have more often seen "int or None"?
>>>>
>>>> Josef
>>>>
>>>>
>>>>
>>>>
>>>>> --
>>>>> Ariel Rokem
>>>>> Helen Wills Neuroscience Institute
>>>>> University of California, Berkeley
>>>>> http://argentum.ucbso.berkeley.edu/ariel
>>>>> _______________________________________________
>>>>> Scipy-dev mailing list
>>>>> Scipy-dev at scipy.org
>>>>> http://mail.scipy.org/mailman/listinfo/scipy-dev
>>>>>
>>>>
>>>
>>>
>>
>


