[SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau)

josef.pktd at gmail.com josef.pktd at gmail.com
Thu Mar 19 14:04:17 EDT 2009


On Thu, Mar 19, 2009 at 1:13 PM, Sturla Molden <sturla at molden.no> wrote:
> On 3/19/2009 6:03 PM, josef.pktd at gmail.com wrote:
>
>> I only checked your python version, and all numbers agree with mine
>> and so can be considered as verified with spss and R (for tau-b).
>
>  > I will open an enhancement ticket for your contingency table version
>  > and the extension of the current stats.kendalltau.
>
> You don't have to. I just did (ticket #893).
>
> I have added Cython versions of kendalltau and kendalltau_fromct. They
> are more than ten times faster than the Python version I posted.
>
> It needs review and decision. I get numbers that look correct to me.
>
> http://projects.scipy.org/scipy/ticket/893
>

Thank you, I will look at it. Since we needed the conversion between
flat and 2d contingency table, I cleaned it up a bit. If you find it
useful, we could also include it, it handles only 2d tables, not
higher dimensional, but allows for arbitrary category labels. (I
reversed x and y from your original version of table2flat.

Josef

def flat2table(x, y, count):
    '''convert flat contingency table to 2d

    Parameters
    ----------
    x, y: 1d arrays
        flattened categories of observation
        x is row variable
        y is column variable
    count: 1d array of cells content

    Returns
    -------
    ---------
    ctable: 2d array,
        contingency table
    xcat, ycat: 1d arrays
        labels of x and y categories

    Examples
    --------
    >>> x = np.array([1,1,1,2,2,2])
    >>> y = np.array([1,2,3,1,2,3])
    >>> count = np.array([10, 5, 2, 9, 12, 16])

    >>> flat2table(x, y, count)
    [0 0 0 1 1 1]
    [0 1 2 0 1 2]
    (array([[ 10.,   5.,   2.],
           [  9.,  12.,  16.]]), array([1, 2]), array([1, 2, 3]))

    >>> flat2table(x.astype(str), y.astype(str), count)
    [0 0 0 1 1 1]
    [0 1 2 0 1 2]
    (array([[ 10.,   5.,   2.],
           [  9.,  12.,  16.]]), array(['1', '2'],
          dtype='|S1'), array(['1', '2', '3'],
          dtype='|S1'))
    '''
    catx, xrinv = np.unique1d(x, return_inverse=True)
    caty, yrinv = np.unique1d(y, return_inverse=True)
    ncatx = len(catx)
    ncaty = len(caty)
    tab = np.zeros((ncatx, ncaty))
    tab[xrinv,yrinv] = count
    return tab, catx, caty


def table2flat(ctable, xcat=None, ycat=None):
    '''convert contingency table to flat format

    Parmeters
    ---------
    ctable: 2d array,
        contingency table
    xcat, ycat: 1d arrays
        labels of x and y categories

    Returns
    -------
    x, y: 1d arrays
        flattened categories of observation
        x is row variable
        y is column variable
    count: 1d array of cells content


    Examples
    --------
    >>> tab = np.array([[ 10.,   5.,   2.], [  9.,  12.,  16.]])
    >>> table2flat(tab)
    (array([0, 0, 0, 1, 1, 1]), array([0, 1, 2, 0, 1, 2]), array([
10.,   5.,   2.,   9.,  12.,  16.]))
    >>> table2flat(tab,np.arange(1,nx+1).astype(str),
np.arange(1,ny+1).astype(str))
    (array(['1', '1', '1', '2', '2', '2'],
          dtype='|S1'), array(['1', '2', '3', '1', '2', '3'],
          dtype='|S1'), array([ 10.,   5.,   2.,   9.,  12.,  16.]))

    '''
    assert(ctable.ndim == 2)
    nx = ctable.shape[0]
    ny = ctable.shape[1]
    if xcat is None:
        xcat = range(nx)
    if ycat is None:
        ycat = range(ny)
    y, x = np.meshgrid(ycat, xcat)
    x = x.flatten()
    y = y.flatten()
    count = ctable.flatten()
    return x, y, count

# example and round trip test
x = np.array([1,1,1,2,2,2])
y = np.array([1,2,3,1,2,3])
count = np.array([10, 5, 2, 9, 12, 16])
tab = flat2table(x, y, count)[0]
nx, ny = tab.shape
assert_array_equal((x-1,y-1,count),table2flat(flat2table(x, y, count)[0]))
assert_array_equal((x,y,count),
    table2flat(flat2table(x, y, count)[0],range(1,nx+1), range(1,ny+1)))



More information about the SciPy-Dev mailing list