[Numpy-discussion] New function `count_unique` to generate contingency tables.

Eelco Hoogendoorn hoogendoorn.eelco at gmail.com
Wed Aug 13 18:17:37 EDT 2014


It's pretty easy to implement this table functionality and more on top of
the code I linked above. I still think such a comprehensive overhaul of
arraysetops is worth discussing.

import numpy as np
import grouping  # the code linked above (the pastebin), assumed saved as grouping.py

x = [1, 1, 1, 1, 2, 2, 2, 2, 2]
y = [3, 4, 3, 3, 3, 4, 5, 5, 5]
z = np.random.randint(0, 2, (9, 2))

def table(*keys):
    """
    desired table implementation, building on the index object
    cleaner, and more functionality
    performance should be the same
    """
    indices  = [grouping.as_index(k, axis=0) for k in keys]
    uniques  = [i.unique  for i in indices]
    inverses = [i.inverse for i in indices]
    shape    = [i.groups  for i in indices]
    # plain int: the np.int alias has been removed from recent NumPy releases
    t = np.zeros(shape, int)
    # pass a tuple so each inverse array indexes its own axis of t
    np.add.at(t, tuple(inverses), 1)
    return tuple(uniques), t

# here is how to use it
print(table(x, y))
# but we can use fancy keys as well; here a composite key and a row-key
print(table((x, y), z))
# this effectively creates a sparse matrix equivalent of your desired table
print(grouping.count((x, y)))
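
For readers without the grouping module at hand, here is a minimal sketch of
the same counting idea using only np.unique and np.add.at. It is an
illustration under stated assumptions, not the implementation from the
pastebin or from the pull request: the name `table_sketch` is hypothetical,
and it only handles plain 1-D keys (no composite or row keys).

```python
import numpy as np

def table_sketch(*keys):
    """Count co-occurrences of equal-length 1-D key sequences (hypothetical helper)."""
    uniques, inverses = [], []
    for k in keys:
        # return_inverse gives, for each element, its position in the sorted uniques
        u, inv = np.unique(np.asarray(k), return_inverse=True)
        uniques.append(u)
        inverses.append(inv.ravel())
    shape = tuple(len(u) for u in uniques)
    counts = np.zeros(shape, dtype=int)
    # np.add.at accumulates over repeated index tuples,
    # whereas counts[idx] += 1 would count each repeated cell only once
    np.add.at(counts, tuple(inverses), 1)
    return tuple(uniques), counts

x = [1, 1, 1, 1, 2, 2, 2, 2, 2]
y = [3, 4, 3, 3, 3, 4, 5, 5, 5]
uniques, counts = table_sketch(x, y)
print(uniques)  # unique values of x, then of y
print(counts)   # rows follow unique x values, columns follow unique y values
```

For the x and y above, the rows correspond to x = 1, 2 and the columns to
y = 3, 4, 5; the entries sum to 9, the number of observations.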


On Wed, Aug 13, 2014 at 11:25 PM, Warren Weckesser <
warren.weckesser at gmail.com> wrote:

>
>
>
> On Wed, Aug 13, 2014 at 5:15 PM, Benjamin Root <ben.root at ou.edu> wrote:
>
>> The ever-wonderful pylab mode in matplotlib has a table function for
>> plotting a table of text in a plot. If I remember correctly, what would
>> happen is that matplotlib's table() function will simply obliterate
>> numpy's table function. This isn't a show-stopper; I just wanted to point
>> that out.
>>
>> Personally, while I wasn't a particular fan of "count_unique" because I
>> wouldn't necessarily think of it when needing a contingency table, I do
>> like that it is verb-ish. "table()", in this sense, is not a verb. That
>> said, I am perfectly fine with it if you are fine with the name collision
>> in pylab mode.
>>
>>
>
> Thanks for pointing that out.  I only changed it to have something that
> sounded more table-ish, like the Pandas, R and Matlab functions.   I won't
> update it right now, but if there is interest in putting it into numpy,
> I'll rename it to avoid the pylab conflict.  Anything along the lines of
> `crosstab`, `xtable`, etc., would be fine with me.
>
> Warren
>
>
>
>> On Wed, Aug 13, 2014 at 4:57 PM, Warren Weckesser <
>> warren.weckesser at gmail.com> wrote:
>>
>>>
>>>
>>>
>>> On Tue, Aug 12, 2014 at 12:51 PM, Eelco Hoogendoorn <
>>> hoogendoorn.eelco at gmail.com> wrote:
>>>
>>>> Ah yes, that's also an issue I was trying to deal with. The semantics I
>>>> prefer in these types of operators is (as a default) to have every array
>>>> be treated as a sequence of keys, so when calling unique(arr_2d), you'd get
>>>> unique rows, unless you pass axis=None, in which case the array is
>>>> flattened.
>>>>
>>>> I also agree that the extension you propose here is useful; but
>>>> ideally, with a little more discussion on these subjects we can converge on
>>>> an even more comprehensive overhaul.
>>>>
>>>>
>>>> On Tue, Aug 12, 2014 at 6:33 PM, Joe Kington <joferkington at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Aug 12, 2014 at 11:17 AM, Eelco Hoogendoorn <
>>>>> hoogendoorn.eelco at gmail.com> wrote:
>>>>>
>>>>>> Thanks. Prompted by that stackoverflow question, and similar problems
>>>>>> I had to deal with myself, I started working on a much more general
>>>>>> extension to numpy's functionality in this space. Like you noted, things
>>>>>> get a little pandas-y, but I think there is a lot of pandas functionality
>>>>>> that could or should be part of the numpy core, a robust set of grouping
>>>>>> operations in particular.
>>>>>>
>>>>>> see pastebin here:
>>>>>> http://pastebin.com/c5WLWPbp
>>>>>>
>>>>>
>>>>> On a side note, this is related to a pull request of mine from a while
>>>>> back: https://github.com/numpy/numpy/pull/3584
>>>>>
>>>>> There was a lot of disagreement on the mailing list about what to call
>>>>> a "unique slices along a given axis" function, so I wound up closing the
>>>>> pull request pending more discussion.
>>>>>
>>>>> At any rate, I think it's a useful thing to have in "base" numpy.
>>>>>
>>>>> _______________________________________________
>>>>> NumPy-Discussion mailing list
>>>>> NumPy-Discussion at scipy.org
>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>> Update: I renamed the function to `table` in the pull request:
>>> https://github.com/numpy/numpy/pull/4958
>>>
>>>
>>> Warren
>>>
>>>
>>>
>>
>>
>>
>
>
>