[Numpy-discussion] unique rows of array

Maria Liukis liukis at usc.edu
Tue Aug 18 00:59:40 EDT 2009


Josef,

Thanks, I'll try that and will search for your question from last December :)
Masha
--------------------
liukis at usc.edu



On Aug 17, 2009, at 9:44 PM, josef.pktd at gmail.com wrote:

> On Tue, Aug 18, 2009 at 12:30 AM, Maria Liukis<liukis at usc.edu> wrote:
>> Hello everybody,
>> While re-implementing some Matlab code in Python, I've run into the
>> problem of finding a NumPy function analogous to Matlab's
>> "unique(array, 'rows')" to get the unique rows of an array. Searching
>> the web, I found a similar discussion from a couple of years ago with
>> an example:
>>
>> ############## A SNIPPET FROM THE DISCUSSION
>> [Numpy-discussion] Finding unique rows in an array [Was: Finding a
>> row match within a numpy array]
>> On Tuesday 21 August 2007, Mark.Miller wrote:
>>> A slightly related question on this topic...
>>>
>>> Is there a good loopless way to identify all of the unique rows in
>>> an array?  Something like numpy.unique() is ideal, but capable of
>>> extracting unique subarrays along an axis.
>> You can always do a view of the rows as strings and then use
>> unique(). Here is an example:
>> In [1]: import numpy
>> In [2]: a=numpy.arange(12).reshape(4,3)
>> In [3]: a[2]=(3,4,5)
>> In [4]: a
>> Out[4]:
>> array([[ 0,  1,  2],
>>        [ 3,  4,  5],
>>        [ 3,  4,  5],
>>        [ 9, 10, 11]])
>> now, create the view and select the unique rows:
>> In [5]: b=numpy.unique(a.view('S%d'%a.itemsize*a.shape[0])).view('i4')
>> and finally restore the shape:
>> In [6]: b.reshape((len(b)/a.shape[1], a.shape[1]))
>> Out[6]:
>> array([[ 0,  1,  2],
>>        [ 3,  4,  5],
>>        [ 9, 10, 11]])
>> If you want to find unique columns instead of rows, do a transpose
>> first on the initial array.
>> ################END OF DISCUSSION
>>
>> The provided example works only because the array elements are
>> row-sorted. Changing the tested array (in my case, it's 'c'):
>>>>> c
>> array([[ 0,  1,  2],
>>        [ 3,  4,  5],
>>        [ 3,  4,  5],
>>        [ 9, 10, 11]])
>>>>> c[0] = (11, 10, 0)
>>>>> c
>> array([[11, 10,  0],
>>        [ 3,  4,  5],
>>        [ 3,  4,  5],
>>        [ 9, 10, 11]])
>>>>> b = np.unique(c.view('S%s' %c.itemsize*c.shape[0]))
>>>>> b
>> array(['', '\x03', '\x04', '\x05', '\t', '\n', '\x0b'],
>>       dtype='|S4')
>>>>> b.view('i4')
>> array([ 0,  3,  4,  5,  9, 10, 11])
>>>>> b.reshape((len(b)/c.shape[1], c.shape[1])).view('i4')
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> ValueError: total size of new array must be unchanged
>>>>>
>> Since len(b) = 7.
>> The suggested approach would work if the whole row were converted to
>> a single string, I guess. But from what I could gather,
>> numpy.array.view() only changes the display element-wise.
>> Before I start re-inventing the wheel, I was just wondering if using
>> existing numpy functionality one could find unique rows in an array.
>>
>> Many thanks in advance!
>> Masha
>> --------------------
>> liukis at usc.edu
>>
>>
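A note on the string-view attempt quoted above. The expression 'S%d' % c.itemsize * c.shape[0] is grouped as ('S%d' % c.itemsize) * c.shape[0], because % and * have the same precedence in Python and associate left to right. The dtype string therefore never describes a whole row, which is consistent with the per-element '|S4' output in the session. The following is only a sketch of what the snippet presumably intended: it assumes a C-contiguous integer array, and the variable names are illustrative rather than taken from the thread.

```python
import numpy as np

c = np.array([[11, 10, 0],
              [3, 4, 5],
              [3, 4, 5],
              [9, 10, 11]], dtype='i4')

# '%' and '*' share one precedence level and group left to right, so the
# format string is built first and then repeated.
element_wise = 'S%d' % c.itemsize * c.shape[0]      # 'S4S4S4S4'

# Parenthesise the product and use the number of columns (shape[1]) so that
# one byte string spans a complete row.
row_dtype = 'S%d' % (c.itemsize * c.shape[1])       # 'S12'

# Each row becomes a single 12-byte string: shape (4, 3) -> (4, 1).
rows_as_bytes = np.ascontiguousarray(c).view(row_dtype)

# np.unique flattens its input, so duplicates are dropped row by row.
unique_bytes = np.unique(rows_as_bytes)

# View the bytes as integers again and restore one row per line.
unique_rows = unique_bytes.view(c.dtype).reshape(-1, c.shape[1])
print(unique_rows)
```

The rows come back ordered by their raw bytes, which depends on the machine's byte order, so the result should be treated as an unordered set of rows.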
>
> one way is to convert to structured array
>
>>>> c = np.array([[ 0,  1,  2],
>        [ 3,  4,  5],
>        [ 3,  4,  5],
>        [ 9, 10, 11]])
>
>>>> np.unique1d(c.view([('',c.dtype)]*c.shape[1])).view(c.dtype).reshape(-1,c.shape[1])
> array([[ 0,  1,  2],
>        [ 3,  4,  5],
>        [ 9, 10, 11]])
>
> for explanation, I asked a similar question last December about
> "sortrows".
> (I never remember when I need the last reshape and when not)
>
> Josef
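
For readers on a current NumPy, here is a hedged sketch of the structured-array approach from the reply above. It makes two assumptions: np.unique1d is not available in recent NumPy releases, so np.unique stands in for it, and the axis argument used in the last line requires NumPy 1.13 or newer. The variable names are illustrative only. The comments also trace the array shapes, which shows why the final reshape is needed in this case.

```python
import numpy as np

c = np.array([[11, 10, 0],
              [3, 4, 5],
              [3, 4, 5],
              [9, 10, 11]])

# One structured element per row: shape (4, 3) becomes (4, 1).
record_dtype = [('', c.dtype)] * c.shape[1]
records = np.ascontiguousarray(c).view(record_dtype)

# np.unique flattens its input, so the result is 1-D with shape (3,).
unique_records = np.unique(records)

# Viewing a 1-D record array as plain integers gives a flat array of
# 9 values, so the reshape restores one row per record.
unique_rows = unique_records.view(c.dtype).reshape(-1, c.shape[1])

# Assumption: NumPy 1.13 or newer, where np.unique accepts `axis`.
unique_rows_axis = np.unique(c, axis=0)

print(unique_rows)
print(unique_rows_axis)
```

If the record array keeps its (n, 1) shape, for example after an in-place sort along axis 0, viewing it back as c.dtype already yields shape (n, 3) and no reshape is required.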
