[Numpy-discussion] how to delete the duplicated data in numpy?

Mon Apr 6 23:05:14 EDT 2009

On Mon, Apr 6, 2009 at 9:46 PM, frank wang <f.yw at hotmail.com> wrote:
>
> Hi,
>
> I have just noticed that I did not change the title of my question, so I
> am resending it. Sorry for the mistake. Here is my question.
>
> I have a big 2-column data file where the data are repeated either 5 or 6
> times. Is there a quick way to remove the duplicated data?
>
> Thanks
>

I'm still trying to figure out how to work with rows. The following is
based on an answer I got when I asked about sortrows. However, the last
reshape shouldn't be necessary, or at least I don't know why it is.

See if the following works for you:

>>> import numpy as np
>>> xx2 = np.repeat(np.array([[0., 1.], [1., 1.], [2., 1.]]), 3, axis=0)
>>> xx2
array([[ 0.,  1.],
       [ 0.,  1.],
       [ 0.,  1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 2.,  1.],
       [ 2.,  1.],
       [ 2.,  1.]])

>>> xv = xx2.view([('',xx2.dtype)]*xx2.shape[1])
>>> xu = np.unique(xv)
>>> xu.view('f8')
array([ 0.,  1.,  1.,  1.,  2.,  1.])
>>> xu.view('f8').reshape(-1,2)
array([[ 0.,  1.],
       [ 1.,  1.],
       [ 2.,  1.]])
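A likely reason the final reshape is needed (my reading of NumPy's view rules, not something stated in the thread): viewing the (9, 2) float array with a two-field record dtype turns each row into one 16-byte record, giving shape (9, 1). np.unique flattens that to a 1-d array of 3 records, and viewing 1-d records back as plain floats can only stretch the last (and only) axis, so the result is 6 floats in a single flat axis. The shapes can be checked directly:

```python
import numpy as np

# Same data as xx2 above: three distinct rows, each repeated three times
xx2 = np.repeat(np.array([[0., 1.], [1., 1.], [2., 1.]]), 3, axis=0)

# Each 2-float row becomes one record with two fields, so the shape is (9, 1)
xv = xx2.view([('', xx2.dtype)] * xx2.shape[1])
print(xv.shape)

# unique works on the flattened array: 3 distinct records, shape (3,)
xu = np.unique(xv)
print(xu.shape)

# Viewing the 1-d record array as floats only lengthens the last axis: (6,)
flat = xu.view(xx2.dtype)
print(flat.shape)

# reshape restores one row per record: (3, 2)
rows = flat.reshape(-1, xx2.shape[1])
print(rows)
```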

or in one line

>>> np.unique(xx2.view([('',xx2.dtype)]*xx2.shape[1])).view(xx2.dtype).reshape(-1,xx2.shape[1])
array([[ 0.,  1.],
       [ 1.,  1.],
       [ 2.,  1.]])
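The one-liner can be wrapped in a small reusable function. This is only a sketch: the name unique_rows is hypothetical, and two details are my own assumptions rather than part of the original recipe. np.ascontiguousarray makes sure each row is one contiguous block of memory (the record view requires that, so sliced or Fortran-ordered input would otherwise fail), and viewing back with the array's own dtype instead of a hard-coded 'f8' lets the same code handle integer data.

```python
import numpy as np

def unique_rows(a):
    """Return the distinct rows of a 2-d array, sorted lexicographically.

    Hypothetical helper wrapping the structured-view trick.
    """
    # The record view needs each row to be one contiguous block of memory
    a = np.ascontiguousarray(a)
    # Treat each row as a single record with one field per column
    rec = a.view([('', a.dtype)] * a.shape[1])
    # unique flattens the (n, 1) record array and sorts it field by field
    u = np.unique(rec)
    # View back as the original element type and restore the 2-d shape
    return u.view(a.dtype).reshape(-1, a.shape[1])

data = np.array([[2, 1], [0, 1], [2, 1], [0, 1], [1, 1]])
print(unique_rows(data))
```

Because the records are compared field by field, the result comes back sorted by the first column, then the second, just like the transcript output.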

The idea is that the structured array keeps each row together as a single
element, so it can be fed to unique, which only works on 1d arrays. I have
no idea about the internal structure, but it seems to work.
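In later NumPy releases (1.13 and newer), np.unique accepts an axis argument that does the row grouping internally, so the structured-view trick is no longer needed there. Like the trick, it returns rows in sorted order. If the original order of first appearance matters for the data file, the return_index output can restore it, as in this sketch (the sample array is made up for illustration):

```python
import numpy as np

xx2 = np.array([[2., 1.], [2., 1.], [0., 1.], [0., 1.], [1., 1.], [1., 1.]])

# Distinct rows, sorted lexicographically (needs NumPy >= 1.13)
rows_sorted = np.unique(xx2, axis=0)

# Index of the first occurrence of each distinct row, in sorted-row order
_, first = np.unique(xx2, axis=0, return_index=True)

# Sorting those indices keeps the rows in order of first appearance
rows_in_order = xx2[np.sort(first)]

print(rows_sorted)
print(rows_in_order)
```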

Josef