[Numpy-discussion] how to delete the duplicated data in numpy?
josef.pktd at gmail.com
Mon Apr 6 23:05:14 EDT 2009
On Mon, Apr 6, 2009 at 9:46 PM, frank wang <f.yw at hotmail.com> wrote:
>
>
> Hi,
>
> I have just noticed that I did not change the title of my question. So I
> resend it out. Sorry for the mistake. Here is my question.
>
> I have a big 2-column data file where the data are repeated either 5 or 6
> times. Is there a quick way to remove the duplicated data?
>
> Thanks
>
I'm still trying to figure out how to work with rows. The following is
based on an answer that I got when I asked about sortrows. I would have
expected the final reshape to be unnecessary, and I don't know why it
is needed.
See if the following works for you:
>>> xx2
array([[ 0.,  1.],
       [ 0.,  1.],
       [ 0.,  1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 2.,  1.],
       [ 2.,  1.],
       [ 2.,  1.]])
>>> xv = xx2.view([('',xx2.dtype)]*xx2.shape[1])
>>> xu = np.unique(xv)
>>> xu.view('f8')
array([ 0.,  1.,  1.,  1.,  2.,  1.])
>>> xu.view('f8').reshape(-1,2)
array([[ 0.,  1.],
       [ 1.,  1.],
       [ 2.,  1.]])
Or in one line:
>>> np.unique(xx2.view([('',xx2.dtype)]*xx2.shape[1])).view('f8').reshape(-1,xx2.shape[1])
array([[ 0.,  1.],
       [ 1.,  1.],
       [ 2.,  1.]])
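For anyone who wants to run this outside an interactive session, here is a minimal sketch of the same steps as a script. The helper name unique_rows_structured and the explicit field names are only illustrative and not part of NumPy. The np.ascontiguousarray call is an added assumption: the view trick needs each row to sit contiguously in memory.

```python
import numpy as np


def unique_rows_structured(a):
    """Return the unique rows of a 2-D array via a structured view.

    Hypothetical helper name, used here only for illustration.
    """
    # The view below reinterprets the raw buffer, so each row must be
    # laid out contiguously in memory.
    a = np.ascontiguousarray(a)
    ncols = a.shape[1]
    # One field per column, so that a whole row becomes a single record.
    row_dtype = [('f%d' % i, a.dtype) for i in range(ncols)]
    records = a.view(row_dtype)
    # np.unique flattens its input and compares whole records.
    unique_records = np.unique(records)
    # Reinterpret the records as plain numbers and restore the columns.
    return unique_records.view(a.dtype).reshape(-1, ncols)


# Same data as xx2 above: three distinct rows, each repeated three times.
xx2 = np.repeat(np.array([[0., 1.], [1., 1.], [2., 1.]]), 3, axis=0)
result = unique_rows_structured(xx2)
print(result)
```

The rows come back sorted, because np.unique sorts the records field by field.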
The idea is that the structured array keeps each row together as a
single record, so it can be fed to unique, which works on 1d arrays.
I have no idea about the internal structure, but it seems to work.
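The shapes at each step give a rough picture of what is going on, and of why the final reshape is needed: viewing the records as 'f8' again reinterprets the buffer as a flat run of floats, so the column structure has to be put back by hand. The sketch below assumes float64 data as in xx2. Its last lines use the axis argument of np.unique, which exists only in newer NumPy releases (1.13 and later), so treat that part as an alternative for newer installations.

```python
import numpy as np

# Same data as xx2 above: three distinct rows, each repeated three times.
xx2 = np.repeat(np.array([[0., 1.], [1., 1.], [2., 1.]]), 3, axis=0)

# Structured view: each row of two floats becomes one 16-byte record.
xv = xx2.view([('', xx2.dtype)] * xx2.shape[1])
print(xv.dtype)   # NumPy fills in default names for the empty field names
print(xv.shape)   # one record per original row, in a trailing axis of length 1

# np.unique flattens the records and returns a 1-D structured array.
xu = np.unique(xv)
print(xu.shape)

# Viewing as 'f8' yields one float per field, all in a single flat array,
# which is why the reshape back to two columns is required.
flat = xu.view('f8')
print(flat.shape)
via_view = flat.reshape(-1, xx2.shape[1])

# Alternative, assuming NumPy 1.13 or later: deduplicate rows directly.
via_axis = np.unique(xx2, axis=0)
print(np.array_equal(via_view, via_axis))
```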
Josef