[Numpy-discussion] Multiplicity of an entry

Christopher Barker Chris.Barker at noaa.gov
Tue Oct 27 12:09:53 EDT 2009


Nadav Horesh wrote:
> np.equal(a,a).sum(0)
> 
> but, for unknown reason, np.equal operates only on "normal" arrays.

true:

In [25]: a
Out[25]:
array(['abc', 'def', 'abc', 'ghij'],
       dtype='|S4')

In [27]: np.equal(a,a)
Out[27]: NotImplemented

however:

In [28]: a == a
Out[28]: array([ True,  True,  True,  True], dtype=bool)

Don't they use the same code? Or is "==" falling back to plain old generic
Python sequence comparison, which would partly explain why it is so slow?
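
For what it's worth, since "==" does work on the string array, here is a minimal sketch of getting the multiplicity of one entry that way. The names `matches` and `multiplicity` are just for illustration, and the array is written with bytes literals so the snippet also runs on Python 3:

```python
import numpy as np

# Same data as the session above, as fixed-width 4-byte strings.
a = np.array([b'abc', b'def', b'abc', b'ghij'], dtype='S4')

# "==" compares every element against the first entry.
matches = (a == a[0])

# Summing the boolean mask counts how often that entry occurs.
multiplicity = int(matches.sum())
print(matches, multiplicity)
```

This says nothing about speed; it only shows that the comparison operator gives a usable boolean mask where np.equal did not.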

> maybe you can transform the array to arrays of numbers, for example by hash.

or even easier:

In [32]: a2 = a.view(dtype=np.int32)

In [33]: a2
Out[33]: array([1633837824, 1684366848, 1633837824, 1734895978])

In [34]: np.equal(a2, a2[0])
Out[34]: array([ True, False,  True, False], dtype=bool)

though that only works if your strings are a handy length like 4 bytes...
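
Putting the view trick together as a rough sketch: the itemsize check makes the "handy length" restriction explicit. As an alternative that does not depend on the string length, np.unique with return_counts=True is shown as well; that keyword is an assumption about your NumPy version (it was added in NumPy 1.9, so it is not available in the release discussed in this thread). The names `view_count` and `multiplicity` are made up for the example.

```python
import numpy as np

a = np.array([b'abc', b'def', b'abc', b'ghij'], dtype='S4')

# The reinterpretation is only valid when each string occupies exactly
# as many bytes as the integer type (4 bytes for int32).
assert a.dtype.itemsize == np.dtype(np.int32).itemsize

# View the same memory as 32-bit integers; no data is copied.
a2 = a.view(np.int32)

# np.equal works on the integer view, so count matches of the first entry.
view_count = int(np.equal(a2, a2[0]).sum())

# Length-independent alternative (assumes NumPy >= 1.9):
# count every distinct entry in one call.
values, counts = np.unique(a, return_counts=True)
multiplicity = dict(zip(values.tolist(), counts.tolist()))

print(view_count)
print(multiplicity)
```

The integer values you see in the view depend on the machine's byte order, but the equality test does not, since identical strings always map to identical integers.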

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov


