[Numpy-discussion] weird searchsorted behavior for unicode array

Thouis (Ray) Jones thouis at gmail.com
Thu Mar 29 05:04:27 EDT 2012


This looks like a bug in the unicode string length computation in
arraytypes.c.src:UNICODE_compare(): it uses elsize, which is a size in
bytes, as the number of UCS4 characters to compare. I'm basing this on
comparison to the code in arrayobject.c:_myunicmp() and
arrayobject.c:_compare_strings().
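
To illustrate the bytes-versus-characters mix-up, here is a minimal standalone sketch. It is not NumPy's code: ucs4_t, ucs4_len_from_bytes() and ucs4_compare() are hypothetical names made up for this example, and ucs4_t merely stands in for PyArray_UCS4 under the assumption that each character occupies 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PyArray_UCS4: one 4-byte code unit per character. */
typedef uint32_t ucs4_t;

/* Convert an element size in bytes into a count of UCS4 characters. */
static size_t ucs4_len_from_bytes(size_t elsize_bytes)
{
    return elsize_bytes >> 2;   /* same as elsize_bytes / sizeof(ucs4_t) */
}

/* Compare the first nchars code units of a and b; returns -1, 0, or 1. */
static int ucs4_compare(const ucs4_t *a, const ucs4_t *b, size_t nchars)
{
    size_t i;

    for (i = 0; i < nchars; i++) {
        if (a[i] != b[i]) {
            return (a[i] < b[i]) ? -1 : 1;
        }
    }
    return 0;
}
```

For a two-character element (8 bytes), calling ucs4_compare() with ucs4_len_from_bytes(8) inspects exactly two code units. Passing the raw byte count 8 instead would inspect eight code units, so whatever follows the element in memory can change the result of the comparison.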

Patch below (against maintenance/1.6.x, but the bug also looks to be
present in master based on my reading of the code).

---
 numpy/core/src/multiarray/arraytypes.c.src |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/numpy/core/src/multiarray/arraytypes.c.src b/numpy/core/src/multiarray/arraytypes.c.src
index fde95c4..660d1e5 100644
--- a/numpy/core/src/multiarray/arraytypes.c.src
+++ b/numpy/core/src/multiarray/arraytypes.c.src
@@ -2789,7 +2789,7 @@ static int
 UNICODE_compare(PyArray_UCS4 *ip1, PyArray_UCS4 *ip2,
                 PyArrayObject *ap)
 {
-    int itemsize = ap->descr->elsize;
+    int itemsize = (ap->descr->elsize) >> 2;

     if (itemsize < 0) {
         return 0;
-- 
1.7.9.3


