[Numpy-discussion] string arrays - accessing data from C++

Christopher Barker Chris.Barker at noaa.gov
Fri Sep 18 16:26:40 EDT 2009


Jaroslav Hajek wrote:
>>> string lengths determined
>> c-style null termination
>>
> 
> Hmm, this didn't seem to work for me. But maybe I was doing something
> else wrong. Thanks.

Well, I notice that for a length-n string, if there are n "real" 
characters, then there is no null terminator, so that may have messed 
up your code somewhere.
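
Here is a minimal sketch of how a reader of the raw buffer could cope with that: treat each element as a fixed-width field and strip the trailing nulls, never looking past the item size. It is plain Python working on bytes rather than real C++. The helper name read_fields is made up for illustration, and buf and itemsize stand in for the array's raw data and its dtype.itemsize.

```python
def read_fields(buf, itemsize):
    """Split a raw buffer of fixed-width, null-padded strings into bytes objects.

    Hypothetical helper: buf stands in for the array's raw data and
    itemsize for the width of one element in bytes.
    """
    fields = []
    for start in range(0, len(buf), itemsize):
        field = buf[start:start + itemsize]
        # Drop the null padding; a full-width string has none to drop,
        # so it is the slice bound that stops us, not a terminator.
        fields.append(field.rstrip(b"\x00"))
    return fields


# Two 4-byte fields: "abcd" fills its field (no null), "ab" is padded.
raw = b"abcd" + b"ab\x00\x00"
print(read_fields(raw, 4))
```

On the C or C++ side the equivalent would presumably be a length-bounded scan (strnlen-style, limited to the item size) rather than a plain strlen.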


>> a = np.empty((3,4), dtype=np.character)

> Are you sure? I think this is what I tried (I can't check at this
> moment), and the result has descr->type equal to PyArray_STRING. Also,
> note that even in the interpreter, the dtype shows itself as string:
> 
>>>> numpy.dtype('c')
> dtype('|S1')

Good point -- that is a length-one string, not the same thing. Running:

for n in dir(np):
    if type(getattr(np, n)) == type(np.uint8): print n

gives me what should be all the dtype objects, and these are the ones 
that look to me like they might be "char":

byte
character
chararray
int8
ubyte
uint8

but none of those seem to be quite right:

In [20]: for dtype in [np.byte, np.character, np.chararray, np.int8, np.ubyte, np.uint8]:
    ....:     a = np.empty((1,1), dtype=dtype); print a.dtype
    ....:
    ....:
int8
|S1
object
int8
uint8
uint8

There was a discussion on the Cython list recently, and apparently 
"char" is not as clearly defined as I thought -- some compilers treat 
it as signed, some as unsigned. Who knew? So I'm not sure what 
PyArray_CHAR is.
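
To make the ambiguity concrete, here is a small sketch using Python's struct module (my choice for illustration, not anything NumPy does internally): the same byte reads as a different number depending on whether it is taken as a signed or an unsigned char.

```python
import struct

raw = b"\xff"  # one byte with all bits set

# 'b' is struct's signed char, 'B' its unsigned char.
as_signed = struct.unpack("b", raw)[0]
as_unsigned = struct.unpack("B", raw)[0]

print(as_signed, as_unsigned)
```

That is the same split as int8 versus uint8 in the output above.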

I'm sure someone more familiar with the C side of things can answer 
this, though.

Anyone?


> Even null-padded, apparently.

Let's see:

In [24]: a = np.array(['this','that','the other'])

In [25]: a.view(np.uint8).reshape((3,-1))
Out[25]:
array([[116, 104, 105, 115,   0,   0,   0,   0,   0],
        [116, 104,  97, 116,   0,   0,   0,   0,   0],
        [116, 104, 101,  32, 111, 116, 104, 101, 114]], dtype=uint8)

In [26]: a[2] = 's'

In [27]: a
Out[27]:
array(['this', 'that', 's'],
       dtype='|S9')

In [28]: a.view(np.uint8).reshape((3,-1))
Out[28]:
array([[116, 104, 105, 115,   0,   0,   0,   0,   0],
        [116, 104,  97, 116,   0,   0,   0,   0,   0],
        [115,   0,   0,   0,   0,   0,   0,   0,   0]], dtype=uint8)


Yup -- it looks like the padding is maintained: assigning a shorter 
string zeros out the rest of the field.
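
A rough pure-Python sketch of that behaviour, as I read it from the session above: writing a shorter value into a fixed-width field re-pads the whole field with nulls. The helper store_field is hypothetical, and cutting over-long values to the field width is an assumption of the sketch, not something tested here.

```python
def store_field(buf, index, itemsize, value):
    """Overwrite field number `index` in a buffer of fixed-width fields.

    Hypothetical helper; buf must be a mutable bytearray.
    """
    # Assumption for this sketch: over-long values are cut to the field width.
    value = value[:itemsize]
    start = index * itemsize
    # Pad with nulls so no bytes of the previous value survive.
    buf[start:start + itemsize] = value.ljust(itemsize, b"\x00")


itemsize = 9
buf = bytearray(b"".join(s.ljust(itemsize, b"\x00")
                         for s in (b"this", b"that", b"the other")))
store_field(buf, 2, itemsize, b"s")
print(list(buf[2 * itemsize:3 * itemsize]))
```

The last field should come out as 115 followed by eight zeros, the same as the third row of Out[28].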


-Chris


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov


