[Numpy-discussion] string arrays - accessing data from C++
Christopher Barker
Chris.Barker at noaa.gov
Fri Sep 18 16:26:40 EDT 2009
Jaroslav Hajek wrote:
>>> string lengths determined
>> c-style null termination
>>
>
> Hmm, this didn't seem to work for me. But maybe I was doing something
> else wrong. Thanks.
Well, I notice that for a length-n string, if there are n "real"
characters, then there is no terminating null, so that may have messed
up your code somewhere.
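One way to cope with this when walking the buffer is to treat each element as a fixed-width field of itemsize bytes and stop at the first null or at the end of the field, whichever comes first. Below is a minimal sketch in plain Python, using a hand-written bytes buffer as a stand-in for the array's data. The helper name `read_field` and the sample buffer are illustrative only and are not part of NumPy. In C++ the same idea would be a length-bounded scan (something like strnlen over itemsize bytes) rather than an unbounded strlen.

```python
def read_field(buf, index, itemsize):
    """Return element `index` of a buffer of fixed-width, null-padded fields."""
    start = index * itemsize
    field = buf[start:start + itemsize]
    end = field.find(b"\0")  # -1 when the string fills the whole field
    return field if end == -1 else field[:end]


# Hand-written stand-in for three 9-byte elements; the last one has no null.
ITEMSIZE = 9
data = (b"this".ljust(ITEMSIZE, b"\0")
        + b"that".ljust(ITEMSIZE, b"\0")
        + b"the other")

for i in range(3):
    print(read_field(data, i, ITEMSIZE))
```

The point of bounding the scan is that the last element runs right up to the end of the buffer, so an unbounded search for a null would read past it.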
>> a = np.empty((3,4), dtype=np.character)
> Are you sure? I think this is what I tried (I can't check at this
> moment), and the result has descr->type equal to PyArray_STRING. Also,
> note that even in the interpreter, the dtype shows itself as string:
>
>>>> numpy.dtype('c')
> dtype('|S1')
Good point -- that is a length-one string, not the same thing. Running:
import numpy as np
for n in dir(np):
    if type(getattr(np, n)) == type(np.uint8): print(n)
gives me what should be all the dtype objects, and these are the ones
that look to me like they might be "char":
byte
character
chararray
int8
ubyte
uint8
but none of those seem to be quite right:
In [20]: for dtype in [np.byte, np.character, np.chararray, np.int8,
   ....:               np.ubyte, np.uint8]:
   ....:     a = np.empty((1,1), dtype=dtype); print a.dtype
   ....:
int8
|S1
object
int8
uint8
uint8
There was a discussion on the Cython list recently, and apparently
"char" is not as clearly defined as I thought -- some compilers use
signed, some unsigned.. who knew? So I'm not sure what PyArray_CHAR is.
I'm sure someone more familiar with the C side of things can answer
this, though.
Anyone?
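The practical consequence of the signedness question only shows up for bytes above 127. As a small illustration (using the standard struct module, not NumPy), the sketch below reads the same byte both ways; the struct format codes "b" and "B" mean signed char and unsigned char respectively. The sample byte is arbitrary.

```python
import struct

raw = b"\xe9"  # one byte with the high bit set

as_signed = struct.unpack("b", raw)[0]    # read as signed char
as_unsigned = struct.unpack("B", raw)[0]  # read as unsigned char

print(as_signed, as_unsigned)
```

For bytes in the ASCII range the two readings agree, which is why the distinction rarely matters for plain text.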
> Even null-padded, apparently.
let's see:
In [24]: a = np.array(['this','that','the other'])
In [25]: a.view(np.uint8).reshape((3,-1))
Out[25]:
array([[116, 104, 105, 115,   0,   0,   0,   0,   0],
       [116, 104,  97, 116,   0,   0,   0,   0,   0],
       [116, 104, 101,  32, 111, 116, 104, 101, 114]], dtype=uint8)
In [26]: a[2] = 's'
In [27]: a
Out[27]:
array(['this', 'that', 's'],
      dtype='|S9')
In [28]: a.view(np.uint8).reshape((3,-1))
Out[28]:
array([[116, 104, 105, 115,   0,   0,   0,   0,   0],
       [116, 104,  97, 116,   0,   0,   0,   0,   0],
       [115,   0,   0,   0,   0,   0,   0,   0,   0]], dtype=uint8)
yup-- it looks like the padding is maintained
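The layout in Out[28] is what you would expect if assignment always rewrites the whole fixed-width field. Here is a sketch of that model on a plain bytearray. It illustrates the observed layout and is not NumPy's actual implementation; `write_field` is a hypothetical helper, and the truncation of over-long values is an assumption of the sketch.

```python
def write_field(buf, index, itemsize, value):
    """Overwrite element `index` with `value`, null-padded to `itemsize` bytes."""
    value = value[:itemsize]  # assumption: over-long values are truncated
    start = index * itemsize
    # Rewrite the entire field so no bytes of the old value survive.
    buf[start:start + itemsize] = value.ljust(itemsize, b"\0")


ITEMSIZE = 9
buf = bytearray(b"this".ljust(ITEMSIZE, b"\0")
                + b"that".ljust(ITEMSIZE, b"\0")
                + b"the other")

write_field(buf, 2, ITEMSIZE, b"s")
print(list(buf[2 * ITEMSIZE:3 * ITEMSIZE]))
```

Because the whole field is rewritten, a reader that stops at the first null never sees leftovers of the previous, longer value.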
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov