[Numpy-discussion] string arrays - accessing data from C++

Jaroslav Hajek highegg at gmail.com
Mon Sep 21 05:12:34 EDT 2009


On Fri, Sep 18, 2009 at 10:26 PM, Christopher Barker
<Chris.Barker at noaa.gov> wrote:
> Jaroslav Hajek wrote:
>>>> string lengths determined
>>> c-style null termination
>>>
>>
>> Hmm, this didn't seem to work for me. But maybe I was doing something
>> else wrong. Thanks.
>
> well, I notice that for a length-n string, if there are n "real"
> characters, then there is no null, so that may have messed up your code
> somewhere.
>

As it happens, the problem was just in my brain :)
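
For the archive, here is a minimal sketch of the rule Chris describes, in
plain Python bytes rather than the NumPy C API: each element occupies a
fixed number of bytes, and a null appears only when the string is shorter
than the field. The helper name split_fixed_width and the sample buffer are
made up for illustration.

```python
def split_fixed_width(raw, itemsize):
    """Split a buffer of fixed-width, null-padded fields into byte strings."""
    if itemsize <= 0 or len(raw) % itemsize != 0:
        raise ValueError("buffer length must be a multiple of itemsize")
    fields = []
    for start in range(0, len(raw), itemsize):
        field = raw[start:start + itemsize]
        # Stop at the first null if there is one; a completely full field
        # has no terminator, so its length is bounded by itemsize instead.
        end = field.find(b"\x00")
        fields.append(field if end == -1 else field[:end])
    return fields


# Two 4-byte fields: "ab" padded with nulls, then "abcd" with no null at all.
raw = b"ab\x00\x00" + b"abcd"
fields = split_fixed_width(raw, 4)
```

Code that calls strlen on the second field would run past its end, which is
the pitfall mentioned above; bounding the scan by the item size avoids it.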

>
>>> a = np.empty((3,4), dtype=np.character)
>
>> Are you sure? I think this is what I tried (I can't check at this
>> moment), and the result has descr->type equal to PyArray_STRING. Also,
>> note that even in the interpreter, the dtype shows itself as string:
>>
>>>>> numpy.dtype('c')
>> dtype('|S1')
>
> Good point -- that is a length-one string, not the same thing. Running:
>
> for n in dir(np):
>    if type(getattr(np, n)) == type(np.uint8): print n
>
> gives me what should be all the dtype objects, and these are the ones
> that look to me like they might be "char":
>
> byte
> character
> chararray
> int8
> ubyte
> uint8
>
> but none of those seem to be quite right:
>
> In [20]: for dtype in [np.byte, np.character, np.chararray, np.int8,
> np.ubyte, np.uint8]:
>    ....:     a = np.empty((1,1), dtype=dtype); print a.dtype
>    ....:
>    ....:
> int8
> |S1
> object
> int8
> uint8
> uint8
>
> There was a discussion on the Cython list recently, and apparently
> "char" is not as clearly defined as I thought -- some compilers use
> signed, some unsigned.. who knew? So I'm not sure what PyArray_CHAR is.
>

This is what I suspected - there is no longer a true "character array"
type, and dtype("c") is just an alias for dtype("S1").
Similarly, creating a PyArray_CHAR array from the C API results in dtype("|S1").
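
A quick way to check this from Python, as a sketch that assumes a reasonably
recent NumPy and Python 3 (the variable names are arbitrary):

```python
import numpy as np

# The 'c' type code yields a string dtype with one byte per element.
char_dtype = np.dtype('c')
kind = char_dtype.kind          # 'S' is the kind code for byte strings
itemsize = char_dtype.itemsize  # bytes per element

# An array created with dtype 'c' therefore reports a string dtype as well.
a = np.empty((3, 4), dtype='c')
array_kind = a.dtype.kind
array_itemsize = a.dtype.itemsize
```

Only the kind and item size are inspected here, so the check does not depend
on how a particular NumPy version prints the dtype.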

>
> yup-- it looks like the padding is maintained
>

That's great, because that's almost exactly the data Octave needs.
The only difference is that Octave typically pads with spaces, for
compatibility with Matlab, but it can cope with nulls as well.
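
Should space padding be wanted, the conversion can be sketched on the NumPy
side as below. This is only an illustration and not necessarily how Pytave
handles it; the function name nulls_to_spaces and the sample data are
hypothetical.

```python
import numpy as np


def nulls_to_spaces(arr):
    """Return the raw bytes of a fixed-width string array, space padded."""
    if arr.dtype.kind != 'S':
        raise TypeError("expected a byte-string array")
    # View every element as itemsize raw bytes; copy so arr is untouched.
    raw = np.ascontiguousarray(arr).view(np.uint8).copy()
    raw = raw.reshape(arr.size, arr.dtype.itemsize)
    # Caveat: this also rewrites any null embedded inside a string.
    raw[raw == 0] = ord(' ')
    return raw.tobytes()


names = np.array([b'ab', b'abcd', b'xyz'], dtype='S4')
padded = nulls_to_spaces(names)
```

Each row of the intermediate uint8 view is one fixed-width element, so the
result keeps the same layout with spaces in place of the null padding.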

NumPy string arrays are supported by Pytave now. Thanks for your help.

best regards

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


