[Numpy-discussion] String type again.

Chris Barker chris.barker at noaa.gov
Tue Jul 15 16:45:41 EDT 2014


On Tue, Jul 15, 2014 at 4:26 AM, Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> Just wondering, couldn't we have a type which actually has an
>  (arbitrary, python supported) encoding (and "bytes" might even just be a
> special case of no encoding)?


Well, then we're back to the core issue here:

numpy dtypes need to be a pre-specified length, but

encoded bytes are an arbitrary length.

This leads us to wanting to use only fixed-number-of-bytes-per-character
encodings:
 - ascii
 - latin-1
 - UCS-4 (or UTF-32..I get a bit confused about the names)

maybe UCS-2 (NOT UTF-16) would be worth considering, for a compromise
between space and fraction of unicode supported.
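
To make the fixed-width point concrete, here is a small sketch using only Python's built-in codecs. The sample characters and the helper name `encoded_width` are my own choices for illustration, and the "-le" codec variants are used so that no byte-order mark is counted:

```python
# Compare how many bytes each encoding needs for a single character.
# The sample characters are arbitrary: ASCII, Latin-1 range, BMP, non-BMP.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]
encodings = ["ascii", "latin-1", "utf-8", "utf-16-le", "utf-32-le"]


def encoded_width(ch, encoding):
    """Return the encoded byte length, or None if ch cannot be encoded."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


# One row of widths per encoding, in the same order as `samples`.
widths = {enc: [encoded_width(ch, enc) for ch in samples] for enc in encodings}

for enc, row in widths.items():
    print(f"{enc:10s} {row}")
```

ascii, latin-1 and UTF-32 use a constant number of bytes for every character they can represent. UTF-8 varies from one to four bytes, and UTF-16 needs four bytes (a surrogate pair) for the last sample, which is why UCS-2 and UTF-16 are not the same thing.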

> Basically storing bytes and on access do
> element[i].decode(specified_encoding) and on storing element[i] =
> value.encode(specified_encoding).

This really doesn't seem that different from just using python strings --
is there a point to having a pointer-to-python-string type as a less
generalized version of the python strings that are already possible in
object arrays?
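
For reference, the encode-on-store / decode-on-access idea quoted above could be sketched in pure Python roughly like this. It is only an illustration, not a proposed numpy implementation: the class name `EncodedSlots` and its parameters are hypothetical, and the NUL padding is only safe for encodings such as utf-8 or latin-1 where a zero byte never occurs inside another character:

```python
class EncodedSlots:
    """Hypothetical fixed-size slots holding encoded text (not a numpy API)."""

    def __init__(self, n_items, itemsize, encoding):
        self.itemsize = itemsize      # bytes reserved per element
        self.encoding = encoding      # any codec name Python supports
        self._buf = bytearray(n_items * itemsize)

    def __setitem__(self, i, value):
        raw = value.encode(self.encoding)
        if len(raw) > self.itemsize:
            # The encoded length depends on the characters, not just len(value).
            raise ValueError(
                f"{value!r} needs {len(raw)} bytes, slot holds {self.itemsize}"
            )
        start = i * self.itemsize
        # Pad with NUL bytes so every slot has exactly itemsize bytes.
        self._buf[start:start + self.itemsize] = raw.ljust(self.itemsize, b"\x00")

    def __getitem__(self, i):
        start = i * self.itemsize
        raw = bytes(self._buf[start:start + self.itemsize])
        # Strip the padding before decoding; this also drops real trailing NULs.
        return raw.rstrip(b"\x00").decode(self.encoding)


slots = EncodedSlots(n_items=2, itemsize=3, encoding="utf-8")
slots[0] = "abc"        # 3 characters, 3 bytes: fits
slots[1] = "\u00e9"     # 1 character, 2 bytes: fits

try:
    slots[0] = "\u00e9\u00e9"   # 2 characters, but 4 bytes in utf-8
    overflowed = False
except ValueError:
    overflowed = True
```

With a variable-width encoding, a two-character string can overflow a three-byte slot, so the user can no longer reason about capacity in characters.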

> There is always the never ending small issue of trailing null bytes. If
> we want to be fully compatible, such a type would have to store the
> string length explicitly to support trailing null bytes.

are null bytes legal (as something other than a terminator) in some
encodings?
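
A quick check with Python's built-in codecs suggests the answer is yes. In the sketch below (the variable names and the 4-byte slot size are mine, purely for illustration), zero bytes show up inside ordinary UTF-16/UTF-32 text, and U+0000 is itself a code point that round-trips through UTF-8:

```python
text = "abc"

# In the wide encodings, plain ASCII text is full of zero bytes.
utf32 = text.encode("utf-32-le")
utf16 = text.encode("utf-16-le")
latin1 = text.encode("latin-1")

# U+0000 is a legal code point and survives an encode/decode round trip.
with_nul = "a\x00"
encoded = with_nul.encode("utf-8")
roundtrip = encoded.decode("utf-8")

# If a fixed-width slot is NUL-padded and the padding is later stripped,
# a real trailing NUL is lost; this is the trailing-null-bytes issue.
padded = encoded.ljust(4, b"\x00")
recovered = padded.rstrip(b"\x00").decode("utf-8")

print(utf32)
print(repr(roundtrip), repr(recovered))
```

So treating a zero byte as a terminator only works for encodings like ascii or latin-1 with data that never contains U+0000. Otherwise the length has to be stored explicitly, as Sebastian says.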

-Chris



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov