[Numpy-discussion] String type again.
Chris Barker
chris.barker at noaa.gov
Tue Jul 15 16:45:41 EDT 2014
On Tue, Jul 15, 2014 at 4:26 AM, Sebastian Berg <sebastian at sipsolutions.net>
wrote:
> Just wondering, couldn't we have a type which actually has an
> (arbitrary, python supported) encoding (and "bytes" might even just be a
> special case of no encoding)?
Well, then we're back to the core issue here:
numpy dtypes need to be a pre-specified length;
encoded bytes are an arbitrary length.
This leads us to wanting to use only fixed-number-of-bytes-per-character
encodings:
- ascii
- latin-1
- UCS-4 (or UTF-32... I get a bit confused about the names)
maybe UCS-2 (NOT UTF-16) would be worth considering, for a compromise
between space and fraction of unicode supported.
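To make the fixed-width point concrete, here is a minimal sketch that counts the bytes each encoding needs per character. The sample characters and the helper name `encoded_len` are illustrative assumptions, not part of any numpy API. UTF-8 and UTF-16 vary in width, while latin-1 and UTF-32 do not (latin-1 simply cannot represent most characters).

```python
import numpy as np

# Illustrative sample: ASCII, Latin-1, BMP, and non-BMP characters.
samples = ["a", "é", "€", "𝄞"]

def encoded_len(ch, encoding):
    """Return the encoded byte length of ch, or None if ch is not encodable."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None

for enc in ("ascii", "latin-1", "utf-8", "utf-16-le", "utf-32-le"):
    print(enc, [encoded_len(ch, enc) for ch in samples])

# numpy's existing unicode dtype stores 4 bytes per character,
# the bytes dtype 1 byte per element.
print(np.dtype("U5").itemsize, np.dtype("S5").itemsize)
```

With a fixed-width encoding, the item size in bytes is just the character count times a constant, which is what lets a dtype declare its length up front.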
> Basically storing bytes and on access do
> element[i].decode(specified_encoding) and on storing element[i] =
> value.encode(specified_encoding).
>
this really doesn't seem that different than just using python strings --
is there a point to having a pointer-to-python-string type as a less
generalized version of the currently possible python strings in object
arrays?
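For reference, the encode-on-store, decode-on-access scheme quoted above might look roughly like the following. This is only a sketch: `EncodedArray` is a hypothetical wrapper class, not an existing or proposed numpy dtype, and it sits on top of the ordinary fixed-width bytes dtype. It also shows the length problem, since with a variable-width encoding the byte count of a value is not known from its character count.

```python
import numpy as np

class EncodedArray:
    """Hypothetical wrapper: fixed-width bytes storage plus a named encoding."""

    def __init__(self, n, itemsize, encoding):
        self.encoding = encoding
        # Each element is exactly `itemsize` bytes, zero-padded.
        self._data = np.zeros(n, dtype="S%d" % itemsize)

    def __setitem__(self, i, value):
        raw = value.encode(self.encoding)
        if len(raw) > self._data.dtype.itemsize:
            # numpy would silently truncate here; raise instead.
            raise ValueError("encoded value needs %d bytes, only %d available"
                             % (len(raw), self._data.dtype.itemsize))
        self._data[i] = raw

    def __getitem__(self, i):
        return self._data[i].decode(self.encoding)

arr = EncodedArray(2, 8, "utf-8")
arr[0] = "naïve"      # 5 characters, 6 bytes in UTF-8
print(arr[0])
```

Note that "naïve" would not fit in a 5-byte UTF-8 slot even though it has 5 characters, so the declared size has to be in bytes, not characters.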
> There is always the never ending small issue of trailing null bytes. If
> we want to be fully compatible, such a type would have to store the
> string length explicitly to support trailing null bytes.
>
are null bytes legal (as something other than a terminator) in some
encodings?
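A quick check suggests they are, at least for the wide encodings: in UTF-32 (and UTF-16) most characters encode to bytes that include zeros, and U+0000 is itself a valid character in ASCII. The sketch below, using only the existing bytes dtype, shows numpy dropping trailing null bytes when an element is read back, while the underlying buffer still holds them.

```python
import numpy as np

# "A" in little-endian UTF-32 ends in three zero bytes.
raw = "A".encode("utf-32-le")
buf = np.array([raw], dtype="S4")

stored = buf[0]            # element access strips trailing nulls
print(raw, stored, len(stored))

# The raw buffer is unchanged; only the element view loses the nulls.
print(buf.tobytes() == raw)

# Same effect for an ASCII string that genuinely ends in a NUL character.
with_nul = np.array([b"ab\x00"], dtype="S3")[0]
print(with_nul)
```

So a bytes-plus-encoding type built on the current null-padded storage cannot round-trip such values unless the length is tracked some other way.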
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov