[Numpy-discussion] String type again.
Chris Barker
chris.barker at noaa.gov
Fri Jul 18 12:15:35 EDT 2014
On Fri, Jul 18, 2014 at 3:33 AM, Nathaniel Smith <njs at pobox.com> wrote:
> > 2) A bytes type -- almost the current 'S' type
> > - A bytes type would map to/from py3 bytes objects (and py2 bytes
> > objects, which are the same as py2 strings)
> > - one way it would differ from a py2 str is that there would be no
> > assumption of null-termination (not sure where that is now)
>
> AFAICT this is *exactly* the same as the current 'S' type. What
> differences do you see?
As you mention, it is the same on py3, except maybe for the handling of null
bytes -- you mentioned that you had to do some work-arounds for that. A
proper bytes type would do nothing special with null bytes.
> > 3) A one-byte-per-char text type -- more or less Chuck's current
> > proposal.
> > - it would map to/from the py3 string -- it is text after all
> > - it would be null-terminated
>
> NumPy string types are never null-terminated ATM. They're
> null-padded, which is slightly different. When storing data in an S5,
> for instance, strings of length 5 have no nulls appended, strings of
> length 4 have 1 null appended, strings of length 3 have 2 nulls
> appended, etc. When reading data out of an S5, all trailing nulls
> are stripped.
>
> So, they may not be null terminated (if the length of the string
> exactly matches the length of the dtype), and the strings being stored
> can contain internal nulls ("foo\x00bar" is fine), but they cannot
> contain trailing nulls ("foo\x00" will come back as just "foo").
>
> Do you actually care about null-termination specifically? Or did you
> just mean "it should work like the other ones, which I vaguely
> remember involves nulls"? ;-)
>
That's pretty much what I meant, yes ;-) But the key is that when pushing
one of these things to a Python string, anything after a null byte is
ignored, which is why you can't use it for arbitrary bytes.
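For reference, here is a minimal sketch of the null-padding behaviour Nathaniel describes above. It assumes Python 3 and a reasonably recent NumPy; the array names and sample values are made up for illustration, and older versions may behave differently.

```python
import numpy as np

# Three byte strings stored in a fixed-width 5-byte field.
a = np.array([b"abcde", b"abcd", b"abc"], dtype="S5")

# The raw buffer shows the padding: no null after the length-5 string,
# one after the length-4 string, two after the length-3 string.
raw = a.tobytes()
print(raw)

# Reading items back out strips the trailing nulls again.
items = [bytes(x) for x in a]
print(items)

# Internal nulls survive a round trip; trailing nulls do not.
b = np.array([b"foo\x00bar", b"foo\x00"], dtype="S7")
internal = bytes(b[0])
trailing = bytes(b[1])
print(internal, trailing)
```

The last case is the one that matters for a real bytes type: a value that legitimately ends in a null byte cannot be told apart from padding.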
> > - it would have a one-byte per-char encoding: ascii, latin-1 or
> > settable (TBA)
>
> Settable is technically very difficult until we redo the dtype
> machinery to allow parametrized types.
Indeed -- we have that a bit with Datetime -- but that's a whole other
kettle of fish.
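To illustrate what I mean by "a bit": the datetime64 dtype already carries a parameter (its time unit) inside the dtype itself. This is only a sketch of that existing behaviour, not of how a settable-encoding text dtype would work; the variable names are made up.

```python
import numpy as np

# Two datetime64 dtypes that differ only in their unit parameter.
sec = np.dtype("datetime64[s]")
ms = np.dtype("datetime64[ms]")

# np.datetime_data reports the (unit, count) pair stored on the dtype.
sec_meta = np.datetime_data(sec)
ms_meta = np.datetime_data(ms)
print(sec_meta, ms_meta)

# The parameter is part of the dtype's identity.
same = sec == ms
print(same)
```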
-CHB
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov