[Numpy-discussion] String type again.

Chris Barker chris.barker at noaa.gov
Fri Jul 18 12:15:35 EDT 2014


On Fri, Jul 18, 2014 at 3:33 AM, Nathaniel Smith <njs at pobox.com> wrote:

> > 2) A bytes type -- almost the current 'S' type
> >     - A bytes type would map to/from py3 bytes objects (and py2 bytes
> > objects, which are the same as py2 strings)
> >     - one way it would differ from a py2 str is that there would be no
> > assumption of null-termination (not sure where that is now)
>
> AFAICT this is *exactly* the same as the current 'S' type. What
> differences do you see?


As you mention, it is the same on py3, except maybe for the handling of null
bytes -- you mentioned that you had to do some work-arounds for that. A
proper bytes type would do nothing special with null bytes.


> > 3) A one-byte-per-char text type -- more or less Chuck's current
> > proposal.
> >    - it would map to/from the py3 string -- it is text after all
> >    - it would be null-terminated
>
> NumPy string types are never null-terminated ATM. They're
> null-padded, which is slightly different. When storing data in an S5,
> for instance, strings of length 5 have no nulls appended, strings of
> length 4 have 1 null appended, strings of length 3 have 2 nulls
> appended, etc. When reading data out of an S5, then all trailing nulls
> are stripped.
>
> So, they may not be null terminated (if the length of the string
> exactly matches the length of the dtype), and the strings being stored
> can contain internal nulls ("foo\x00bar" is fine), but they cannot
> contain trailing nulls ("foo\x00" will come back as just "foo").
>
> Do you actually care about null-termination specifically? Or did you
> just mean "it should work like the other ones, which I vaguely
> remember involves nulls"? ;-)
>

That's pretty much what I meant, yes ;-) But the key is that when pushing
one of these things to a Python string, anything after a null byte is
ignored. Which is why you can't use it for arbitrary bytes.
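
A minimal sketch of the padding behavior described above, assuming Python 3
and a reasonably recent NumPy (the array contents are made up for
illustration):

```python
import numpy as np

# Store strings of length 5, 4 and 3 in a fixed-width S5 array.
a = np.array([b"abcde", b"abcd", b"abc"], dtype="S5")

# The raw buffer shows null padding, not null termination:
# the 5-byte string gets no null at all.
raw = a.tobytes()

# Internal nulls survive a round trip; trailing nulls do not.
b = np.array([b"foo\x00bar", b"foo\x00"], dtype="S7")
internal = bytes(b[0])
trailing = bytes(b[1])

print(raw)
print(internal, trailing)
```

The second element of `b` is what rules 'S' out for arbitrary binary data:
the trailing null that went in is not there when the value comes back out.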

> >    - it would have a one-byte-per-char encoding: ascii, latin-1 or
> > settable (TBA)
>
> Settable is technically very difficult until we redo the dtype
> machinery to allow parametrized types.


Indeed -- we have that a bit with Datetime -- but that's a whole other
kettle of fish.
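
To illustrate the Datetime case, here is a small sketch, assuming a recent
NumPy: the unit is carried as a parameter of the dtype itself, which is
roughly the kind of thing a settable encoding would need. The variable names
are only for illustration.

```python
import numpy as np

# Two datetime64 dtypes that differ only in their unit parameter.
dt_s = np.dtype("datetime64[s]")
dt_ms = np.dtype("datetime64[ms]")

# Same storage size, but they are distinct dtypes.
same_size = dt_s.itemsize == dt_ms.itemsize
distinct = dt_s != dt_ms

# The parameter can be read back from the dtype as (unit, count).
unit_s = np.datetime_data(dt_s)
unit_ms = np.datetime_data(dt_ms)

print(same_size, distinct, unit_s, unit_ms)
```

A text dtype with a settable encoding would need an analogous parameter,
which is the machinery the quoted reply says is not generally available yet.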

-CHB



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov