[Numpy-discussion] String type again.

Charles R Harris charlesr.harris at gmail.com
Fri Jul 18 13:39:21 EDT 2014


On Fri, Jul 18, 2014 at 10:59 AM, Nathaniel Smith <njs at pobox.com> wrote:

> On Fri, Jul 18, 2014 at 5:54 PM, Chris Barker <chris.barker at noaa.gov>
> wrote:
> >
> > This is why I see no downside to latin-1 -- if you don't use the > 127
> > code points, it's the same thing -- if you do, you get some extra handy
> > characters. The only difference is that a proper ascii type would not let
> > you store anything above 127 at all -- why restrict ourselves?
>
> IMO the extra characters aren't the most compelling argument for
> latin1 over ascii. Latin1 gives the nice assurance that if some jerk
> *does* give me an "ascii" file that somewhere has some byte with the
> 8th bit set, then I can still load the data and fix things by hand.
> This is trickier if numpy just refuses to touch the data, blowing up
> with an exception when I try. In general it's easy to create numpy
> arrays containing arbitrary bitpatterns, so it's nice to have some
> strategy for what to do with them.
>
>
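
A minimal sketch of the point quoted above, using only plain Python bytes
rather than any numpy string type (the sample bytes and variable names are
made up for illustration): latin-1 decodes every byte value, so stray
8-bit data can still be loaded and repaired, while ascii raises.

```python
# Hypothetical "ascii" data that contains one byte with the 8th bit set.
raw = b"caf\xe9"

# latin-1 maps each byte 0-255 to the code point with the same number,
# so decoding cannot fail and the original bytes are recoverable.
text = raw.decode("latin-1")
round_trip = text.encode("latin-1")

# A strict ascii decode refuses the same data.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Once loaded, the offending character can be fixed by hand.
fixed = text.replace("\u00e9", "e")
```
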
Just to throw in one more complication: the buffer protocol has no format
code for a fixed-encoding string type. The 'c', 's', and 'p' format codes
are all treated as bytes in Python 3, and as strings in Python 2.
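
As a rough illustration under Python 3, here is how the struct module
(whose format codes the buffer protocol borrows) handles those three
codes. The values are made up; the point is only that each code accepts
and returns bytes, so any encoding has to be chosen by the caller.

```python
import struct

# 'c' is a single char, 's' a fixed-length string, 'p' a Pascal string
# (one length byte followed by the data). All take bytes in Python 3.
packed_c = struct.pack("c", b"a")
packed_s = struct.pack("3s", b"abc")
packed_p = struct.pack("4p", b"abc")

# Unpacking gives bytes back, not str.
unpacked = struct.unpack("3s", packed_s)[0]

# Passing a str instead of bytes is rejected.
try:
    struct.pack("3s", "abc")
    str_rejected = False
except struct.error:
    str_rejected = True

# The format carries no encoding, so decoding is an explicit extra step.
decoded = unpacked.decode("latin-1")
```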

Chuck