[Numpy-discussion] String type again.

Fri Jul 18 12:59:32 EDT 2014

On Fri, Jul 18, 2014 at 5:54 PM, Chris Barker <chris.barker at noaa.gov> wrote:
>
> This is why I see no downside to latin-1 -- if you don't use the > 127 code
> points, it's the same thing -- if you do, you get some extra handy
> characters. The only difference is that a proper ascii type would not let
> you store anything above 127 at all -- why restrict ourselves?

IMO the extra characters aren't the most compelling argument for
latin1 over ascii. Latin1 gives the nice assurance that if some jerk
*does* give me an "ascii" file that somewhere has some byte with the
8th bit set, then I can still load the data and fix things by hand.
This is trickier if numpy just refuses to touch the data, blowing up
with an exception when I try. In general it's easy to create numpy
arrays containing arbitrary bit patterns, so it's nice to have some
strategy for what to do with them.
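
To make the difference concrete, here is a minimal sketch in plain
Python (no numpy involved, since the proposed string type does not
exist yet). The sample bytes and variable names are made up for
illustration. It only shows how the two codecs behave on a byte with
the 8th bit set, not how numpy would implement either option.

```python
# Hypothetical contents of an "ascii" file with one stray byte (0xE9)
# that has the 8th bit set.
raw = b"station,caf\xe9,42"

# A strict ASCII decode refuses the data outright.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# latin-1 maps every byte value 0-255 to the code point with the same
# number, so decoding cannot fail and the original bytes are recoverable.
text = raw.decode("latin-1")
roundtrip = text.encode("latin-1")

# With the data loaded, the offending character can be fixed by hand.
fixed = text.replace("\xe9", "e")

# The same holds for arbitrary bit patterns: all 256 byte values survive
# a decode/encode round trip through latin-1.
all_bytes = bytes(range(256))
all_roundtrip = all_bytes.decode("latin-1").encode("latin-1")
```

The point is only that latin-1 is a total mapping on bytes, so loading
never raises; whether the decoded characters are the ones the file's
author intended is a separate question.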

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org