[Numpy-discussion] proposal: smaller representation of string arrays

Chris Barker - NOAA Federal chris.barker at noaa.gov
Tue Apr 25 19:11:27 EDT 2017


> On Apr 25, 2017, at 12:38 PM, Nathaniel Smith <njs at pobox.com> wrote:

> Eh... First, on Windows and MacOS, filenames are natively Unicode.

Yeah, though once they are stored in a text file -- who the heck
knows? That may be simply unsolvable.
> [...] And then from in Python, if you want to actually work with those
> filenames you need to either have a bytestring type or else a Unicode
> type that uses surrogateescape to represent the non-ascii characters.


> IMO if you have filenames that are arbitrary bytestrings and you need to represent this properly, you should just use bytestrings -- really, they're perfectly friendly :-).

I thought the Python file (and Path) APIs all required (Unicode)
strings? That was the whole complaint!

And no, bytestrings are not perfectly friendly in py3.
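For reference, here is a minimal sketch of the surrogateescape round trip mentioned in the quoted text. The filename bytes are a made-up example, and this only shows the str/bytes conversion, not any actual file API call:

```python
# Hypothetical filename bytes that are NOT valid UTF-8
# (0xE9 is "e with acute accent" in Latin-1).
raw_name = b"caf\xe9.txt"

# Strict UTF-8 decoding rejects the stray byte.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte to a lone surrogate
# in the range U+DC80..U+DCFF instead of raising.
as_text = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same error handler restores the original bytes.
round_tripped = as_text.encode("utf-8", errors="surrogateescape")

print(strict_ok)                   # False
print(ascii(as_text))              # 'caf\udce9.txt'
print(round_tripped == raw_name)   # True
```

The resulting str carries a lone surrogate, so it round-trips but is not something you would want to print or write out with a strict encoder.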

This got really complicated and sidetracked, but all I'm suggesting is
that if we have a 1-byte-per-char string type with a fixed encoding,
that encoding should be Latin-1 rather than ASCII.
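To illustrate the difference, here is a small sketch in plain Python (no numpy involved, and the variable names are just for illustration). It assumes the stored data may contain any byte value 0..255:

```python
# Every possible single-byte value, as might turn up in a
# 1-byte-per-char string field.
raw = bytes(range(256))

# Latin-1 maps byte value N to code point N, so decoding never fails
# and encoding the result gives back the original bytes.
text = raw.decode("latin-1")
latin1_round_trip = text.encode("latin-1") == raw

# ASCII only covers 0..127, so the same data raises on decode.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(len(text))            # 256
print(latin1_round_trip)    # True
print(ascii_ok)             # False
```

So with Latin-1, arbitrary bytes survive a decode/encode round trip even when the "real" encoding is unknown, whereas ASCII fails on anything at or above 0x80.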

That's it, really.

Having a settable encoding would work fine, too.

-CHB

