[Numpy-discussion] proposal: smaller representation of string arrays

Nathaniel Smith njs at pobox.com
Tue Apr 25 20:41:22 EDT 2017


On Tue, Apr 25, 2017 at 4:11 PM, Chris Barker - NOAA Federal
<chris.barker at noaa.gov> wrote:
>> On Apr 25, 2017, at 12:38 PM, Nathaniel Smith <njs at pobox.com> wrote:
>
>> Eh... First, on Windows and MacOS, filenames are natively Unicode.
>
> Yeah, though once they are stored in a text file -- who the heck
> knows? That may simply be unsolvable.
>> [...] And then from within Python, if you want to actually work with those filenames, you need either a bytestring type or a Unicode type that uses surrogateescape to represent the non-ASCII characters.
>
>
>> IMO if you have filenames that are arbitrary bytestrings and you need to represent this properly, you should just use bytestrings -- really, they're perfectly friendly :-).
>
> I thought the Python file (and Path) APIs all required (Unicode)
> strings? That was the whole complaint!

No, the path APIs all accept bytestrings (and the ones that return
pathnames, like listdir, return bytestrings if given bytestrings). Or at
least they're supposed to.
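A minimal sketch of the str-in/str-out, bytes-in/bytes-out behavior described above, using only the standard library. The scratch directory and the file name `data.txt` are illustrative assumptions; `os.fsencode` converts the directory path to bytes using the filesystem encoding.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one illustrative file in a scratch directory.
    open(os.path.join(tmp, "data.txt"), "w").close()

    str_names = os.listdir(tmp)                 # str argument -> list of str
    bytes_names = os.listdir(os.fsencode(tmp))  # bytes argument -> list of bytes

    print(type(str_names[0]).__name__)    # str
    print(type(bytes_names[0]).__name__)  # bytes

    # os.fsdecode maps the bytes name back to the same str name.
    assert os.fsdecode(bytes_names[0]) == str_names[0]
```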

The really urgent need for surrogateescape was things like sys.argv
and os.environ, where arbitrary bytes might come in (on some systems)
but the API is restricted to strs.
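To illustrate what surrogateescape does, here is a sketch using a made-up Latin-1 filename (not taken from the thread). Bytes that cannot be decoded are mapped to lone surrogates in the range U+DC80..U+DCFF, so the original bytes can be recovered exactly on re-encoding. On POSIX, Python applies this error handler when it decodes sys.argv and os.environ.

```python
raw = b"caf\xe9.txt"  # hypothetical filename bytes: Latin-1, not valid UTF-8

# Strict UTF-8 decoding rejects the 0xE9 byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps the undecodable byte 0xE9 to the lone surrogate U+DCE9.
text = raw.decode("utf-8", errors="surrogateescape")
print(repr(text))  # 'caf\udce9.txt'

# Re-encoding with the same handler restores the exact original bytes.
roundtrip = text.encode("utf-8", errors="surrogateescape")
assert roundtrip == raw

# Strict encoding refuses lone surrogates, so this str cannot be written out as plain UTF-8.
try:
    text.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False
```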

> And no, bytestrings are not perfectly friendly in py3.

I'm not saying you should use them everywhere or that they remove the
need for an ergonomic text dtype, but when you actually want to work
with bytes they're pretty good (esp. in modern py3).
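Python 3.5 added %-formatting for bytes (PEP 461), which may be part of what "modern py3" refers to here, though the message does not say. A short sketch of handling a raw filename entirely as bytes; the directory and file names are hypothetical:

```python
dirname = b"/data/run1"    # hypothetical directory
filename = b"caf\xe9.txt"  # hypothetical filename bytes, not valid UTF-8

# bytes %-formatting (Python 3.5+, PEP 461) builds the path without any decoding.
path = b"%s/%s" % (dirname, filename)

# Ordinary sequence methods work directly on bytes.
stem, _, ext = filename.rpartition(b".")

print(path)                    # b'/data/run1/caf\xe9.txt'
print(stem, ext)               # b'caf\xe9' b'txt'
print(path.endswith(b".txt"))  # True
```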

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

