PEP 393 vs UTF-8 Everywhere

Marko Rauhamaa marko at pacujo.net
Sun Jan 22 03:13:10 EST 2017


eryk sun <eryksun at gmail.com>:

> On Sat, Jan 21, 2017 at 8:21 PM, Pete Forman <petef4+usenet at gmail.com> wrote:
>> Marko Rauhamaa <marko at pacujo.net> writes:
>>
>>>> py> low = '\uDC37'
>>>
>>> That should raise a SyntaxError exception.
>>
>> Quite. [...]
>
> CPython allows surrogate codes for use with the "surrogateescape" and
> "surrogatepass" error handlers, which are used for POSIX and Windows
> file-system encoding, respectively.

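Yes, but at the cost of violating the Unicode standard: a lone surrogate
is not a valid scalar value, so such strings cannot be encoded or
printed under the default strict error handler. In my opinion, Python
should have "stayed pure" instead of playing cheap tricks with
surrogates.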
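
To make the trade-off concrete, here is a minimal sketch of the behavior
under discussion, assuming CPython 3 and its standard codecs. The sample
bytes b'caf\xe9' and all variable names other than `low` are illustrative
assumptions, not taken from the thread:

```python
# CPython accepts a lone low surrogate in a str literal.
low = '\uDC37'

# A strict UTF-8 encode refuses it, which is why such strings
# cannot be printed or written out in the usual way.
try:
    low.encode('utf-8')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# "surrogateescape": a byte that is invalid as UTF-8 is smuggled
# into the str as a lone surrogate (0xE9 -> U+DCE9) and restored
# unchanged on the way back out.
raw = b'caf\xe9'  # Latin-1 bytes, not valid UTF-8 (illustrative)
text = raw.decode('utf-8', 'surrogateescape')
round_trip = text.encode('utf-8', 'surrogateescape')

# "surrogatepass": the surrogate code point itself is encoded
# using the generic three-byte UTF-8 pattern.
passed = low.encode('utf-8', 'surrogatepass')

print(strict_ok)          # False
print(ascii(text))        # 'caf\udce9'
print(round_trip == raw)  # True
print(passed)             # b'\xed\xb0\xb7'
```

The round trip is what the handlers buy: arbitrary file names survive a
decode/encode cycle, at the price of str objects that are not valid
Unicode text.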

(Of course, Unicode itself is a mess, but that's another story.)


Marko


