[Python-ideas] Processing surrogates in

Sat May 16 06:26:19 CEST 2015

random832 at fastmail.us writes:

 > My point was that if you want the benefits of using libc you have
 > to pay the costs of using libc, and that means using libc's native
 > encodings.

Of course it doesn't mean any such thing.  My point was that there are
many utility functions, in libc and elsewhere, that don't care at all
whether the array of bytes is encoded text, only that its content
contains no embedded NULs and that it is NUL-terminated.

Sure, nowadays there are better alternatives for handling text as text
(for example, Python 3 str! -- whose design *nobody* is proposing to
change here, although in the past some have asked that it be turned
into something Unicode compatible), but at least on POSIX systems
the traditional utilities still assume those classic characteristics,
which UTF-8 satisfies and UTF-16 does not.  Incompatibility with those
utilities is an issue for UTF-16, but not for UTF-8.  That's all.
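
To make the difference concrete, here is a minimal sketch in Python.
The helper c_strlen is hypothetical: it only imitates the way C's
strlen scans for the first NUL byte and does not call into libc.  The
sample string is arbitrary too.

```python
def c_strlen(buf: bytes) -> int:
    """Imitate C strlen: count the bytes before the first NUL byte."""
    end = buf.find(b"\x00")
    if end == -1:
        raise ValueError("buffer is not NUL-terminated")
    return end


text = "naïve café"  # arbitrary sample containing non-ASCII characters

# NUL-terminate each encoding the way a C caller would expect.
utf8 = text.encode("utf-8") + b"\x00"
utf16 = text.encode("utf-16-le") + b"\x00\x00"

# UTF-8: no embedded NULs, so the scan reaches the real terminator.
print("utf-8 :", c_strlen(utf8), "of", len(utf8) - 1, "payload bytes")

# UTF-16: the high byte of every ASCII character is NUL, so the scan
# stops almost immediately.
print("utf-16:", c_strlen(utf16), "of", len(utf16) - 2, "payload bytes")
```

The UTF-8 buffer is seen whole, while the UTF-16 buffer appears to end
after the first byte, because the second byte of "n" in UTF-16-LE is
already NUL.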