[Python-ideas] Processing surrogates in

Stephen J. Turnbull stephen at xemacs.org
Mon May 4 23:21:30 CEST 2015


Serhiy Storchaka writes:

 > Issue18814 proposes several functions for working with surrogate and
 > astral characters. All of these functions take a string and return a
 > string.

What's the use case?  As far as I can see, recent Python 3 (3.3 and
later) implements PEP 393, so non-BMP characters are represented as
themselves, not as surrogate pairs.  In a PEP 393-enabled Python, the
only surrogates should be those produced by surrogateescape error
handling on input and those created explicitly with chr().  If you
don't like the former, be careful about your use of surrogateescape;
the latter is clearly a "consenting adults" issue.
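To make the PEP 393 point concrete, here is a minimal sketch of the two
sources of surrogates mentioned above.  The helper has_surrogates() is a
hypothetical name for illustration, not part of the stdlib or of the
functions proposed in issue18814.

```python
# Sketch: where surrogates can (and cannot) come from under PEP 393.
# has_surrogates() is a hypothetical helper for illustration only.

def has_surrogates(s: str) -> bool:
    """Return True if s contains any code point in U+D800..U+DFFF."""
    return any(0xD800 <= ord(c) <= 0xDFFF for c in s)

# An astral (non-BMP) character is one code point, not a surrogate pair.
emoji = "\U0001F600"
print(len(emoji), hex(ord(emoji)))  # 1 0x1f600

# Source 1: surrogateescape maps each undecodable byte to U+DC80..U+DCFF.
decoded = b"caf\xff".decode("utf-8", errors="surrogateescape")
print(ascii(decoded))  # 'caf\udcff'

# Source 2: chr() builds a lone surrogate if you explicitly ask for one.
lone = chr(0xD800)
print(ascii(lone))  # '\ud800'

print(has_surrogates(emoji), has_surrogates(decoded), has_surrogates(lone))
# False True True
```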

Also, you mention that such surrogate characters can be received as
input.  That's true, but the standard codecs should already treat them
as errors.
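A quick sketch of that behaviour with the default 'strict' error
handler, assuming UTF-8 as the codec.  The two helper names are
hypothetical, chosen only to make the checks readable.

```python
# Sketch: the default 'strict' handler rejects surrogates in both
# directions. strict_encode_fails/strict_decode_fails are hypothetical names.

def strict_encode_fails(s: str) -> bool:
    """Return True if strict UTF-8 encoding of s raises."""
    try:
        s.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False

def strict_decode_fails(data: bytes) -> bool:
    """Return True if strict UTF-8 decoding of data raises."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False

# A lone surrogate cannot be encoded as UTF-8.
print(strict_encode_fails("\ud800"))  # True

# ED A0 80 is the byte sequence a naive encoder would emit for U+D800;
# Python's UTF-8 decoder refuses it.
print(strict_decode_fails(b"\xed\xa0\x80"))  # True

# Ordinary astral characters pass through both directions untouched.
print(strict_encode_fails("\U0001F600"),
      strict_decode_fails("\U0001F600".encode("utf-8")))  # False False
```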

So as far as I can see, the existing codecs and error handlers can
already deal with any case I might run into in practice.
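For example, the existing error handlers already cover two common
conversions.  This is a sketch under the assumption that surrogate
pairs should be joined via UTF-16; join_surrogate_pairs() and
roundtrip_bytes() are hypothetical names, not the functions proposed in
issue18814.

```python
# Sketch: existing error handlers already cover common surrogate tasks.
# join_surrogate_pairs() and roundtrip_bytes() are hypothetical names.

def join_surrogate_pairs(s: str) -> str:
    """Combine UTF-16 surrogate pairs in s into single astral code points.

    'surrogatepass' lets the encoder emit the surrogates as raw UTF-16
    code units (the -le variant avoids a BOM); the strict decoder then
    pairs them up. An unpaired surrogate still raises UnicodeDecodeError.
    """
    return s.encode("utf-16-le", "surrogatepass").decode("utf-16-le")

def roundtrip_bytes(raw: bytes) -> bytes:
    """Decode with surrogateescape and re-encode, recovering the bytes."""
    text = raw.decode("utf-8", "surrogateescape")
    return text.encode("utf-8", "surrogateescape")

print(join_surrogate_pairs("\ud83d\ude00") == "\U0001F600")  # True
print(roundtrip_bytes(b"caf\xff") == b"caf\xff")  # True
```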

