[issue18814] Add codecs.convert_surrogateescape to "clean" surrogate escaped strings

Nick Coghlan report at bugs.python.org
Mon Mar 16 12:00:12 CET 2015


Nick Coghlan added the comment:

(Serhiy, did you miss uploading the new patch?)

Regarding the names, we may need to think through the use cases more explicitly, so that the names can be expressed in terms of the Python codecs API rather than expecting folks to understand the underlying representation. For handling lone surrogates and escaped surrogates, what about:

    rehandle_surrogatepass(data, errors="strict")
    rehandle_surrogateescape(data, errors="strict")

That is, we know we have data that was decoded with either surrogatepass or surrogateescape (respectively) as the error handler, and we want to process the results of that with a different error handler.
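
To make the idea concrete, here is a rough pure-Python sketch of what rehandle_surrogateescape might do. This is not the proposed patch: the helper names, the regular expression, and the choice to report each run of escaped bytes as a UnicodeEncodeError are all assumptions made for illustration. Only codecs.lookup_error and the standard error handler names are existing API.

```python
import codecs
import re

# Code points that surrogateescape produces for undecodable bytes 0x80-0xFF.
_ESCAPED_RUN = re.compile("[\udc80-\udcff]+")


def rehandle_surrogateescape(data, errors="strict"):
    """Hypothetical sketch: pass each run of escaped bytes to another handler."""
    handler = codecs.lookup_error(errors)

    def rehandle(match):
        # Report the run the way an encoder would, then let the handler decide.
        exc = UnicodeEncodeError("utf-8", data, match.start(), match.end(),
                                 "surrogates not allowed")
        replacement, _resume_at = handler(exc)
        if not isinstance(replacement, str):
            # Assumption: this sketch only supports handlers that return text.
            raise TypeError("handler %r did not return text" % errors)
        return replacement

    return _ESCAPED_RUN.sub(rehandle, data)


# A filename whose bytes were not valid ASCII, decoded with surrogateescape.
text = b"caf\xe9.txt".decode("ascii", "surrogateescape")
print(ascii(rehandle_surrogateescape(text, "replace")))
print(ascii(rehandle_surrogateescape(text, "backslashreplace")))
```

With errors="strict" the same call raises UnicodeEncodeError, which is the "clean or fail" behaviour the issue title asks for. A rehandle_surrogatepass variant would presumably differ only in matching the full surrogate range.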

I believe those two would be enough to address the specific cases this issue was raised to cover, so it may make sense to file a separate issue to discuss the use cases for the custom astral handling.

Since astrals aren't actually errors in the first place, that could become:

    handle_astrals(data, errors="strict")

As in "pass every astral code point in this string through the named error handler".
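
A minimal sketch of that behaviour, assuming Python 3.3+ strings (where every astral is a single code point). The function body is hypothetical and only illustrates the "pass through the named error handler" idea; codecs.lookup_error is the only existing API it relies on.

```python
import codecs
import re

# Any run of code points outside the Basic Multilingual Plane.
_ASTRAL_RUN = re.compile("[\U00010000-\U0010ffff]+")


def handle_astrals(data, errors="strict"):
    """Hypothetical sketch: pass every astral code point to an error handler."""
    handler = codecs.lookup_error(errors)

    def handle(match):
        # Present the astral run to the handler as if it were unencodable.
        exc = UnicodeEncodeError("ucs-2", data, match.start(), match.end(),
                                 "astral code point")
        replacement, _resume_at = handler(exc)
        if not isinstance(replacement, str):
            # Assumption: this sketch only supports handlers that return text.
            raise TypeError("handler %r did not return text" % errors)
        return replacement

    return _ASTRAL_RUN.sub(handle, data)


text = "grin \U0001F600 done"
print(handle_astrals(text, "xmlcharrefreplace"))
print(handle_astrals(text, "backslashreplace"))
```

The encoding name "ucs-2" passed to the exception is only a label for the error message here; nothing is actually encoded.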

The astral -> surrogate pair and surrogate pair -> astral converters do sound potentially interesting, but as noted above, I think they may call for a separate issue that better explains the specific use cases.
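
For reference while discussing those converters, both directions can already be approximated with the existing utf-16 codec and the surrogatepass handler. The function names below are made up for this sketch and are not part of any proposed API.

```python
import struct


def astrals_to_surrogate_pairs(data):
    """Hypothetical helper: replace each astral with its UTF-16 surrogate pair."""
    raw = data.encode("utf-16-le", "surrogatepass")
    # Reinterpret the bytes as 16-bit code units, one str character per unit.
    units = struct.unpack("<%dH" % (len(raw) // 2), raw)
    return "".join(map(chr, units))


def surrogate_pairs_to_astrals(data):
    """Hypothetical helper: merge adjacent surrogate pairs back into astrals."""
    raw = data.encode("utf-16-le", "surrogatepass")
    return raw.decode("utf-16-le", "surrogatepass")


pairs = astrals_to_surrogate_pairs("a\U0001F600")
print(ascii(pairs))
print(ascii(surrogate_pairs_to_astrals(pairs)))
```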

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue18814>
_______________________________________

