[issue18814] Add codecs.convert_surrogateescape to "clean" surrogate escaped strings

Stephen J. Turnbull report at bugs.python.org
Sat May 9 09:53:04 CEST 2015


Stephen J. Turnbull added the comment:

Please do not add the "rehandle" functions to codecs.  They do not change the (duck-typed) representation of data while maintaining the semantics; they change the semantics of data while retaining the representation.

I suggest a "validation" submodule of the unicodedata package, or perhaps a new "unicodeutils" package, for these functions, as well as those that just detect the surrogates, etc.
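To illustrate the "just detect the surrogates" kind of function, here is a minimal sketch.  The name has_escaped_bytes is hypothetical and not part of any proposed API; it assumes only that the surrogateescape error handler maps undecodable bytes 0x80..0xFF to the lone surrogates U+DC80..U+DCFF.

```python
def has_escaped_bytes(s):
    """Return True if s contains lone surrogates produced by surrogateescape.

    Hypothetical helper for illustration only.  It inspects the string
    and never modifies it, so the original bytes stay recoverable.
    """
    return any("\udc80" <= ch <= "\udcff" for ch in s)


# b"\xff" is never valid UTF-8, so it decodes to the lone surrogate U+DCFF.
escaped = b"abc\xff".decode("utf-8", errors="surrogateescape")
clean = "abc"
```

Because such a detector leaves the data alone, it carries none of the risks of the "rehandle" functions.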

Because they change the semantics of data, they should be documented as potentially dangerous: their output can't be inverted back to the original bytes without knowledge of the history of transformations performed (and not even then in the case of the "replace" error handler).  This matters in applications where the input bytes may have been digitally signed, for example.
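A minimal sketch of the invertibility problem follows.  The rehandle_replace helper is hypothetical, written only for this illustration and not taken from the patch under discussion; it mimics "replace" semantics by turning each escaped byte into U+FFFD.

```python
# Bytes that are not valid UTF-8: 0xE9 lacks continuation bytes, 0xFF is never valid.
raw = b"caf\xe9 \xff"

# surrogateescape maps each undecodable byte to a lone surrogate (U+DC80..U+DCFF),
# so encoding with the same handler restores the original bytes exactly.
text = raw.decode("utf-8", errors="surrogateescape")
restored = text.encode("utf-8", errors="surrogateescape")


def rehandle_replace(s):
    """Hypothetical 'rehandle' step: replace escaped bytes with U+FFFD."""
    return "".join("\ufffd" if "\udc80" <= ch <= "\udcff" else ch for ch in s)


# After rehandling, the string is still a str (same representation), but the
# information needed to rebuild the original bytes has been discarded.
cleaned = rehandle_replace(text)
lossy = cleaned.encode("utf-8", errors="surrogateescape")
```

Here restored equals raw, while lossy does not, so a signature computed over raw could no longer be verified from cleaned.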

----------
nosy: +sjt

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue18814>
_______________________________________

