[issue18814] Add codecs.convert_surrogateescape to "clean" surrogate escaped strings
Stephen J. Turnbull
report at bugs.python.org
Sat May 9 09:53:04 CEST 2015
Stephen J. Turnbull added the comment:
Please do not add the "rehandle" functions to codecs. They do not change the (duck-typed) representation of data while maintaining its semantics, as a codec does; they change the semantics of the data while retaining its representation.
I suggest a "validation" submodule of the unicodedata package, or perhaps a new "unicodeutils" package, for these functions, as well as those that just detect the surrogates, etc.
Because they change the semantics of data, they should be documented as potentially dangerous: their output can't be inverted back to the original bytes without knowledge of the history of transformations performed (and not even then in the case of the "replace" error handler). This matters in applications where the input bytes may have been digitally signed, for example.
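
A minimal sketch of the concern, using only the existing str/bytes error handlers. The helper name rehandle_replace is hypothetical and merely stands in for the proposed functions; it is not the API under discussion, and the sample bytes are made up for illustration.

```python
# Latin-1 encoded input; the byte 0xE9 is not valid UTF-8 here.
raw = b"caf\xe9 ok"

# surrogateescape smuggles the undecodable byte through as U+DCE9.
text = raw.decode("utf-8", errors="surrogateescape")

# Left untouched, the string still round-trips to the original bytes.
roundtrip = text.encode("utf-8", errors="surrogateescape")


def rehandle_replace(s):
    """Hypothetical stand-in: turn escaped bytes into U+FFFD.

    The result is still a str (same representation), but it no longer
    stands for the same bytes (different semantics).
    """
    as_bytes = s.encode("utf-8", errors="surrogateescape")
    return as_bytes.decode("utf-8", errors="replace")


cleaned = rehandle_replace(text)

# A different input collapses to the same "cleaned" string, so the
# original bytes cannot be recovered from the result alone.
other = rehandle_replace(b"caf\xe8 ok".decode("utf-8", errors="surrogateescape"))

print(roundtrip == raw)   # the escaped string is invertible
print(cleaned == other)   # the cleaned string is not
```

Once the string has been cleaned this way, re-encoding it yields the UTF-8 encoding of U+FFFD rather than the original byte, so any signature computed over the input bytes would no longer verify.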
----------
nosy: +sjt
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue18814>
_______________________________________