[issue18814] Add tools for "cleaning" surrogate escaped strings

Antoine Pitrou report at bugs.python.org
Mon Aug 25 20:55:02 CEST 2014


Antoine Pitrou added the comment:

>    data.encode('utf-8', 'replace').decode('utf-8')
>    data.encode('utf-8', 'ignore').decode('utf-8')

Why not the reverse:

os.fsencode(data).decode('utf-8', 'replace')
os.fsencode(data).decode('utf-8', 'ignore')
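
As a rough illustration of how the two directions differ, the sketch below builds a surrogate-escaped string by hand. The sample bytes and variable names are made up for the example, and `encode('utf-8', 'surrogateescape')` stands in for `os.fsencode()`, which behaves the same way only when the filesystem encoding is UTF-8 with the surrogateescape handler.

```python
# Hypothetical sample: Latin-1 bytes that are not valid UTF-8.
raw = b"caf\xe9.txt"

# Decoding with surrogateescape smuggles the bad byte in as U+DCE9.
data = raw.decode("utf-8", "surrogateescape")

# Direction 1: encode leniently, then decode.
enc_replace = data.encode("utf-8", "replace").decode("utf-8")   # 'caf?.txt'
enc_ignore = data.encode("utf-8", "ignore").decode("utf-8")     # 'caf.txt'

# Direction 2: recover the original bytes, then decode leniently.
# (Assumption: this mirrors os.fsencode() on a UTF-8 system.)
raw_again = data.encode("utf-8", "surrogateescape")
dec_replace = raw_again.decode("utf-8", "replace")              # 'caf\ufffd.txt'
dec_ignore = raw_again.decode("utf-8", "ignore")                # 'caf.txt'
```

With 'ignore' both directions give the same result. With 'replace' the marker differs: encoding substitutes '?', while decoding substitutes U+FFFD (the replacement character).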

Note that "backslashreplace" needs to be enhanced to work when decoding too.
Note that "xmlcharrefreplace" doesn't make sense here: it produces a *character* reference, but what you're trying to represent is precisely something that can't be interpreted as a character.

(AFAIK, XML can't represent non-text data, except in NDATA sequences)
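
To make the two notes above concrete, here is a small sketch of what these handlers do with an escaped byte. The sample data and variable names are hypothetical. The last line assumes Python 3.5 or later, where "backslashreplace" is accepted when decoding; earlier versions reject it.

```python
# Hypothetical sample: one byte that is not valid UTF-8.
raw = b"caf\xe9"
data = raw.decode("utf-8", "surrogateescape")   # 'caf\udce9'

# Encoding: backslashreplace spells out the lone surrogate as an escape.
enc_bs = data.encode("utf-8", "backslashreplace")     # b'caf\\udce9'

# Encoding: xmlcharrefreplace emits a reference to the surrogate itself.
enc_xml = data.encode("utf-8", "xmlcharrefreplace")   # b'caf&#56553;'

# Decoding (assumes Python >= 3.5): the bad byte becomes a \x escape.
dec_bs = raw.decode("utf-8", "backslashreplace")      # 'caf\\xe9'
```

56553 is 0xDCE9, a surrogate code point, which XML does not allow as a character, so the reference names something that is not a character at all.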

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue18814>
_______________________________________

