[Python-Dev] Bytes path related questions for Guido

Fri Aug 29 12:09:54 CEST 2014

On 28 Aug 2014, at 19:54, Glenn Linderman wrote:

> On 8/28/2014 10:41 AM, R. David Murray wrote:
>> On Thu, 28 Aug 2014 10:15:40 -0700, Glenn Linderman 
>> <v+python at g.nevcal.com> wrote:
>> [...]
>> Also for cases where the data stream is *supposed* to be in a given
>> encoding, but contains undecodable bytes.  Showing the stuff that
>> incorrectly decodes as whatever it decodes to is generally what you
>> want in that case.
> Sure, people can learn to recognize mojibake for what it is, and maybe 
> even learn to recognize it for what it was intended to be, in limited 
> domains. But suppressing/replacing the surrogates doesn't help with 
> that... would it not be better to replace the surrogates with an 
> escape sequence that shows the original, undecodable, byte value?  
> Like  \xNN ?

For that we could extend the "backslashreplace" codec error callback so
that it can be used for decoding too, not just for encoding. For example,

    b"a\xffb".decode("utf-8", "backslashreplace")

would return

    "a\\xffb"
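
Until the built-in callback supports decoding, the same behaviour can be
approximated with a custom error handler. The following is only a rough
sketch: the handler name "backslashreplace_decode" is hypothetical and
chosen for illustration, and it assumes Python 3, where iterating over
bytes yields integers.

```python
import codecs


def backslashreplace_decode(exc):
    """Replace each undecodable byte with a \\xNN escape (decoding only)."""
    # Hypothetical handler: anything other than a decoding error is re-raised.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    # exc.object is the input bytes; start/end delimit the offending bytes.
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("\\x%02x" % byte for byte in bad)
    # Return the replacement text and the position at which to resume.
    return replacement, exc.end


# Register under an illustrative name so the built-in handler is untouched.
codecs.register_error("backslashreplace_decode", backslashreplace_decode)

result = b"a\xffb".decode("utf-8", "backslashreplace_decode")
print(result)
```

Registering it under a separate name keeps the existing
"backslashreplace" behaviour for encoding unchanged.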

Servus,
    Walter