Newbie question about text encoding

Chris Angelico rosuav at gmail.com
Sat Mar 7 11:17:57 EST 2015


On Sun, Mar 8, 2015 at 2:48 AM, Marko Rauhamaa <marko at pacujo.net> wrote:
> Steven D'Aprano <steve+comp.lang.python at pearwood.info>:
>
>> Marko Rauhamaa wrote:
>>
>>> That said, UTF-8 does suffer badly from its not being
>>> a bijective mapping.
>>
>> Can you explain?
>
> In Python terms, there are bytes objects b that don't satisfy:
>
>    b.decode('utf-8').encode('utf-8') == b

Please provide an example; that sounds like a bug. If any invalid UTF-8
byte sequence decodes without an error, that is a security bug and
should be fixed pronto in all affected and supported versions.
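
For reference, here is a rough sketch of the kind of check I mean,
assuming Python 3 and the default strict error handler. The helper
name check() and the sample byte strings are made up for illustration;
they are not from anyone's actual code. Each input should either be
rejected by the decoder or survive the round trip unchanged:

```python
def check(b):
    """Classify a bytes object by how it behaves under strict UTF-8."""
    try:
        text = b.decode('utf-8')  # 'strict' is the default error handler
    except UnicodeDecodeError:
        return 'rejected'
    # If decoding succeeded, re-encoding should give back the same bytes.
    return 'ok' if text.encode('utf-8') == b else 'mismatch'

samples = [
    b'hello',                    # plain ASCII
    'caf\u00e9'.encode('utf-8'), # valid two-byte sequence
    b'\xff',                     # byte that never appears in UTF-8
    b'\xc0\xaf',                 # overlong encoding of '/'
    b'\xed\xa0\x80',             # encoded surrogate U+D800
]

results = [check(s) for s in samples]
for s, r in zip(samples, results):
    print(repr(s), r)
```

A 'mismatch' result from any input would be the sort of example I am
asking for.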

ChrisA


