Newbie question about text encoding

Mark Lawrence breamoreboy at yahoo.co.uk
Sat Mar 7 11:40:08 EST 2015


On 07/03/2015 16:25, Marko Rauhamaa wrote:
> Chris Angelico <rosuav at gmail.com>:
>
>> On Sun, Mar 8, 2015 at 2:48 AM, Marko Rauhamaa <marko at pacujo.net> wrote:
>>> Steven D'Aprano <steve+comp.lang.python at pearwood.info>:
>>>
>>>> Marko Rauhamaa wrote:
>>>>
>>>>> That said, UTF-8 does suffer badly from its not being
>>>>> a bijective mapping.
>>>>
>>>> Can you explain?
>>>
>>> In Python terms, there are bytes objects b that don't satisfy:
>>>
>>>     b.decode('utf-8').encode('utf-8') == b
>>
>> Please provide an example; that sounds like a bug. If there is any
>> invalid UTF-8 stream which decodes without an error, it is actually a
>> security bug, and should be fixed pronto in all affected and supported
>> versions.
>
> Here's an example:
>
>     b = b'\x80'
>
> Yes, it generates an exception. IOW, UTF-8 is not a bijective mapping
> from str objects to bytes objects.
>

Python 2 might, Python 3 doesn't.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence




More information about the Python-list mailing list