Newbie question about text encoding

Chris Angelico rosuav at gmail.com
Sat Mar 7 11:53:09 EST 2015


On Sun, Mar 8, 2015 at 3:40 AM, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:
>> Here's an example:
>>
>>     b = b'\x80'
>>
>> Yes, it generates an exception. IOW, UTF-8 is not a bijective mapping
>> from str objects to bytes objects.
>>
>
> Python 2 might, Python 3 doesn't.

He was talking about this line of code:

    b.decode('utf-8').encode('utf-8') == b

With the above assignment, that does indeed raise an exception
(UnicodeDecodeError, since b'\x80' is a lone continuation byte and so
not valid UTF-8) - which is correct behaviour.
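
For anyone who wants to see that at the prompt, here is a minimal
sketch, assuming Python 3. The names try_round_trip, bad and good are
made up for illustration and are not from the thread; the second input
is the two-byte UTF-8 encoding of U+00E9, included only for contrast.

```python
def try_round_trip(b):
    """Decode b as UTF-8 and re-encode it, reporting what happened."""
    try:
        round_tripped = b.decode('utf-8').encode('utf-8')
    except UnicodeDecodeError as exc:
        # Strict decoding rejects byte sequences that are not valid UTF-8.
        return 'error: ' + exc.reason
    return 'equal' if round_tripped == b else 'different'

bad = try_round_trip(b'\x80')       # lone continuation byte
good = try_round_trip(b'\xc3\xa9')  # valid two-byte sequence (U+00E9)

print('b\'\\x80\'     ->', bad)
print('b\'\\xc3\\xa9\' ->', good)
```

The exception comes from the decode step; the comparison is never
reached for the invalid input.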

Challenge: Figure out a byte-string input that will make this function
return True.

    def is_utf8_broken(b):
        return b.decode('utf-8').encode('utf-8') != b

Correct responses for this function are either False or raising an exception.
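
One way to start on the challenge is to brute-force the short inputs.
This is only a sketch, assuming Python 3: it tries every byte string of
length 0, 1 and 2, so it says nothing about longer inputs. The names
classify and counts are hypothetical helpers, not part of the original
post.

```python
from itertools import product

def is_utf8_broken(b):
    return b.decode('utf-8').encode('utf-8') != b

def classify(b):
    """Return 'ok', 'invalid' or 'broken' for one byte string."""
    try:
        return 'broken' if is_utf8_broken(b) else 'ok'
    except UnicodeDecodeError:
        # Raising is one of the two correct responses.
        return 'invalid'

counts = {'ok': 0, 'invalid': 0, 'broken': 0}
for length in range(3):  # lengths 0, 1 and 2
    for values in product(range(256), repeat=length):
        counts[classify(bytes(values))] += 1

print(counts)
```

A non-zero 'broken' count would be an answer to the challenge; the
search space grows by a factor of 256 per extra byte, so longer inputs
need a smarter approach than exhaustive enumeration.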

ChrisA


