Newbie question about text encoding

Chris Angelico rosuav at gmail.com
Sun Mar 8 03:37:34 EDT 2015


On Sun, Mar 8, 2015 at 6:20 PM, Marko Rauhamaa <marko at pacujo.net> wrote:
>  * it still isn't bijective between str and bytes:
>
>    >>> '\udd00'.encode('utf-8', errors='surrogateescape')
>    Traceback (most recent call last):
>      File "<stdin>", line 1, in <module>
>    UnicodeEncodeError: 'utf-8' codec can't encode character
>    '\udd00' in position 0: surrogates not allowed

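Once again, you appear to be surprised that invalid data is failing.
Why is this so strange? U+DD00 is a lone surrogate, not a valid
character, and it lies outside the U+DC80..U+DCFF range that
surrogateescape maps back to bytes. It is quite correct to raise this
error.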
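
To make the distinction concrete, here is a minimal sketch of what the
handler does guarantee: arbitrary bytes survive a decode/encode round
trip, while a str holding a surrogate that could never have come from
such a decode is rejected. The helper name can_encode and the sample
bytes are made up for illustration; they are not from the earlier
posts.

```python
# Sketch: surrogateescape round-trips bytes -> str -> bytes.
# Undecodable bytes 0x80..0xFF become lone surrogates U+DC80..U+DCFF.
raw = b"abc\x80\xff"  # hypothetical sample; not valid UTF-8
text = raw.decode("utf-8", errors="surrogateescape")
round_tripped = text.encode("utf-8", errors="surrogateescape")


def can_encode(s):
    """Return True if s encodes to UTF-8 under surrogateescape."""
    try:
        s.encode("utf-8", errors="surrogateescape")
    except UnicodeEncodeError:
        return False
    return True


print(ascii(text))
print(round_tripped == raw)
print(can_encode("\udc80"))  # inside the escape range
print(can_encode("\udd00"))  # outside it, as in the quoted traceback
```

So the mapping is lossless starting from bytes, which is what it was
designed for; it makes no promise that every possible str can be
encoded.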

ChrisA


