UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 0: invalid start byte

Sun Sep 29 15:08:28 EDT 2013

On 9/29/13 1:30 PM, Terry Reedy wrote:
> On 9/29/2013 6:53 AM, Ned Batchelder wrote:
>
>> This is the nature of Unicode pain in Python 2 (Python 3 has a different
>> kind!).  This may help you understand what's going on:
>> http://nedbatchelder.com/text/unipain.html
>
> This is really excellent and I bookmarked it.
>
> There is one minor error: "the conversion from int to float can't fail,"
>
> >>> float(10**1000)
> Traceback (most recent call last):
>   File "<pyshell#0>", line 1, in <module>
>     float(10**1000)
> OverflowError: long int too large to convert to float
>
> Even when it succeeds, it can fail in the sense of losing information.
> >>> int(float(12345678901234567890))
> 12345678901234567168
> >>> float(int(1.55))
> 1.0
>
> This is somewhat analogous to a combination of errors='ignore' and 
> errors='replace' (with random garbage).
>
> I think the presentation would be strengthened with the correction, as 
> it shows that the problems of conversion are *not* unique to bytes and 
> unicode.
>

Thanks, these are excellent points.
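
To make the analogy concrete, here is a minimal Python 3 sketch (not taken from the presentation) that puts the quoted int/float cases next to the decoding case from the subject line. The helper try_float and the sample bytes b"\xb6abc" are made up for illustration. Note that Python 3 words the OverflowError slightly differently from the Python 2 traceback quoted above.

```python
# Illustrative sketch only: try_float and the sample data are hypothetical.

def try_float(n):
    """Return float(n), or None when the int is too large for a float."""
    try:
        return float(n)
    except OverflowError:
        return None

# 1. The conversion can fail outright.
overflowed = try_float(10**1000)      # None: OverflowError was raised

# 2. The conversion can succeed but silently lose information.
n = 12345678901234567890
round_trip = int(float(n))            # low-order digits are not preserved
truncated = float(int(1.55))          # fractional part is dropped

# 3. Decoding bytes shows the same two failure modes.
raw = b"\xb6abc"                      # 0xb6 is not a valid UTF-8 start byte
try:
    raw.decode("utf-8")               # strict mode: fails outright
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

ignored = raw.decode("utf-8", errors="ignore")     # bad byte dropped
replaced = raw.decode("utf-8", errors="replace")   # bad byte becomes U+FFFD
```

The strict decode corresponds to the OverflowError case, while errors="ignore" and errors="replace" correspond to the round trips that succeed but return something other than what went in.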

--Ned.