UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 0: invalid start byte

Sun Sep 29 13:30:48 EDT 2013

On 9/29/2013 6:53 AM, Ned Batchelder wrote:

> This is the nature of Unicode pain in Python 2 (Python 3 has a different
> kind!).  This may help you understand what's going on:
> http://nedbatchelder.com/text/unipain.html

This is really excellent and I bookmarked it.

There is one minor error: "the conversion from int to float can't fail." It can:

 >>> float(10**1000)
Traceback (most recent call last):
   File "<pyshell#0>", line 1, in <module>
     float(10**1000)
OverflowError: long int too large to convert to float

Even when a conversion succeeds, it can fail in the sense of losing
information, in either direction.
 >>> int(float(12345678901234567890))
12345678901234567168
 >>> float(int(1.55))
1.0

This is somewhat analogous to a combination of errors='ignore' (the
dropped .55) and errors='replace' (with random garbage, as in the changed
low digits).
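
To make the analogy concrete, here is a minimal Python 3 sketch that puts
the codec error handlers next to the numeric cases. The sample byte string
is made up for illustration (it just starts with the 0xb6 byte from the
subject line), and the variable names are mine, not from Ned's presentation.

```python
# Python 3 sketch; `data` is a hypothetical sample, not real input.
data = b"\xb6abc"  # 0xb6 is a continuation byte, so it cannot start a character

# Default (strict) decoding fails loudly, like float(10**1000).
try:
    data.decode("utf-8")
    strict_result = "decoded"
except UnicodeDecodeError:
    strict_result = "UnicodeDecodeError"

# errors='ignore' silently drops the undecodable byte.
ignored = data.decode("utf-8", errors="ignore")

# errors='replace' substitutes U+FFFD for the undecodable byte.
replaced = data.decode("utf-8", errors="replace")

# Numeric counterparts, using the values from the examples above.
try:
    float(10**1000)
    overflow_result = "converted"
except OverflowError:
    overflow_result = "OverflowError"

truncated = int(1.55)                       # fraction dropped, like 'ignore'
garbled = int(float(12345678901234567890))  # low digits altered, like 'replace'

print(strict_result, repr(ignored), repr(replaced))
print(overflow_result, truncated, garbled)
```

In both domains the strict conversion raises, while the lenient ones return
a value that no longer round-trips to the original.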

I think the presentation would be strengthened with the correction, as 
it shows that the problems of conversion are *not* unique to bytes and 
unicode.

-- 
Terry Jan Reedy