UnicodeEncodeError in compile

Terry Reedy tjreedy at udel.edu
Tue Jan 10 19:56:39 EST 2012


On 1/10/2012 8:43 AM, jmfauth wrote:
> D:\>c:\python32\python.exe
> Python 3.2.2 (default, Sep  4 2011, 09:51:08) [MSC v.1500 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> '\u5de5'.encode('utf-8')
> b'\xe5\xb7\xa5'
> >>> '\u5de5'.encode('mbcs')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character

> D:\>c:\python27\python.exe
> Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> u'\u5de5'.encode('utf-8')
> '\xe5\xb7\xa5'
> >>> u'\u5de5'.encode('mbcs')
> '?'

The mbcs codec encodes according to the current Windows code page. Only
code pages that contain the character (the Chinese ones, for example)
can encode this Chinese character, so the UnicodeEncodeError in 3.2 is
correct. 2.7 has a bug: it behaves as if errors='replace' were in effect
when it is supposedly doing errors='strict'. The Py3 fix was made in
http://bugs.python.org/issue850997
2.7 was intentionally left alone for backward-compatibility reasons.
(None of this addresses the OP's question.)
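
To make the strict/replace difference concrete without needing a Windows
box, here is a rough Python 3 sketch. It is an illustration only: 'mbcs'
exists only on Windows, so I am assuming the fixed code pages 'cp1252'
(Western European) and 'cp936' (Simplified Chinese) as stand-ins for
"the current code page", and try_encode is a hypothetical helper, not
anything from the standard library.

```python
# Stand-ins for the Windows "current code page" (assumption: 'mbcs' itself
# is only available on Windows, so fixed code pages are used instead).
WESTERN = 'cp1252'   # a Western European code page
CHINESE = 'cp936'    # the Simplified Chinese code page

def try_encode(text, codec, errors='strict'):
    """Return the encoded bytes, or None if the codec rejects the text."""
    try:
        return text.encode(codec, errors)
    except UnicodeEncodeError:
        return None

char = '\u5de5'

# Strict mode: the Western code page has no such character, so it is rejected.
print(try_encode(char, WESTERN))
# Replace mode: the character is silently substituted with '?'.
print(try_encode(char, WESTERN, 'replace'))
# A code page that contains the character encodes it without complaint.
print(try_encode(char, CHINESE))
```

The second call mirrors what 2.7 does by default with mbcs; the first
mirrors the 3.2 behavior.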

-- 
Terry Jan Reedy



