[issue5127] UnicodeEncodeError - I can't even see license

Ezio Melotti report at bugs.python.org
Mon Feb 2 02:56:56 CET 2009


Ezio Melotti <ezio.melotti at gmail.com> added the comment:

Here (Windows XP SP2, Python 3, cp850 terminal) the license works fine:
>>> license
Type license() to see the full license text

and license() works as well.

This is the output I get for the two chr() calls:
>>> chr(0x10000)
'\U00010000'
>>> chr(0x11000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Programs\Python30\lib\io.py", line 1491, in write
    b = encoder.encode(s)
  File "C:\Programs\Python30\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position
1-2: character maps to <undefined>
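
The traceback shows the failure happening in the text layer's write(),
when the repr is encoded for the terminal. A minimal sketch that
reproduces this without a cp850 console, assuming an in-memory stream is
an acceptable stand-in for stdout (write_repr is a made-up helper name,
not part of Python):

```python
import io


def write_repr(value, encoding, errors="strict"):
    """Write repr(value) to an in-memory text stream and return the bytes.

    Hypothetical helper: the BytesIO-backed wrapper stands in for a
    terminal whose encoding is `encoding`.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    stream.write(repr(value))
    stream.flush()
    return raw.getvalue()


if __name__ == "__main__":
    try:
        write_repr(chr(0x10000), "cp850")
    except UnicodeEncodeError as exc:
        print("strict:", exc.reason)
    # backslashreplace does not raise; it emits an ASCII escape instead.
    print(write_repr(chr(0x10000), "cp850", errors="backslashreplace"))
```

On a current Python the repr of chr(0x10000) contains the character
itself, so the strict write is the one that fails; the error handler is
only one possible way to avoid the exception.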

I believe that chr(0x10000) and chr(0x11000) should behave the opposite
way.
U+10000 (LINEAR B SYLLABLE B008 A) belongs to the 'Lo' category, so it
should be printed as is (possibly raising a UnicodeError, see issue5110
[1]), whereas U+11000 belongs to the 'Cn' category and should be
escaped [2].
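
The category rule can be checked with the unicodedata module. This is
only a sketch: describe is a made-up helper, and the results depend on
the Unicode database shipped with the interpreter. U+11000 has since
been assigned in later Unicode versions, so U+FFFF (a noncharacter that
stays 'Cn') is used here as a stand-in for an unassigned code point:

```python
import unicodedata


def describe(code_point):
    """Return (category, printable?, repr) for one code point."""
    ch = chr(code_point)
    return unicodedata.category(ch), ch.isprintable(), repr(ch)


if __name__ == "__main__":
    # 'Lo' is printable, so repr() keeps the character as is;
    # 'Cn' is not printable, so repr() escapes it.
    for cp in (0x10000, 0xFFFF):
        print("U+%04X" % cp, describe(cp))
```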

On Linux with Python 3 and a UTF-8 terminal, chr(0x10000) prints
'\U00010000' and chr(0x11000) prints the character itself (actually I
see two boxes, but that shouldn't be Python's problem). license() works
fine there too.

Also note that with cp850 the error message is 'character maps to
<undefined>', while with cp949 it is 'illegal multibyte sequence'.
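
The two messages can be compared side by side by catching the exception
and reading its reason attribute. A small sketch, with
encode_error_reason as a made-up helper name:

```python
def encode_error_reason(text, encoding):
    """Return the UnicodeEncodeError reason, or None if encoding works."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc.reason
    return None


if __name__ == "__main__":
    for name in ("cp850", "cp949", "utf-8"):
        print(name, encode_error_reason(chr(0x10000), name))
```

The difference comes from the codec implementations: cp850 is a charmap
codec, while cp949 is a multibyte codec with its own wording.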

[1]: http://bugs.python.org/issue5110
[2]: http://www.python.org/dev/peps/pep-3138/#specification

----------
nosy: +ezio.melotti

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5127>
_______________________________________

