[issue13153] IDLE crashes when pasting non-BMP unicode char on Py3

Serhiy Storchaka report at bugs.python.org
Fri Nov 6 02:15:52 EST 2015


Serhiy Storchaka added the comment:

The Snake emoji is not in my font, so I use the Cat Face emoji U+1F431 🐱 (\xf0\x9f\x90\xb1 in UTF-8, \x3d\xd8\x31\xdc in UTF-16LE).
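For reference, a minimal sketch that reproduces the two byte sequences quoted above with plain Python 3 (no Tk involved). The variable names are only illustrative.

```python
# Encode the Cat Face emoji (U+1F431) in the two encodings mentioned above.
cat = "\U0001F431"

utf8 = cat.encode("utf-8")
utf16le = cat.encode("utf-16-le")

print(utf8.hex())         # f09f90b1
print(utf16le.hex())      # 3dd831dc

# One code point, but 4 UTF-8 bytes and 2 UTF-16 code units (a surrogate pair).
print(len(cat), len(utf8), len(utf16le) // 2)   # 1 4 2
```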

Move the cursor or press Backspace. I had to press Left 2 times to move the cursor to the beginning of the line, Right 4 times to move it back to the end of the line, and Backspace 4 times to remove everything. This is what is meant by "Tk doesn't support astral characters".

Get the text programmatically.

>>> text.get('1.0', '1.end')
'ð゚ミᄆ'
>>> print(ascii(text.get('1.0', '1.end')))
'\xf0\uff9f\uff90\uffb1'

On Linux the clipboard uses UTF-8, and this symbol is represented by the 4-byte bytestring b'\xf0\x9f\x90\xb1' (that is why Tk sometimes interprets it as 4 characters). When you request the text content as Unicode, Tcl fails to decode the string from UTF-8 and falls back to Latin1. Due to another bug, it sign-extends some of the bytes. When you programmatically insert the same string back, it is encoded to b'\xc3\xb0\xef\xbe\x9f\xef\xbe\x90\xef\xbe\xb1' and displayed as 'ð゚ミᄆ'.
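The effect described above can be modelled in pure Python. This is only a sketch of the observed result, not Tcl's actual code: the helper names are hypothetical, and the rule "the first byte is taken as Latin1, the remaining bytes are sign-extended to 16 bits" is an assumption inferred from the output shown in this message.

```python
def sign_extend_byte(b):
    """Widen a byte to 16 bits as if it were a signed char (0x9f -> 0xff9f)."""
    return b | 0xFF00 if b & 0x80 else b


def model_misdecode(data):
    # Assumption: the lead byte is kept as a Latin1 character and the
    # following bytes are sign-extended, matching the result seen above.
    chars = [chr(data[0])]
    chars += [chr(sign_extend_byte(b)) for b in data[1:]]
    return "".join(chars)


raw = "\U0001F431".encode("utf-8")      # b'\xf0\x9f\x90\xb1'
garbled = model_misdecode(raw)

print(ascii(garbled))                   # '\xf0\uff9f\uff90\uffb1'
print(garbled.encode("utf-8"))          # b'\xc3\xb0\xef\xbe\x9f\xef\xbe\x90\xef\xbe\xb1'
```

Re-encoding the garbled string gives the 11-byte sequence quoted above, which is why inserting it back displays four unrelated characters instead of the emoji.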

On Windows the clipboard uses UTF-16LE, so the results can differ.

The underlying graphical system can support astral characters, but Tk fails to handle them correctly.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13153>
_______________________________________

