[issue17348] Unicode - encoding seems to be lost for inputs of unicode chars in IDLE

Mon Apr 22 01:19:42 CEST 2013

Tomoki Imai added the comment:

Thanks.

I noticed that Terry used Python 3 to confirm this problem...

I am Japanese, but I use an English environment.
Here are my locale settings. I'm using Linux.
konomi:tomoki% locale
LANG=en_US.utf8
LC_CTYPE=en_US.UTF-8
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE="en_US.utf8"
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER="en_US.utf8"
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=

All strings used internally should be of the unicode type.
In Japan, many charsets are in use (cp932, euc-jp, ...),
and they cause problems in Python 2 unless they are converted to the unicode type.
Remember, the unicode type and "utf-8" are not the same thing.
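
To illustrate the difference between text and its encodings, here is a minimal sketch (not taken from IDLE). The sample string and the variable names are my own choices for illustration. It encodes the same text with three charsets, giving three different byte sequences, and decoding each with the matching codec gives back the same text.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch only: one piece of text, three different encodings.
# The sample text is the three characters of the Japanese word for "Japanese".
text = u"\u65e5\u672c\u8a9e"

# Encode the same text with three charsets commonly seen in Japan.
encoded = {}
for name in ("utf-8", "cp932", "euc-jp"):
    encoded[name] = text.encode(name)

# The byte sequences differ from one charset to another...
distinct_byte_strings = len(set(encoded.values()))

# ...but decoding each with its own codec gives back the same text.
decoded = [raw.decode(name) for name, raw in encoded.items()]
all_round_trip = all(d == text for d in decoded)
```

The u"" prefix is accepted by Python 2 and by Python 3.3 or later, so the sketch should run on either.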

When I type into a Tkinter Entry and get the Entry's value, it returns unicode.
The deleted code converted that unicode to the str type.
The two are unified in Python 3 (unicode became str, and str became bytes),
so these lines are not in the Python 3 code.
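
As a rough sketch of why converting text to a byte string can lose characters: this is not the deleted IDLE code, and the helper name to_byte_string is hypothetical. It is written so that it runs on Python 3, where text is str and a byte string is bytes. In Python 2 these correspond to unicode and str.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper for illustration; this is NOT the code removed from IDLE.
def to_byte_string(text, encoding, errors="strict"):
    """Convert text to a byte string using the given encoding."""
    return text.encode(encoding, errors)

# Sample text: three Japanese characters.
text = u"\u65e5\u672c\u8a9e"

# A strict conversion fails when the target charset cannot represent the text.
try:
    to_byte_string(text, "ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# A lossy conversion "succeeds", but every character becomes "?".
lossy = to_byte_string(text, "ascii", "replace")

# The original characters cannot be recovered from the lossy bytes.
recovered = lossy.decode("ascii")
information_lost = recovered != text
```

Keeping the value as text all the way through avoids both outcomes.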

I typed these strings using an input method (I am using uim).
https://code.google.com/p/uim/
However, I don't know how uim generates these characters.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue17348>
_______________________________________