IDLE raw_input() and unicode

Seo Sanghyeon unendliche at hanmail.net
Wed Jun 4 23:00:54 EDT 2003


My setup is IDLE 0.8 on Python 2.2.2, WinXP. In case it's relevant,
I'm in Korea. (Hangeul is the Korean script.)

IDLE can print Hangeul just fine. (Note, in case it's relevant: my
sitecustomize.py calls sys.setdefaultencoding("utf-8").)

But IDLE fails to read Hangeul input with raw_input(). It chokes
and prints out:

Traceback (most recent call last):
  File "<pyshell#0>", line 1, in ?
    raw_input()
TypeError: object.readline() returned non-string

After some study, it seems that the builtin raw_input() calls
sys.stdin.readline() *AND* type-checks the result. (Whoa... since C
code is involved, the traceback doesn't go into it and I'm not sure.
Should I read the C code in Python CVS? *normal user shudder*)
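
A rough sketch of that guess in Python, not the real C code. The name
raw_input_sketch and the two fake stdin classes are made up for
illustration, bytes stands in for the Python 2 str type so the sketch
also runs on current Python, and prompt and EOF handling are left out:

```python
def raw_input_sketch(stream):
    """Hypothetical pure-Python stand-in for the builtin raw_input()."""
    line = stream.readline()
    # Assumed type check: anything that is not a byte string is rejected.
    if not isinstance(line, bytes):
        raise TypeError("object.readline() returned non-string")
    # Strip the trailing newline, as raw_input() does.
    if line.endswith(b"\n"):
        line = line[:-1]
    return line


class ByteStdin:
    """Fake stdin whose readline() returns a byte string."""
    def readline(self):
        return b"python\n"


class UnicodeStdin:
    """Fake stdin whose readline() returns unicode, like IDLE's PyShell."""
    def readline(self):
        return u"\ud30c\uc774\uc36c\n"


ok = raw_input_sketch(ByteStdin())      # passes the type check
try:
    raw_input_sketch(UnicodeStdin())    # fails the type check
    rejected = False
except TypeError:
    rejected = True
```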

IDLE replaces sys.stdin with an instance of PyShell.PyShell. The
readline() method of the PyShell class does something I don't
understand and then does:

----
# PyShell.py around line 475

line = self.text.get("iomark", "end-1c")
...
return line
----

PyShell inherits from OutputWindow, which in turn inherits from
EditorWindow, and EditorWindow initializes self.text as a Tkinter.Text
widget. I don't know what the hell "self.tk.call(self._w, 'get',
index1, index2)", that is, the implementation of Tkinter.Text.get(),
does at all, but I assume it returns some sort of "non-string".

So I opened a DOS command line window, started python, and typed:

----
import PyShell
shell = PyShell.PyShell()
shell.begin()
shell.reading = 1
# Typing some Hangeul. In this case, the name of Python itself.
shell.text.get('iomark', 'end-1c')
----

It prints out: u'\ud30c\uc774\uc36c\n'. So the "non-string" is
actually a unicode object.

So... I suggest the following:

1) Make raw_input() able to return unicode. Why not? (But I suspect
there may be some deep reason.)
2) Or, at least, make PyShell.readline() return something other than a
"non-string". I think just changing "return line" to "return str(line)"
would do. (I made the change and tried again.) Yes, it does.
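
One caveat on option 2: str(line) only works here because
sitecustomize.py sets the default encoding to utf-8. With the stock
ASCII default, str() on a unicode string containing Hangeul raises a
UnicodeError. A minimal sketch of encoding explicitly instead;
to_byte_string is a made-up helper name, the choice of UTF-8 is an
assumption, and bytes stands in for the Python 2 str type so the sketch
runs on current Python:

```python
def to_byte_string(line, encoding="utf-8"):
    """Hypothetical helper: turn a Tk text value into a byte string."""
    # Already a byte string: pass it through unchanged.
    if isinstance(line, bytes):
        return line
    # Unicode: encode explicitly instead of relying on the default encoding.
    return line.encode(encoding)


hangeul = u"\ud30c\uc774\uc36c\n"   # the value Text.get() returned above
encoded = to_byte_string(hangeul)   # byte string, safe for the type check
```

In PyShell.readline() this would amount to returning
to_byte_string(line) rather than str(line).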

I googled for "IDLE raw_input unicode" and, to my surprise, found just
a few posts. A German one reported that raw_input() doesn't handle
umlauts. So it seems this is quite a general i18n problem.

Should I submit... *normal user shudder* a patch? It's just a one-line
change... Or can someone do all unicode-IDLE-users a favor and submit
a patch?

I'd be more than happy to hear something like "Yes, we know that, and
it's all fixed in IDLE versions >0.8, and with Python 2.3 you will have
no problem."

-- Seo Sanghyeon



