[Python-bugs-list] [Bug #119960] Encoding bugs.

noreply@sourceforge.net noreply@sourceforge.net
Wed, 1 Nov 2000 13:16:22 -0800


Bug #119960, was updated on 2000-Oct-31 13:38
Here is a current snapshot of the bug.

Project: Python
Category: Tkinter
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Summary: Encoding bugs.

Details: Win98, Python2.0final.

1. I can't type Cyrillic letters in the IDLE editor.

I tried to figure out what was happening and found that
Tcl has an 'encoding' command. I typed in the IDLE shell:

>>> from Tkinter import *
>>> root = Tk()
>>> root.tk.call("encoding", "names")
'utf-8 identity unicode'
>>> root.tk.call("encoding", "system")
'identity'

But Tcl has numerous encodings in 'tcl\tcl8.3\encodings',
including 'cp1251'!

Then I installed Tk separately and removed tcl83.dll
and tk83.dll from the Python DLLs directory:

>>> from Tkinter import *
>>> root = Tk()
>>> root.tk.call("encoding", "names")
'cp860 cp861 [.........] cp857 unicode'
>>> root.tk.call("encoding", "system")
'cp1251'

So, when the Tcl/Tk DLLs are in the Python\DLLs directory,
Tcl can't load all its encodings.
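
The same query can be scripted. A minimal sketch, assuming modern Python 3 (where the module is named `tkinter`, not the `Tkinter` of Python 2.0 used in this report). It uses `tkinter.Tcl()` so that no window is created; the helper name `tcl_encodings` is hypothetical, and it returns None when Tcl is not available.

```python
def tcl_encodings():
    """Return (sorted encoding names, system encoding) as seen by Tcl.

    Returns None if tkinter or a working Tcl interpreter is unavailable.
    """
    try:
        import tkinter
    except ImportError:
        return None
    try:
        interp = tkinter.Tcl()  # Tcl interpreter only, no Tk window
    except tkinter.TclError:
        return None
    # "encoding names" yields a Tcl list; splitlist turns it into a tuple.
    names = interp.tk.splitlist(interp.tk.call("encoding", "names"))
    system = str(interp.tk.call("encoding", "system"))
    return sorted(str(name) for name in names), system


if __name__ == "__main__":
    result = tcl_encodings()
    if result is None:
        print("Tcl is not available")
    else:
        names, system = result
        print("cp1251 available:", "cp1251" in names)
        print("system encoding:", system)
```

If 'cp1251' is missing from the list, Tcl probably did not find its encoding files, which is the symptom described above.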

But this is not the end.

I typed in the IDLE shell:

>>> print "hello <in russian>" # all chars look correct.
and got:
Exception in Tkinter callback
Traceback (most recent call last):
  File "c:\python20\lib\lib-tk\Tkinter.py", line 1287, in __call__
    return apply(self.func, args)
  File "C:\PYTHON20\Tools\idle\PyShell.py", line 579, in enter_callback
    self.runit()
  File "C:\PYTHON20\Tools\idle\PyShell.py", line 598, in runit
    more = self.interp.runsource(line)
  File "C:\PYTHON20\Tools\idle\PyShell.py", line 183, in runsource
    return InteractiveInterpreter.runsource(self, source, filename)
  File "c:\python20\lib\code.py", line 61, in runsource
    code = compile_command(source, filename, symbol)
  File "c:\python20\lib\codeop.py", line 61, in compile_command
    code = compile(source, filename, symbol)
UnicodeError: ASCII encoding error: ordinal not in range(128)
print "[the same characters]"
Then, when I pressed Enter again, I got the same
error message. I stopped this by pressing Ctrl-Break.
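
The failure above can be reproduced outside IDLE. A minimal sketch, written for Python 3 rather than the Python 2.0 of this report (an assumption made so the example runs today): bytes outside the 7-bit range cannot pass through the ASCII codec. The sample word and the helper name `decodes_as_ascii` are illustrative only.

```python
# Python 3 sketch; the report itself concerns Python 2.0.
SAMPLE = "привет"               # illustrative Cyrillic text ("hello")
raw = SAMPLE.encode("cp1251")   # the 8-bit form a cp1251 system would produce


def decodes_as_ascii(data: bytes) -> bool:
    """Return True if every byte in data is valid for the ASCII codec."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(decodes_as_ascii(b"hello"))  # plain ASCII bytes
    print(decodes_as_ascii(raw))       # cp1251 bytes, all above 0x7F
```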

[1/2 hour later]
I fixed this by editing site.py:
if 1: # was: if 0
  # Enable to support locale aware default string encodings.

I typed again:
>>> print "hello <in russian>"
and got:
<some strange letters>
>>> print unicode("hello <in russian>")
<some strange letters>

[2 hours later]
Looking at the source of _tkinter.c (paraphrased here as pseudocode):

static Tcl_Obj* AsObj(PyObject *value)
{
    if type(value) is StringType:
        return Tcl_NewStringObj(value)
    elif type(value) is UnicodeType:
        ...
...
}

But I read in
<http://dev.scriptics.com/doc/howto/i18n.html>
that all Tcl functions require strings to
be passed in UTF-8. So this code should look like:

    if type(value) is StringType:
        if TCL_Version >= 8.1:
             return Tcl_NewStringObj(<value converted
           to UTF-8 string using sys.getdefaultencoding()>)
        else:
             return Tcl_NewStringObj(value)

And when I typed:
>>> print unicode("hello <in russian>").encode('utf-8')
I got:
hello <in russian>
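
The conversion proposed above can be sketched in Python. This is an illustration in Python 3, not the actual _tkinter.c code; the helper name `to_tcl_utf8` and its `source_encoding` parameter are hypothetical, and 'cp1251' is assumed as the default only because it is the system encoding in this report.

```python
def to_tcl_utf8(value, source_encoding="cp1251"):
    """Return UTF-8 bytes for a consumer that accepts only UTF-8.

    bytes are assumed to be in source_encoding and are recoded;
    str (Unicode text) is encoded directly.
    """
    if isinstance(value, bytes):
        return value.decode(source_encoding).encode("utf-8")
    if isinstance(value, str):
        return value.encode("utf-8")
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


if __name__ == "__main__":
    legacy = "привет".encode("cp1251")  # illustrative 8-bit input
    print(to_tcl_utf8(legacy) == "привет".encode("utf-8"))
```

Recoding 8-bit input is only safe when its encoding is really known; the sketch takes it as a parameter instead of guessing.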

This is the end.

P.S. Sorry for my bad English, but I really want to
use IDLE and Tkinter in our school, so I can't wait
for somebody else to write a bug report.

Follow-Ups:

Date: 2000-Nov-01 08:00
By: jhylton

Comment:
I am not entirely sure what the bug is, though I agree that it can be confusing to deal with Unicode strings.

-------------------------------------------------------

Date: 2000-Nov-01 12:47
By: lemburg

Comment:
AFAIK, the _tkinter.c code automatically converts Unicode
to UTF-8 and then passes this to Tcl/Tk.

So basically the following should get you correct results...

print unicode("hello <in russian>", "cp1251")

Alternatively, you can set your default encoding to "cp1251"
in the way you describe and then write:

print unicode("hello <in russian>")

I am not too familiar with Tcl/Tk, so I can't judge whether trying
to recode normal 8-bit strings into UTF-8 is a good idea in general
for the _tkinter.c interface. It would easily be possible using:

utf8 = string.encode('utf-8')

since 8-bit strings support the .encode() method too.
-------------------------------------------------------

Date: 2000-Nov-01 13:16
By: kirill_simonov

Comment:
1. print unicode("<cyrillic>") in IDLE doesn't work!
The mechanics (I think) are:
a) print unicode_string encodes the Unicode string to a
normal string using the default encoding and passes it
to sys.stdout.
b) sys.stdout is intercepted by IDLE. IDLE sends this string
to Tkinter.
c) Tkinter passes this string (not Unicode but cp1251!)
to Tcl, but Tcl expects a UTF-8 string!!!
d) I see garbled characters on screen.
2. You are breaking compatibility! In 1.5 I could write
Button(root, text="<cyrillic>") and it worked.
Writing unicode("<>", 'cp1251') is UGLY and ANNOYING!
Tcl requires strings in UTF-8. All Python strings
are in the sys.getdefaultencoding() encoding. So we have to
recode all strings to UTF-8.
3. Tcl in DLLs can't find its encodings in
tcl\tcl8.3\encodings! I don't know why. So I can't write
in Russian in Tkinter.Text.
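
The chain in point 1 can be imitated without Tk. A hedged Python 3 sketch: text is encoded with a cp1251 default encoding, and the consumer then treats the bytes as UTF-8. Decoding with errors="replace" is only a stand-in for whatever Tcl does with invalid input, so the real rendering in IDLE may differ. The function names are hypothetical.

```python
def simulate_display(text, default_encoding="cp1251"):
    """Imitate steps a) to d): encode with the default encoding,
    then let the consumer misread the bytes as UTF-8."""
    sent = text.encode(default_encoding)            # a) to c): 8-bit string
    return sent.decode("utf-8", errors="replace")   # d): misread as UTF-8


def simulate_recoded(text, default_encoding="cp1251"):
    """Same chain, but the bytes are recoded to UTF-8 before display."""
    sent = text.encode(default_encoding)
    recoded = sent.decode(default_encoding).encode("utf-8")
    return recoded.decode("utf-8")


if __name__ == "__main__":
    word = "привет"  # illustrative Cyrillic text
    print(simulate_display(word) == word)   # misread bytes do not round-trip
    print(simulate_recoded(word) == word)   # recoded bytes do
```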
-------------------------------------------------------

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=119960&group_id=5470