[Python-Dev] GSoC: Replace MS Windows Console with Unicode UI

Glenn Linderman v+python at g.nevcal.com
Wed Mar 25 01:02:30 CET 2009


On approximately 3/24/2009 10:16 AM, came the following characters from 
the keyboard of INADA Naoki:
> Hi. I'm Japanese and a non-ASCII character user. (cp932)
> 
> We have to use an "IME" to input non-ASCII characters in Windows.
> After "> chcp 65001" in cmd.exe, we cannot use the IME in cmd.exe.
> 
> So setting the code page to 65001 makes output universal but makes
> input ASCII-only.
> Sit!!!
> 
> I hope PyQtShell <http://code.google.com/p/pyqtshell/> becomes a good
> IDLE alternative.


Thanks for the feedback.

At least one version of the code I posted shows that the code page can 
be set programmatically to different values for input and output, 
although the last version set both to 65001.  The chcp 65001 command, 
it seems, always sets both.  If the IME only works with cp932, then 
perhaps you could leave input at cp932 and set output to 65001?
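
A minimal sketch of the split-code-page idea follows. It is not the code 
posted earlier in the thread; it assumes Python's ctypes and the Win32 
console calls SetConsoleCP (input) and SetConsoleOutputCP (output).  The 
helper name set_console_code_pages is hypothetical, and whether the IME 
actually keeps working in this configuration is untested here.

```python
import ctypes
import sys

CP_SHIFT_JIS = 932   # Japanese code page, as used by the IME in cmd.exe
CP_UTF8 = 65001      # UTF-8 code page


def set_console_code_pages(input_cp, output_cp):
    """Set the console input and output code pages independently.

    Hypothetical helper for illustration.  Returns the (input, output)
    code pages in effect afterwards, or None when not running on Windows.
    """
    if sys.platform != "win32":
        return None  # the Win32 console API is not available here
    kernel32 = ctypes.windll.kernel32
    # Both calls return zero on failure (for example, no console attached).
    if not kernel32.SetConsoleCP(input_cp):
        raise ctypes.WinError()
    if not kernel32.SetConsoleOutputCP(output_cp):
        raise ctypes.WinError()
    return kernel32.GetConsoleCP(), kernel32.GetConsoleOutputCP()


if __name__ == "__main__":
    # Leave input at cp932 for the IME, switch only output to UTF-8.
    print(set_console_code_pages(CP_SHIFT_JIS, CP_UTF8))
```

Unlike chcp, which appears to change both sides at once, this changes 
each side with its own call, so input can stay at cp932.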

I have no idea if that could be a solution for you, but I would be 
interested in your results if you find that it is, or isn't, as that 
would add to the collective knowledge base about the subject.  This is 
idea 2, below, where I tried to cover the solution space more broadly.

Looking briefly at the definition of cp932, it seems that it covers most 
of the Unicode characters... so perhaps any or several of the following 
could happen:

1) the IME could be converted to produce UTF-8 instead of cp932, 
allowing use of 65001 for input and output
2) the split code page could be used to avoid the conversion of Unicode 
to cp932 for output.
3) Unicode could be converted to cp932 for output, allowing use of cp932 
for both input and output.

These are listed in order of increasing character-handling overhead.

Perhaps you could enlighten us all as to the issues with each of these 
ideas.

I realize the IME exists today, and is likely coded to use cp932, and 
that it would take some work to convert it to produce Unicode.  However, 
there seems to be a straightforward conversion chart between cp932 and 
Unicode at Wikipedia, so perhaps that isn't a huge effort.
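
To give a feel for the conversions involved in ideas 1 and 3, here is a 
small sketch using Python's built-in "cp932" codec.  The function names 
are hypothetical, and this only illustrates the byte-level conversion, 
not how an IME would be changed.  Note that the cp932-to-Unicode 
direction is lossless, while the reverse direction has to substitute 
something for characters that cp932 does not contain.

```python
def cp932_to_utf8(data):
    """Re-encode cp932 bytes (e.g. IME output) as UTF-8 bytes."""
    return data.decode("cp932").encode("utf-8")


def to_cp932(text, errors="replace"):
    """Encode Unicode text as cp932 for output.

    Characters with no cp932 equivalent are handled according to
    `errors`; with "replace" they become "?".
    """
    return text.encode("cp932", errors=errors)


if __name__ == "__main__":
    japanese = "日本語"
    as_cp932 = to_cp932(japanese)        # two bytes per character here
    as_utf8 = cp932_to_utf8(as_cp932)    # three bytes per character here
    print(len(as_cp932), len(as_utf8))
    print(as_utf8.decode("utf-8") == japanese)

    # A Hangul syllable is outside cp932, so idea 3 loses it on output.
    print(to_cp932("한"))
```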

It seems that the long term goal of having all software speak Unicode 
would increase the efficiency of all software when dealing with 
multi-lingual issues, as a common solution can be applied universally, 
rather than re-inventing solutions that only work for particular code pages.

But I'm not fully aware of whether or not the design or implementation 
of Unicode precludes universal solutions: I have heard rumors that 
certain characters must be interpreted differently in different locale 
contexts, which seems to be counter to the "one solution fits all" 
possibility.

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

