Can't print national characters in IDLE with Python 2.2.1c1

Magnus Lyckå magnus at thinkware.se
Wed Mar 20 07:21:46 EST 2002


Thanks for clearing this up, Martin (and Oleg, who mailed).
And thanks for all the work you do to actually make these
things work. I've followed the discussions, and realize the
problems, but I didn't expect the Span^W silent conversion via
unicode that Tcl/Tk performs.

Maybe German or Japanese Python books are better at explaining
localization and the problems with not using US-ASCII, but all the
Python books in English that I have read just ignore this subject,
just as they ignore site.py etc.

Martin v. Loewis wrote:

> It's a known limitation, also it is not clear what the solution should
> be.


Well, actually it was very simple. Just enable the locale in
site.py. (Thanks, Oleg.)

if 1: # <--- Changed 'if 0:' to 'if 1:'. I just had to change one bit!
     # Enable to support locale aware default string encodings.
     import locale
     loc = locale.getdefaultlocale()
     if loc[1]:
         encoding = loc[1]
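
To see which encoding a block like the one above would pick up on a given machine, here is a minimal sketch. It is written in present-day Python 3 syntax, which is an assumption on my part (2.2 spells print differently), and it uses locale.getpreferredencoding() instead of getdefaultlocale(). The name `preferred` is only illustrative.

```python
import locale

# Ask the platform which encoding the user's locale prefers.
# Passing False avoids calling setlocale() as a side effect.
preferred = locale.getpreferredencoding(False)
print("locale-preferred encoding:", preferred)
```

The result differs from machine to machine, which is exactly why relying on it makes a program behave differently on platforms you don't control.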

 

I assume that in the long run, Unicode is the only way to go for
proper i18n, at least if your programs will run on platforms that
you don't have control over... (Assuming all users will have full
character sets!)


> If you type "funny characters" in IDLE, Tk will represent them as
> Unicode strings (in fact, it represents *all* strings as Unicode
> strings). 


Aha! But is this new? I thought Python 2.1 used the same Tcl/Tk version
(8.3). What puzzled me was that this always worked before. (For as long
as I've used IDLE at least, about six years?) I knew that I'd get this
behaviour if I typed u"åäö". But I just made a plain string. Or so I
thought...
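
The failure mode can be reproduced outside IDLE. This is only a sketch in present-day Python 3 syntax (an assumption, it is not what 2.2 runs), and it stands in for whatever conversion IDLE and Tk perform internally: a Unicode string with non-ASCII characters cannot be converted with an ASCII default, while an explicit Latin-1 encoding works.

```python
text = "\u00e5\u00e4\u00f6"  # the Unicode string åäö

error = None
try:
    # Converting with the ASCII codec fails for anything above 127.
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc
    print("ASCII conversion failed:", exc.reason)

# With an explicit 8-bit encoding the same string converts fine.
latin1_bytes = text.encode("latin-1")
print(list(latin1_bytes))
```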

> Notice that, strictly speaking, your program is incorrect: 

...

> # ... 8-bit characters may be used in string literals and
> # comments but their interpretation is platform dependent;


Raising an exception is not what I expect when I read "interpretation
is platform dependent". I would only take it to mean that I can't rely
on ord(x) having a particular value for x = 'ä'.
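
To make "platform dependent" concrete, here is a small sketch (present-day Python 3 syntax, chosen by me for illustration) that shows which byte values the single character ä turns into under three encodings. The codec list is just a sample, not a claim about what any particular platform uses.

```python
char = "\u00e4"  # ä

# Map each codec name to the list of byte values it produces for ä.
values = {codec: list(char.encode(codec))
          for codec in ("latin-1", "cp437", "utf-8")}

for codec, byte_values in values.items():
    print(codec, byte_values)
```

Under Latin-1 and the old DOS code page the character is one byte with different values, and under UTF-8 it is two bytes, so ord() of the first byte of an 8-bit string depends on who wrote the bytes.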

> # ... the proper
> # way to insert 8-bit characters in string literals is by using octal
> # or hexadecimal escape sequences.

That doesn't help one tiny bit. I still don't know what print "\xd5"
will look like if I don't know the locale settings. Besides, the error
in IDLE is the same due to this silent Unicode string translation.
So making "correct" programs in IDLE causes just the same problem.
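
A sketch of why the escape alone settles nothing, again in present-day Python 3 syntax (my assumption, not the interpreter discussed here): the byte 0xd5 is decoded under two 8-bit code pages and comes out as two unrelated characters.

```python
raw = b"\xd5"  # one byte, written with a hexadecimal escape

# Decode the same byte under two different 8-bit code pages.
shown = {codec: raw.decode(codec) for codec in ("latin-1", "cp1251")}

for codec, ch in shown.items():
    print(codec, "U+%04X" % ord(ch))
```

In Latin-1 the byte is a Latin capital O with tilde, in the Cyrillic Windows code page it is the capital letter Ha.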


> There is no easy solution to this. Just consider the fragment
> 
>  >>> s = 'åäö'
>  >>> print ord(s[0])
> 
> What do you want to be printed here (what number)? Assuming you have
> some answer (say, 229), then what would you expect if s contained some
> Cyrillic or Japanese characters?


One solution is to use the locale. I assumed that using setlocale
would do the trick, but obviously that must be set up in site.py and
cemented with sys.setdefaultencoding. Declaring the encoding in the
source file, as is being discussed, seems reasonable to me.

 
> Under PEP 263, some of the current restrictions will be removed, so
> that you can put those characters into Unicode literals. Putting them
> into string literals still won't be supported.


But... It's always been supported!!! Until now. I hope you don't
imply that it will stop working, even if you set the default encoding?

So, when will Python be all Unicode, and the 7-bit legacy put on
the same scrap pile as all 7-bit hardware? (It's really silly to
claim that software changes faster than hardware. :)

My son is both Lithuanian and Swedish. There is no 8-bit character
set for him! He needs transparent Unicode support!!! :) But he's only
three, so there is still some time before he starts programming. (At
least a year.... ;-)

P.S. It's not only IDLE that has shortcomings with non-ASCII characters.
PythonWin behaves very strangely as well. In some columns (!) it won't
display certain characters correctly (the program will work, but what
you see on the screen will change if you change the indentation), and
when you press backspace after typing a non-ASCII character, you will
probably get a yen symbol etc. where you had your "bad" letter (I even
got a corrupted source code file due to this, and the bad byte wasn't
visible in PythonWin at all!). I guess vim and emacs are the only
reliable code editors... :-(



