Can't print national characters in IDLE with Python 2.2.1c1

Martin v. Löwis loewis at informatik.hu-berlin.de
Wed Mar 20 11:48:20 EST 2002


Magnus Lyckå <magnus at thinkware.se> writes:

> Martin v. Loewis wrote:
> 
> > It's a known limitation, and it is not clear what the solution should
> > be.
> 
> 
> Well, actually it was very simple. Just enable the locale in
> site.py. (Thanks, Oleg.)

That's a work-around, not a solution. For most cases, I strongly
discourage changing the site default encoding. As a result of such a
change, a program that works one way on your installation may work
differently on other installations.
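
To illustrate why the result depends on the installation, here is a
minimal sketch. It assumes present-day Python 3 semantics rather than
2.2, and `implicit_encode` is a hypothetical helper standing in for the
implicit conversion that the site default encoding controls; it is not
what Python or IDLE actually calls.

```python
def implicit_encode(text, default_encoding):
    """Hypothetical stand-in for an implicit Unicode-to-bytes conversion."""
    return text.encode(default_encoding)


typed = "\u00e4"  # the character "ä", as a Unicode string

# Try the same conversion under three possible site default encodings.
results = {}
for enc in ("ascii", "latin-1", "utf-8"):
    try:
        results[enc] = implicit_encode(typed, enc)
    except UnicodeEncodeError:
        # With an ASCII default, the conversion fails outright.
        results[enc] = None

for enc, value in results.items():
    print(enc, value)
```

The same program fails under one default, produces one byte under
another, and two bytes under a third.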

> > If you type "funny characters" in IDLE, Tk will represent them as
> > Unicode strings (in fact, it represents *all* strings as Unicode
> > strings).
> 
> 
> Aha! But is this new? 

Tcl uses Unicode internally starting with 8.0. Since 8.1, it uses
proper Unicode objects (in 8.0, it uses UTF-8 only).

> What puzzled me was that this always worked before. (As long as I've
> used IDLE, at least--six years?)

Yes, in the past, both Tcl and Tkinter would use the locale's
character set automatically. This has changed.

> > # ... 8-bit characters may be used in string literals and
> > # comments but their interpretation is platform dependent;
> 
> 
> Raising an exception is not what I expect when I read "interpretation
> is platform dependent". I only assume it would mean that I can't assume
> that ord(x) would have a particular value for x = 'ä'

Yes, that is how you could interpret it. You could also interpret it
as allowing an implementation to refuse certain input.

> That doesn't help one tiny bit. I still don't know what print "\xd5"
> will look like if I don't know the locale settings. Besides, the error
> in IDLE is the same due to this silent unicode-string translation.

There is no "silent Unicode translation". The text you type *is*
Unicode; no conversion needed.

> > Under PEP 263, some of the current restrictions will be removed, so
> > that you can put those characters into Unicode literals. Putting them
> > into string literals still won't be supported.
> 
> 
> But... It's always been supported!!! 

Not really. In a European Windows installation, with Python 1.5, try
inserting Cyrillic characters into your window.

> Until now. I hope you don't imply that it will stop working, even if
> you set default encoding?

Using non-ASCII characters in source code without a declared encoding
will be deprecated, and will eventually be an error, yes. What to do
with IDLE (or any other form of interactive prompt) is still an open
issue.
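
As a rough sketch of what a declared encoding buys you: the example
below assumes present-day Python 3, where compile() accepts source
bytes and honours a PEP 263 coding line, so it does not reproduce the
2.2 behaviour discussed here. `load_literal` is a hypothetical helper
for the illustration.

```python
def load_literal(source_bytes):
    """Compile source bytes and return the value bound to the name `s`."""
    namespace = {}
    exec(compile(source_bytes, "<sketch>", "exec"), namespace)
    return namespace["s"]


# The same raw byte 0xE4 inside a string literal ...
body = b's = "\xe4"\n'

# ... interpreted under two different declared source encodings.
as_latin1 = load_literal(b"# -*- coding: latin-1 -*-\n" + body)
as_koi8r = load_literal(b"# -*- coding: koi8-r -*-\n" + body)

print(ascii(as_latin1), ascii(as_koi8r))
```

The byte in the file is identical in both cases; only the declaration
decides which character the literal contains.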

> So, when will Python be all Unicode, and the 7-bit legacy put on
> the same scrap pile as all 7-bit hardware? 

The tricky question here is: what does "all Unicode" mean to you? If you
mean that all string literals are Unicode literals - you would not
like that. Just try invoking 'python -U'.

In a file, you will always have byte sequences, not 'Unicode'. This
won't change in our lifetimes (and it won't change in the lifetime of
your son, either). So you will always be faced with the problem of
converting byte sequences to characters, which may always give
strange results when done automatically.
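
A small sketch of the problem, again assuming present-day Python 3 and
using a temporary file and sample text chosen purely for illustration:
the file holds only bytes, and decoding them with a guessed encoding
quietly yields different characters than decoding them with the
encoding they were written in.

```python
import os
import tempfile

original = "\u041f\u0440\u0438\u0432\u0435\u0442"  # Cyrillic sample text

# Write the text as Windows-1251 bytes, then read the raw bytes back.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(original.encode("cp1251"))
    with open(path, "rb") as f:
        raw = f.read()
finally:
    os.remove(path)

guessed = raw.decode("latin-1")   # an "automatic" guess: no error, wrong text
explicit = raw.decode("cp1251")   # the encoding the bytes were written in

print(ascii(guessed))
print(ascii(explicit))
```

The guess raises no exception; it just produces accented Latin letters
where the Cyrillic text should be.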

Regards,
Martin


