[Python-Dev] IDLE and non-ASCII characters

Tim Peters tim.one@home.com
Tue, 15 May 2001 02:28:34 -0400


[Guido]
> Postscript: using cut and paste, I *can* enter "s='äö'" in IDLE at the
> Python prompt, both on Linux and on Windows 98.  It prints as
> '\xe4\xf6' on both systems.  What changed?

[Martin]
> Perhaps the Tcl version? That sounds like the issue that Marc talked
> about: Tk behaves differently when text is entered programmatically
> (and perhaps through cut-n-paste), as compared to text entered through
> the keyboard. Using cut-n-paste with Tk 8.3.1, CVS python, X11R6.3 on
> Solaris 8 still gives me the UnicodeError.

I don't know which version of Python Guido used.  I tried cut-&-paste of

    s='äö'

from his email into the distributed 2.1 IDLE under Win98, and got

    UnicodeError: ASCII encoding error: ordinal not in range(128)
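
For reference, a minimal sketch of what that error means, assuming a modern Python 3 interpreter rather than the 2.1 IDLE discussed here (the exact message wording differs between versions). It shows that the two characters cannot be encoded as ASCII, while a Latin-1 encoding yields the bytes seen in the successful cases. The variable names are illustrative only.

```python
# Sketch under assumptions: modern Python 3, not the Python 2.1 in the report.
text = "\u00e4\u00f6"  # the two characters from the s='äö' snippet

try:
    text.encode("ascii")
    ascii_failed = False
    reason = None
except UnicodeError as exc:  # UnicodeEncodeError is a subclass of UnicodeError
    ascii_failed = True
    reason = exc.reason

# Latin-1 maps each of these characters to a single byte.
latin1_bytes = text.encode("latin-1")

print(ascii_failed, reason)
print(latin1_bytes)
```

This only illustrates the encoding step; it does not show where in IDLE or Tk the conversion is attempted.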

Tk appears to interfere with the usual Windows ALT+0nnn method of entering
funny characters, so I'm unsure what happens then -- for me it either works
fine or does something insane (moves the cursor to the left margin, brings
up an IDLE dialog box, etc).

If I open the system Character Map utility and copy-&-paste using *that*, I
can enter all sorts of stuff without problem:

>>> s = "àáâãäåæçèéêëìíîïðñòòóôõö÷øùúûüýþÿ"
>>> s
'\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef
\xf0\xf1\xf2\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>>

So not all clipboard entries are created equal.

Another clue:  if I paste the s='äö' snippet from Guido's email into a file
opened with Notepad, then immediately copy it again from the Notepad doc,
then paste that into IDLE, again no problem:

>>> s='äö'
>>> s
'\xe4\xf6'
>>>

Using a clipboard diagnostic tool I don't understand, when I copy from
Notepad these data formats are in the system clipboard:

    TEXT
    LOCALE
    OEMTEXT

But when I copy from Guido's email under Outlook 2000, it's

    DataObject
    Rich Text Format
    Rich Text Format Without Objects
    RTF as Text
    TEXT
    UNICODETEXT
    Ole Private Data
    LOCALE
    OEMTEXT

Under Character Map, it's

    Rich Text Format
    TEXT
    LOCALE
    OEMTEXT

So perhaps it's not the version of Tk but the source of the data:  Tk may
grab an unfortunate data format (when present) from the clipboard in
preference to a fortunate one.
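
A toy model of that guess, as a hedged sketch: it is not Tk's actual selection logic, and the preference order, the `pick_format` helper, and the spelling of the Unicode text format name as UNICODETEXT are all assumptions made for illustration. The format lists are the ones reported above for each source.

```python
# Clipboard formats reported for each copy source in this message.
NOTEPAD = ["TEXT", "LOCALE", "OEMTEXT"]
OUTLOOK = [
    "DataObject", "Rich Text Format", "Rich Text Format Without Objects",
    "RTF as Text", "TEXT", "UNICODETEXT", "Ole Private Data",
    "LOCALE", "OEMTEXT",
]
CHARMAP = ["Rich Text Format", "TEXT", "LOCALE", "OEMTEXT"]

# Hypothetical preference order: take Unicode text whenever it is offered.
ASSUMED_PREFERENCE = ["UNICODETEXT", "TEXT", "OEMTEXT"]


def pick_format(available, preference=ASSUMED_PREFERENCE):
    """Return the first preferred format present in `available`, else None."""
    for fmt in preference:
        if fmt in available:
            return fmt
    return None


for name, formats in [("Notepad", NOTEPAD), ("Outlook", OUTLOOK),
                      ("Character Map", CHARMAP)]:
    print(name, "->", pick_format(formats))
```

Under this assumed ordering only the Outlook copy would be read as Unicode text, which would be consistent with that being the only source that produced the UnicodeError here.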

the-clipboard-is-a-complex-beast-ly y'rs  - tim