IDLE "Codepage" Switching?

Eryk Sun eryksun at gmail.com
Wed Jan 18 15:41:01 EST 2023


On 1/17/23, Stephen Tucker <stephen_tucker at sil.org> wrote:
>
> 1. Can anybody explain the behaviour in IDLE (Python version 2.7.10)
> reported below? (It seems that the way it renders a given sequence of bytes
> depends on the sequence.)

In 2.x, IDLE tries to decode a byte string via unicode() before
writing to the Tk text widget. However, if the locale encoding (e.g.
the process ANSI code page) fails to decode one or more characters,
IDLE lets Tk figure out how to decode the byte string.

Python 2.7 has an older version of Tk that has peculiar behavior on
Windows when bytes in the range 0x80-0xBF are written to a text box.
Bytes in this range get translated to native wide characters (16-bit
characters) in the halfwidth/fullwidth Unicode block, i.e. translated
to Unicode U+FF80 - U+FFBF.

If IDLE decodes using code page 1252, then the ordinals 0x81, 0x8d,
0x8f, 0x90 and 0x9d can't be decoded. IDLE thus passes the undecoded
byte string to Tk. The example you provided that demonstrates the
behavior contains ordinal 0x9d (157).

I get similar behavior for the other undefined ordinal values in code
page 1252. For example, using IDLE 2.7.18 on Windows:

    >>> print '\x81\xa1'
    チᄀ
    >>> print 'a\xa1'
    a¡

In the first case, ordinal 0x81 causes decoding to fail in IDLE, so
the byte string is passed as is to Tk, which maps it to
'\uff81\uffa1'. In the second case, OTOH, "\xa1" is decoded by IDLE as
"¡".

> 2. Does the IDLE in Python 3.x behave the same way?

No, in 3.x only Unicode str() objects are written to the Tk text
widget. Moreover, the text widget doesn't have the same behavior in
newer versions. It ignores bytes in the control-block range 0x80-0x9F,
and it decodes bytes in the range 0xA0-0xBF normally.


More information about the Python-list mailing list