IDLE "Codepage" Switching?

Peter J. Holzer hjp-python at hjp.at
Wed Jan 18 11:57:46 EST 2023


On 2023-01-18 11:05:24 -0500, Thomas Passin wrote:
> On 1/18/2023 5:43 AM, Stephen Tucker wrote:
> > Thanks for these responses.
> > 
> > I was encouraged to read that I'm not the only one to find this all
> > confusing.
> > 
> > I have investigated a little further.
> > 
> > 1. I produced the following IDLE log:
> > 
> > > > > mylongstr = ""
> > > > > for thisCP in range (1, 256):
> > mylongstr += chr (thisCP) + " " + str (ord (chr (thisCP))) + ", "
> > 
> > 
> > > > > print mylongstr
> > 1, 2, 3, 4, 5, 6, 7, 8, 9,
> >   10, 11, 12,
> >   13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
> > 31,   32, ! 33, " 34, # 35, $ 36, % 37, & 38, ' 39, ( 40, ) 41, * 42, + 43,
> > , 44, - 45, . 46, / 47, 0 48, 1 49, 2 50, 3 51, 4 52, 5 53, 6 54, 7 55, 8
> > 56, 9 57, : 58, ; 59, < 60, = 61, > 62, ? 63, @ 64, A 65, B 66, C 67, D 68,
> > E 69, F 70, G 71, H 72, I 73, J 74, K 75, L 76, M 77, N 78, O 79, P 80, Q
> > 81, R 82, S 83, T 84, U 85, V 86, W 87, X 88, Y 89, Z 90, [ 91, \ 92, ] 93,
> > ^ 94, _ 95, ` 96, a 97, b 98, c 99, d 100, e 101, f 102, g 103, h 104, i
> > 105, j 106, k 107, l 108, m 109, n 110, o 111, p 112, q 113, r 114, s 115,
> > t 116, u 117, v 118, w 119, x 120, y 121, z 122, { 123, | 124, } 125, ~
> > 126, 127, タ 128, チ 129, ツ 130, テ 131, ト 132, ナ 133, ニ 134, ヌ 135, ネ 136, ノ
> > 137, ハ 138, ヒ 139, フ 140, ヘ 141, ホ 142, マ 143, ミ 144, ム 145, メ 146, モ 147,
> > ヤ 148, ユ 149, ヨ 150, ラ 151, リ 152, ル 153, レ 154, ロ 155, ワ 156, ン 157, ゙
> > 158, ゚ 159, ᅠ 160, ᄀ 161, ᄁ 162, ᆪ 163, ᄂ 164, ᆬ 165, ᆭ 166, ᄃ 167, ᄄ 168,
> > ᄅ 169, ᆰ 170, ᆱ 171, ᆲ 172, ᆳ 173, ᆴ 174, ᆵ 175, ᄚ 176, ᄆ 177, ᄇ 178, ᄈ
> > 179, ᄡ 180, ᄉ 181, ᄊ 182, ᄋ 183, ᄌ 184, ᄍ 185, ᄎ 186, ᄏ 187, ᄐ 188, ᄑ 189,
> > ᄒ 190, ﾿ 191, À 192, Á 193, Â 194, Ã 195, Ä 196, Å 197, Æ 198, Ç 199, È
> > 200, É 201, Ê 202, Ë 203, Ì 204, Í 205, Î 206, Ï 207, Ð 208, Ñ 209, Ò 210,
> > Ó 211, Ô 212, Õ 213, Ö 214, × 215, Ø 216, Ù 217, Ú 218, Û 219, Ü 220, Ý
> > 221, Þ 222, ß 223, à 224, á 225, â 226, ã 227, ä 228, å 229, æ 230, ç 231,
> > è 232, é 233, ê 234, ë 235, ì 236, í 237, î 238, ï 239, ð 240, ñ 241, ò
> > 242, ó 243, ô 244, õ 245, ö 246, ÷ 247, ø 248, ù 249, ú 250, û 251, ü 252,
> > ý 253, þ 254, ÿ 255,
> > > > > 
> > 
> > 2. I copied and pasted the IDLE log into a text file and ran a program on
> > it that told me about every byte in the log.
> > 
> > 3. I discovered the following:
> > 
> > Bytes 001 to 127 (01 to 7F hex) inclusive were printed as-is;

Which might mean that they are also UTF-8-encoded (there is no
difference between UTF-8-encoding and ASCII-encoding for these
characters).


> > Bytes 128 to 191 (80 to BF) inclusive were output as UTF-8-encoded
> > characters whose codepoints were FF00 hex more than the byte values (hence
> > the strange glyphs);
> > 
> > Bytes 192 to 255 (C0 to FF) inclusive were output as UTF-8-encoded
> > characters - without any offset being added to their codepoints in the
> > meantime!
> > 
> > I thought you might just be interested in this - there does seem to be some
> > method in IDLE's mind, at least.
> 
> This has nothing to do with IDLE.  The UTF-8 encoding of those code points
> uses two bytes instead of one.  See

That's not the peculiar thing. The peculiar thing is that characters
U+0080 to U+00BF are recoded to U+FF80 to U+FFBF (but U+00C0 to U+00FF
are printed normally).

I have no idea what's happening here. I can only urge Stephen to use
Python 3.x instead of Python 2.7. Python2 has been deprecated for years
has has reached its official end of life 3 years ago. There really
shouldn't be any reason to use Python 2.7 any more except
reverse-engineering old applications in order to port them to Python 3.

In particular, the type "str" is very different in Python2 and Python3.
In Python2 it is a sequence of bytes (similar to the Python3 type
"bytes") and in Python3 it is a sequence of (Unicode) characters
(similar to the Python2 type "unicode").

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp at hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/python-list/attachments/20230118/3d0fc10e/attachment.sig>


More information about the Python-list mailing list