IDLE "Codepage" Switching?

Thomas Passin list1 at tompassin.net
Tue Jan 17 22:58:53 EST 2023


On 1/17/2023 8:46 PM, rbowman wrote:
> On Tue, 17 Jan 2023 12:47:29 +0000, Stephen Tucker wrote:
> 
>> 2. Does the IDLE in Python 3.x behave the same way?
> 
> fwiw
> 
> Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
> Type "help", "copyright", "credits" or "license()" for more information.
> str = ""
> for c in range(157, 169):
>      str += chr(c) + ""
> 
>      
> print(str)
> žŸ ¡¢£¤¥¦§¨
> str = ""
> for c in range(140, 169):
>      str += chr(c) + " "
> 
>      
> print(str)
> Œ  Ž   ‘ ’ “ ” • – — ˜ ™ š › œ  ž Ÿ   ¡ ¢ £ ¤ ¥
> ¦ § ¨
> 
> 
> I don't know how this will appear since Pan is showing the icon for a
> character not in its set.  However, even with more undefined characters
> the printable one do not change. I get the same output running Python3
> from the terminal so it's not an IDLE thing.

I'm not sure what explanation is being asked for here.  Let's take 
Python3, so we can be sure that the strings are Unicode.  The font 
being used by the console isn't mentioned, but there's no reason it 
should have glyphs for every Unicode character.  In my case, I see 
the same missing and printable characters as in the previous post 
(above).  The font is Source Code Pro Medium.

Changing the console's code page won't magically provide the missing glyphs.
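
One way to check which of these code points have any printable form at 
all, independent of the font, is to ask Python's unicodedata module.  A 
minimal sketch (the helper name describe() is mine, not something from 
this thread):

```python
import unicodedata

def describe(code_point):
    """Return (general category, Unicode name) for a code point."""
    ch = chr(code_point)
    # Control characters have no name, so supply a placeholder.
    return unicodedata.category(ch), unicodedata.name(ch, "<unnamed>")

# Same range as the first loop in the quoted post.
for cp in range(157, 169):
    category, name = describe(cp)
    print(f"U+{cp:04X} {category} {name}")
```

Code points 128 through 159 should report category Cc (control), so a 
font would not normally be expected to carry a glyph for them, whatever 
the code page.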

I wrote these characters to a file using utf-8 encoding and opened it in 
an editor that recognized the content as utf-8 (EditPlus).  It displayed 
the same characters but had fewer leading spaces (i.e., missing glyphs), 
and did not show any default "missing-character" glyphs.  The editor is 
using the Cousine font.
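
For anyone who wants to repeat that experiment, here is a rough sketch 
of writing the same characters out as utf-8.  The file name is made up 
for the example, and I am assuming the second loop from the quoted post 
(range(140, 169), separated by spaces):

```python
import os
import tempfile

# Same characters as the second loop in the quoted post.
text = " ".join(chr(c) for c in range(140, 169))

# Hypothetical file name; any writable path would do.
path = os.path.join(tempfile.gettempdir(), "codepoints_demo.txt")

# An explicit encoding means the result does not depend on the locale.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the raw bytes back to see what actually landed on disk.
with open(path, "rb") as f:
    raw = f.read()

print(len(text), "characters ->", len(raw), "bytes")
```

Opening the resulting file in editors that use different fonts is then 
an easy way to compare which glyphs each font actually provides.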

The second factor that could be in play is what the default character 
encoding is, which is set by Windows and could be different in different 
places (locales).  I don't recall just now how Python3 handles this. 
Since Python2 strings are not unicode unless specified, and Python2 
probably handles the locale/default encoding differently from Python3, 
it would not be a surprise if the two give different results.
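
A quick way to see what a particular Python3 installation thinks the 
encodings are is sketched below.  The function name encoding_report() 
is just something I made up; the calls it wraps are standard library 
ones, and the values will differ from machine to machine:

```python
import locale
import sys

def encoding_report():
    """Collect the encodings Python3 reports for this environment."""
    return {
        # Encoding used when print() writes to the console or a pipe.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Locale-derived default; False means "do not call setlocale()".
        "preferred": locale.getpreferredencoding(False),
        # Encoding used to convert file names.
        "filesystem": sys.getfilesystemencoding(),
    }

for key, value in encoding_report().items():
    print(key, "=", value)
```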

If you print such a Python2 string, any (non-ascii) character whose 
ord() is above 127 gets its glyph from the Windows code page table, 
which will be different from what Python3 will display.
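
The code page effect can be imitated in Python3 by treating the values 
as raw bytes and decoding them with different code pages.  This is only 
an illustration: the three code pages are ones I picked as examples, and 
decode_with() is a made-up helper.

```python
# The same numeric values as the first loop in the quoted post, as bytes
# (roughly what a Python2 str would hold).
raw = bytes(range(157, 169))

def decode_with(code_page):
    """Decode the raw bytes, marking undefined positions with U+FFFD."""
    return raw.decode(code_page, errors="replace")

for code_page in ("cp437", "cp1252", "latin-1"):
    print(code_page, repr(decode_with(code_page)))
```

Only the latin-1 result matches what chr() gives in Python3; the other 
two map the same byte values to different characters.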

Python3 uses Windows Unicode API functions, and isn't subject to the 
same limitations as Python2 was - Python2 had to go through the Windows 
code page apparatus and didn't use the Unicode API.  See PEP 528 
(https://peps.python.org/pep-0528/).

IDLE sets up its own window itself, and probably uses a different font 
from the default Windows console, so there could be some differences 
there too, especially as to whether missing glyphs show a visible symbol 
or not.

Code Page 65001 was often claimed to be for utf-8.  That's not really 
correct in general, but it's OK for many utf-8 characters.  But in 
Python2, the codecs module does not know about code page 65001 - unless 
you apply a simple patch - so if you set the console to cp65001, 
nothing gets printed.  You get an exception raised instead.
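
I haven't spelled out the patch here, so take the following as an 
assumption about its general shape rather than the exact fix: the 
approach usually described is to register a codec search function that 
maps the unknown name to utf-8.  The sketch is written for Python3 and 
uses a made-up alias, "cp65001_demo", so that it behaves the same on 
any version; on Python2 the name to map would be "cp65001" itself.

```python
import codecs

# Hypothetical alias, chosen so the sketch does not depend on whether
# this Python already knows the name "cp65001".
ALIAS = "cp65001_demo"

def _search(name):
    """Codec search function: map the alias to the utf-8 codec."""
    if name == ALIAS:
        return codecs.lookup("utf-8")
    return None  # let other search functions handle everything else

codecs.register(_search)

# With the alias registered, encoding no longer fails on the name.
encoded = "\u00a3".encode(ALIAS)
print(encoded)
```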

Yes, it's all confusing, and especially with Python2.



