[Tutor] Changing the interpreter prompt symbol from ">>>" to ???

eryk sun eryksun at gmail.com
Sun Mar 13 03:39:11 EDT 2016


On Sat, Mar 12, 2016 at 12:46 AM, boB Stepp <robertvstepp at gmail.com> wrote:
> I did with the non-printing control character, but not with '\u25ba' !
>  So I had to go through some contortions after some research to get my
> Win7 cmd.exe and PowerShell to display the desired prompt using

The console is hosted by another process named conhost.exe. When
Python is running in the foreground, the cmd shell is just waiting in
the background until Python exits.

> '\u25ba' as the character with utf-8 encoding.  My new
> pythonstartup.py file (Which PYTHONSTARTUP now points to) follows:
>
> #!/usr/bin/env python3
>
> import os
> import sys
>
> # cmd.exe and PowerShell require the code page to be changed.
> os.system('chcp 65001')
> sys.ps1 = '\u25ba '  # I remembered to add the additional space.

chcp.com calls SetConsoleCP (to change the input codepage) and
SetConsoleOutputCP. You can call these functions via ctypes if you
need to separately modify the input or output codepages. For example:

    >>> import ctypes, os, sys
    >>> kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    >>> sys.ps1 = '\u25ba '
    Γû║ kernel32.SetConsoleOutputCP(65001)
    1
    ►
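
If you want the two codepages handled separately in a startup file,
the calls can be wrapped along these lines. This is only a sketch: the
helper names get_console_codepages and set_console_codepages are made
up for illustration, while GetConsoleCP, GetConsoleOutputCP,
SetConsoleCP and SetConsoleOutputCP are the kernel32 functions. Outside
Windows the helpers do nothing.

```python
import ctypes
import sys

CP_UTF8 = 65001


def get_console_codepages():
    """Return (input_cp, output_cp), or None when not running on Windows."""
    if sys.platform != 'win32':
        return None
    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    return kernel32.GetConsoleCP(), kernel32.GetConsoleOutputCP()


def set_console_codepages(input_cp=None, output_cp=None):
    """Set the input and/or output codepage independently.

    Hypothetical helper: returns False when not on Windows, True on success,
    and raises OSError if a WinAPI call reports failure.
    """
    if sys.platform != 'win32':
        return False
    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    if input_cp is not None and not kernel32.SetConsoleCP(input_cp):
        raise ctypes.WinError(ctypes.get_last_error())
    if output_cp is not None and not kernel32.SetConsoleOutputCP(output_cp):
        raise ctypes.WinError(ctypes.get_last_error())
    return True
```

Calling set_console_codepages(output_cp=CP_UTF8) changes only the
output codepage, which matches the session above, and leaves the input
codepage alone.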

UTF-8 in the console is generally buggy, but this example works since
the sys.ps1 prompt is written without buffering (discussed below).
However, Python still thinks the console is using the initial
codepage. To print non-ASCII characters, you'll either have to change
the codepage before starting Python or rebind sys.stdout and
sys.stderr.

    ► print(''.join(chr(x) for x in range(240, 256)))
    ����������������

Let's try to fix this:

    ► fd = os.dup(1)
    ► sys.stdout = open(fd, 'w', encoding='utf-8')
    ► print(''.join(chr(x) for x in range(240, 256)))
    ðñòóôõö÷øùúûüýþÿ
    ùúûüýþÿ
    �þÿ

    ►

The buggy output above is from Windows 7. Codepage 65001 was only ever
meant for encoding text to and from files and sockets, via the WinAPI
functions WideCharToMultiByte and MultiByteToWideChar. Using it in the
console is buggy because the console's ANSI API hard-codes a lot of
assumptions that fall apart with UTF-8.

For example, if you paste non-ASCII characters into the console while
65001 is the input codepage, Python quits as if you had entered
Ctrl+Z. In Windows 7, printing non-ASCII characters also leaves a
trail of garbage after the printed text, in proportion to the number
of non-ASCII characters, especially with characters that take more
than 2 UTF-8 bytes. Finally, in Python 2 the CRT's FILE buffering can
split a UTF-8 sequence across two writes, and the split sequence gets
printed as 2 to 4 replacement characters. I've discussed these
problems in more detail in the following issue:

    http://bugs.python.org/issue26345
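
The split-sequence problem can be imitated in pure Python, without a
console. The sketch below is not the console's code; it only assumes a
consumer that decodes every write as if it were a complete UTF-8
string, and compares that with an incremental decoder that carries the
partial sequence over to the next write. The function names are
invented for the example, and the number of replacement characters a
real console shows may differ.

```python
import codecs


def decode_chunks_independently(chunks):
    """Decode each chunk on its own, as if every write were complete UTF-8."""
    return ''.join(chunk.decode('utf-8', errors='replace') for chunk in chunks)


def decode_chunks_incrementally(chunks):
    """Decode the chunks as one stream, keeping partial sequences between writes."""
    decoder = codecs.getincrementaldecoder('utf-8')(errors='replace')
    text = ''.join(decoder.decode(chunk) for chunk in chunks)
    return text + decoder.decode(b'', final=True)


data = '\xfe\xff'.encode('utf-8')   # thorn and y-diaeresis: b'\xc3\xbe\xc3\xbf'
chunks = [data[:3], data[3:]]        # the split falls inside the second character

broken = decode_chunks_independently(chunks)
intact = decode_chunks_incrementally(chunks)

print(ascii(broken))   # first character survives, second becomes U+FFFD twice
print(ascii(intact))   # both characters survive
```

A buffered writer that flushes at a fixed byte count produces exactly
this kind of split, since it knows nothing about character boundaries.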

Windows 10 fixes the problem with printing extra characters, but it
still has the problem with reading non-ASCII input as UTF-8 and still
can't handle a buffered writer that splits a UTF-8 sequence across two
writes. Maybe Windows 20 will finally get this right.

For the time being, programs that use Unicode in the console should
use the wide-character (UTF-16) API. Python doesn't support this out
of the box, since it's not designed to handle UTF-16 in the raw I/O
layer. The win-unicode-console package adds this support.

    https://pypi.python.org/pypi/win_unicode_console
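
As a rough idea of how this could fit into a PYTHONSTARTUP file, the
sketch below enables the package when it is available and then sets
the prompt. It assumes win_unicode_console has been installed
separately (for example with pip) and uses its enable() function; the
wrapper enable_unicode_console is a made-up name. With this approach
the chcp call should not be needed, since the console is accessed
through the wide-character API instead of a codepage.

```python
import sys


def enable_unicode_console():
    """Try to switch the console streams to the wide-character API.

    Hypothetical helper: returns True if win_unicode_console was enabled,
    False when not on Windows or when the package is not installed.
    """
    if sys.platform != 'win32':
        return False            # nothing to do outside Windows
    try:
        import win_unicode_console
    except ImportError:
        return False            # package not installed
    win_unicode_console.enable()
    return True


enabled = enable_unicode_console()
sys.ps1 = '\u25ba '             # prompt from the original startup file
```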

