Text-mode apps (Was: Who are the "spacists"?)

eryk sun eryksun at gmail.com
Sun Mar 26 14:37:11 EDT 2017


On Sun, Mar 26, 2017 at 5:58 PM, Chris Angelico <rosuav at gmail.com> wrote:
>> The Windows console can render any character in the BMP, but it
>> requires configuring font linking for fallback fonts. It's Windows, so
>> of course the supported UTF format is UTF-16. The console's UTF-8
>> support (codepage 65001) is too buggy to even consider using it.
>
> Is it actually UTF-16, or is it UCS-2?

Pedantically speaking it's UCS-2. Console buffers aren't necessarily
valid UTF-16, i.e. they can have lone surrogate codes or invalid
surrogate pairs. The way a surrogate code gets rendered depends on the
font. It could be an empty box, a box containing a question mark, or
simply empty space. That applies even if it's part of a valid UTF-16
surrogate pair, so the console can't display non-BMP characters such
as emoji. The buffer still stores both code units, though, so such
characters can be copied to the clipboard and displayed in another
program.
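
To make the surrogate terminology concrete, here's a rough Python sketch
of how a non-BMP character maps to a surrogate pair and what makes a
sequence of 16-bit code units invalid as UTF-16. The helper names
utf16_units and is_valid_utf16 are made up for this illustration; nothing
here talks to the console itself.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints.

    'surrogatepass' lets lone surrogates through instead of raising,
    which mimics a buffer that doesn't validate its contents.
    """
    data = text.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def is_valid_utf16(units):
    """Return True if every surrogate is part of a high/low pair."""
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                return False
            i += 2
        elif 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            return False
        else:
            i += 1
    return True


emoji = "\U0001F600"                 # a non-BMP character
print([hex(u) for u in utf16_units(emoji)])
print(is_valid_utf16(utf16_units(emoji)))      # valid pair
print(is_valid_utf16(utf16_units("\ud83d")))   # lone high surrogate
```

A UCS-2 buffer in the sense used here would accept all three sequences;
only a strict UTF-16 consumer would reject the last one.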

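Windows file systems are also UCS-2, in the sense that a filename is a
sequence of 16-bit code units that isn't checked for valid surrogate
pairs. For the most part it's not an issue, since the source of text
and filenames will be valid UTF-16.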
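
For the rare filename that isn't valid UTF-16, the practical consequence
in Python is that the str contains a lone surrogate and can't be encoded
with a strict codec. A small sketch, using only the standard codecs; the
function name roundtrip_name and the sample filenames are hypothetical,
and no file system is touched:

```python
def roundtrip_name(name):
    """Return (strict_ok, roundtrip_ok) for a filename string.

    strict_ok:    the name encodes as strict UTF-8
    roundtrip_ok: the name survives UTF-8 with 'surrogatepass'
    """
    try:
        name.encode("utf-8")
        strict_ok = True
    except UnicodeEncodeError:
        # Lone surrogates are rejected by the strict error handler.
        strict_ok = False
    raw = name.encode("utf-8", "surrogatepass")
    return strict_ok, raw.decode("utf-8", "surrogatepass") == name


print(roundtrip_name("report.txt"))         # ordinary name
print(roundtrip_name("bad\udc00name.txt"))  # name with a lone surrogate
```

The point of the second case is that such a name can still be carried
around losslessly, it just can't be written out as well-formed UTF-8 or
UTF-16.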


