Text-mode apps (Was: Who are the "spacists"?)

Chris Angelico rosuav at gmail.com
Sun Mar 26 14:20:59 EDT 2017


On Mon, Mar 27, 2017 at 4:43 AM, Steve D'Aprano
<steve+python at pearwood.info> wrote:
> On Mon, 27 Mar 2017 02:37 am, Chris Angelico wrote:
>
>> Just use Unicode. Everything else, these days, is a subset of Unicode
>> anyway. Unless you're stuck on the default Windows shell/terminal, you
>> should be able to use UTF-8 everywhere and have the entire Unicode
>> range available. For example, the IBM OEM box-drawing characters are
>> available in Code Page 437... or as Unicode code points in the U+25xx
>> range.
>
> You are absolutely correct in theory, but in practice the availability of
> glyphs in most fonts is only a tiny proportion of the Unicode range. And
> even when the glyphs are available, the quality often varies: for example,
> in all of the monospaced fonts I've looked at, the superscripts ¹²³ are a
> different size to ⁴⁵⁶⁷⁸⁹⁰, both vertically and horizontally. And I've come
> across a few fonts where the box drawing characters don't all line up.
>
> Don't misunderstand me: Unicode is a HUGE improvement over the Bad Old Days
> of dozens of incompatible character sets. But let's not pretend that it
> makes it *easy*.

Unicode makes it about as easy as it can possibly be, given the
diversity of human languages and their various needs, plus the reality
that nobody has infinite resources for creating fonts. The most
important thing is that you have a single universal specification that
governs the interpretation of text, which means you can aim for that
standard, and if a renderer doesn't match up to it, that's a bug that
can be fixed, not behaviour that has to be maintained for backward
compatibility. So, for instance, Eryk Sun commented that my rounded
box example didn't render correctly in all fonts - but in the future,
a new version of those fonts could be released, adding support for
those characters. We *know* that the code points I used are
permanently and irrevocably allocated to the purposes I used them for,
so we can all be confident that they'll be used correctly if at all.
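
To make that concrete, here's a rough sketch in Python. It assumes
the standard "cp437" codec and the unicodedata module, and it assumes
the rounded box was built from the arc corners U+256D to U+2570 (I
haven't reproduced the original example here). The names CP437_TOP,
ROUNDED and describe() are made up for illustration.

```python
import unicodedata

# Top edge of a single-line box as raw Code Page 437 bytes.
CP437_TOP = bytes([0xDA, 0xC4, 0xBF])

# Decoding maps each OEM byte onto its Unicode code point.
unicode_top = CP437_TOP.decode("cp437")

# Assumed rounded corners; these have no CP437 equivalent at all.
ROUNDED = "\u256d\u256e\u2570\u256f"


def describe(text):
    """Return (code point, official name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for code_point, name in describe(unicode_top + ROUNDED):
    # Only ASCII is printed, so this works even on a limited terminal.
    print(code_point, name)
```

Whether a given font has glyphs for them is a separate question, but
the names and numbers printed here will be the same on every system.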

>>> And more important: can one use binary (bitmap) fonts in default modern
>>> linux console? If yes, can one patch them with custom tiles at
>>> the application start?
>>
>> If you really need something completely custom, it's not text any
>> more.
>
> That's not quite right. Unicode includes 137000 or so Private Use
> Characters, which anyone can define for their own purposes, "by private
> agreement". There's an unofficial registry of such private use characters
> here:
>
> http://www.evertype.com/standards/csur/
>
> More info here:
>
> https://en.wikipedia.org/wiki/Private_Use_Areas

This is true, but at some point, it's not text any more. The PUAs are
great if you're working with non-standard forms of text (e.g. Klingon
or Elvish), but aren't well suited to arbitrary graphics. You should
be able to copy and paste text from one application into another and
have it retain its meaning; the PUAs weaken this guarantee in that the
applications have to agree to coexist, but within that, the same
ability to copy and paste should hold. It sounds like Mikhail is
trying to warp the console into something pseudo-graphical like a
Roguelike game, but instead of designing the game around available
glyphs, wants to design the glyphs to suit the game - and at that
point, it's time to just go graphical, IMO.
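
If you do want to know whether a piece of text leans on the PUAs, a
minimal sketch is to look for the "Co" (private use) general category
via unicodedata. The function names are hypothetical, and the sample
string assumes the unofficial CSUR assignment of Klingon letters
starting at U+F8D0.

```python
import unicodedata


def is_private_use(ch):
    """True if ch is a Private Use code point (general category Co)."""
    return unicodedata.category(ch) == "Co"


def private_use_chars(text):
    """List the private-use code points found in text, in order."""
    return [f"U+{ord(ch):04X}" for ch in text if is_private_use(ch)]


def count_private_use():
    """Count every private-use code point in the whole Unicode range."""
    return sum(1 for cp in range(0x110000) if is_private_use(chr(cp)))


# Assumed CSUR Klingon letters; meaningless without that agreement.
sample = "tlhIngan \uf8d0\uf8d1 Hol"
print(private_use_chars(sample))
```

Anything this flags only means something to applications that share
the same private agreement.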

>> More likely, you don't truly need something custom - what you need is
>> a different subset of characters (maybe you need to mix Latin, Greek,
>> and Hebrew letters, in order to show interlinear translation of the
>> Bible). Instead of messing around with character sets, you can just
>> use Unicode and have all of them available.
>
> Assuming you have a text widget which is capable of displaying LTR and RTL
> text, and support for all the glyphs you need.

This is true. Fortunately, some measure of this is available in all
the major GUI toolkits, although some applications need to have their
own handling too. But here's what it takes to have full Unicode
support in my MUD client:

1) Use GTK and Pango. That's the biggest part.
2) Due to the nature of MUDs, RTL text is still left-justified.
3) Due to the interaction of left-justified RTL text and indentation,
special handling is needed for lines that begin with spaces and/or
tabs and then an RTL character. (Solution: Prepend U+200E
LEFT-TO-RIGHT MARK.)
4) Individual lines may be painted in multiple colours, so the text
gets divided up and measured in pieces. I let Pango do the actual
measurements.

That's about it. Basically I just let Pango do all the heavy lifting;
if it says this text in this font takes up this many pixels, that's
how many pixels of width I allocate it. RTL? LTR? Mixed? No problem.
It'll do all that for me.
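
Point 3 is the only bit that needs code of my own, and it can be
sketched roughly like this. This is not the client's actual code; the
function names are invented, and it assumes that "an RTL character"
means a bidirectional class of R or AL as reported by unicodedata.

```python
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK


def needs_lrm(line):
    """True if line is indented with spaces/tabs and then starts RTL."""
    stripped = line.lstrip(" \t")
    if stripped == line or not stripped:
        # No leading indentation, or nothing after it.
        return False
    return unicodedata.bidirectional(stripped[0]) in ("R", "AL")


def fix_indented_rtl(line):
    """Prepend LRM so the indentation stays on the left-hand side."""
    return LRM + line if needs_lrm(line) else line


hebrew = "  \u05e9\u05dc\u05d5\u05dd"  # two spaces, then a Hebrew word
print(fix_indented_rtl(hebrew).startswith(LRM))
```

Lines that start with LTR text, or with RTL text and no indentation,
pass through untouched.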

So yeah. Unicode isn't a panacea, and if you come from an ASCII-only
environment, Unicode will look hard - but it's making life far easier
than the alternatives would be.

ChrisA


