OT [Way OT]: Unicode Unification Objections

Kevin Russell krussell4 at videon.home.com
Sun May 7 15:48:07 EDT 2000


"Dennis E. Hamilton" wrote:

> Consider the following.  In Japanese texts, when a borrowed or employed
> Korean word is used, a desired practice is to render the Korean
> characters as different, even though some or all of them involve "the
> same character" common to both languages.  However, the iconography (or
> calligraphy) is commonly different.  This loses the ability to
> distinguish the linguistic use of the character, forcing material to be
> font-distinguished some how (e.g., give me the ones that look Korean,
> not the ones that look Japanese).  This means that the distinction can't
> be preserved in simple text.

The entire point of markup is that distinctions like this *shouldn't* be
preserved in simple text.

How a particular character should *look* is a question for presentation
glyphs, not for character encoding.  What kind of language a character
is used to represent (emphasis, quotations, foreign borrowings, titles,
proper names, sarcasm, etc.) is a question for markup, not for character
encoding.
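
To make that concrete, here is a minimal Python sketch, assuming HTML-style
markup with a lang attribute is available.  The helper name tag_language is
made up for illustration; the only point is that the language distinction
lives in the markup while the encoded character stays the same.

```python
from html import escape


def tag_language(text: str, lang: str) -> str:
    """Wrap text in a span whose lang attribute records the language.

    Hypothetical helper: the distinction is carried by markup, not by
    giving the character a different code.
    """
    return f'<span lang="{escape(lang, quote=True)}">{escape(text)}</span>'


SUN = "\u65e5"  # the 'sun' character: one code point, whoever is writing it

japanese = tag_language(SUN, "ja")
korean = tag_language(SUN, "ko")

# The markup differs, so a renderer can pick a different font for each...
print(japanese)
print(korean)

# ...but the character underneath is the same code point in both.
print(hex(ord(SUN)))
```

A renderer that cares can choose a Korean-looking or Japanese-looking glyph
from the lang value; a plain-text search still finds the same character in
both.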

A similar situation holds in almost every written language.  We often dump
Latin or French words into English text.  Even though they may be written
in the same alphabet, we usually want them to *look* different from
ordinary English words.  But nobody uses that fact to argue that Unicode
should have handed out a different code number to italic "e" than to plain
"e" -- or even that entirely disjoint sets of codes should be used for
writing Latin, French, and English words.  (Actually, there probably were
people who argued these things, but they were rightly ignored.)

Once you start mixing up glyph questions (what a character should look
like) and markup questions (what kind of language is the character being
used for) with character encoding questions, there's no logical place to
stop.  Soon the Unicode Consortium would be fielding requests like
"I want to be able to distinguish the Yamato pronunciation of the
'sun' character whispered with a Hokkaido accent and a lisp from
the Yamato pronunciation of the 'sun' character as muttered inaudibly
by someone from Okinawa who has a head-cold.  And I want to
do it with a character that looks like it was finger-painted in drippy
chartreuse water-colours by a five-year-old.  And it should be 23-point
bold, 'cause it will be in a title.  I suggest you assign me character
xDE4A921B."


>
> Unfortunately, the Greek alphabet and the APL alphabet (and apparently
> some other math symbol alphabets) *were* unified.  That is, a number of
> Greek-letter symbols were removed from any distinct APL character set,
> and only some APL-unique made-up symbols having Greek letters in them
> were retained as separate.  Unfortunately, the iconography of the Greek
> alphabet in Greek text is often enough different that those codes don't
> render appropriately with the other APL symbols when used in APL texts.
> Borrowing epsilon (for member) from the Greek character set in Unicode
> is not always what one wants to do when writing membership propositions
> in APL (and borrowing the alternative MEMBER OF symbol may not get you
> what you want either).  It's even more fun if you want to write APL
> programs and use Greek-language identifiers.  Something a CP4E teacher
> in a Greek school might strongly desire to do.  Get it?
>

What kind of sadist would try to teach a CP4E course using APL?!  (And I
say this fondly, as someone who learned APL as my first computer language
in high school.)

Actually, this is a very good point.  But the problem is more with APL than
with Unicode.  Every computer language needs a way to distinguish
characters used in identifiers from characters used in operators from
characters used as raw data (e.g., strings), and so on.  Usually the
decisions made during syntax design will limit the programmer's freedom to
make up identifier names.  (Or to quote what I said at 90 decibels at 2 a.m.
two nights ago: "Why the **** can't I call that variable 'class' if I want
to call it 'class'!")  Iverson's decisions during APL's syntax design must
have made perfect sense in 1960, but in hindsight they turned out to be
extremely inconvenient.  If you want to extend APL to make it less
inconvenient for your Greek teacher, you're either going to have to
redesign the language syntax from scratch (as Iverson did for J) or take
the problematic APL characters and assign your own codes to them from
Unicode's private use blocks.  That's what the private use blocks are
for -- they let you and the four other people in the universe who are
going to use your Greek APL interpreter do what you need to do, without
bogging down the entire character standard.
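
A rough Python sketch of the private-use route, under my own assumptions:
the list of "problematic" symbols, the starting code point, and the names
APL_TO_PUA and describe are all invented for illustration, and a real
interpreter would need a font that draws something at those code points.

```python
import unicodedata

# Hypothetical list of APL symbols to keep apart from their Greek look-alikes.
APL_SYMBOLS = ["epsilon", "rho", "iota"]

# U+E000 is the first code point of the Basic Multilingual Plane's
# Private Use Area (U+E000..U+F8FF).
PUA_START = 0xE000

# Give each APL symbol its own private-use code point.
APL_TO_PUA = {name: chr(PUA_START + i) for i, name in enumerate(APL_SYMBOLS)}
PUA_TO_APL = {ch: name for name, ch in APL_TO_PUA.items()}


def is_private_use(ch: str) -> bool:
    """True if the character's general category is Co (private use)."""
    return unicodedata.category(ch) == "Co"


def describe(ch: str) -> str:
    """Classify one character the way a toy tokenizer might."""
    if ch in PUA_TO_APL:
        return f"APL operator {PUA_TO_APL[ch]}"
    if ch.isalpha():
        return "identifier letter"
    return "other"


# Greek epsilon as an identifier, the private-use "member" operator,
# then Greek alpha as another identifier.
source = "\u03b5 " + APL_TO_PUA["epsilon"] + " \u03b1"
for ch in source:
    print(f"U+{ord(ch):04X}", describe(ch))
```

The Greek epsilon and the APL "member" operator now have different codes, so
the tokenizer never has to guess which one was meant, and nobody outside the
agreement needs to know or care.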

-- Kevin Russell






