[I18n-sig] RE: [Python-Dev] Pre-PEP: Python Character Model

Tim Peters tim.one@home.com
Mon, 12 Feb 2001 03:10:01 -0500


[Neil Hodgson]
>    Matz: "We don't believe there can be any single characer-
> encoding that encompasses all the world's languages.  We want
> to handle multiple encodings at the same time (if you want to).

[/F]
> neither do the unicode designers, of course: the point
> is that unicode only deals with glyphs, not languages.
>
> most existing japanese encodings also include language info,
> and if you don't understand the difference, it's easy to think
> that unicode sucks...

It would be helpful to read Matz's quote in context:

    http://www.deja.com/getdoc.xp?AN=705520466&fmt=text

The "encompasses all the world's languages" business was taken verbatim from
the question to which he was replying.  His concerns for Unicoded Japanese
are about time efficiency for conversions from ubiquitous national
encodings; relative (lack of) space efficiency for UTF-8 storage of Unicoded
Japanese (unclear why he's hung up on UTF-8, though -- but it's an ongoing
theme in c.l.ruby); and that Unicode (including surrogates) is too small and
too late for parts of his market:

    I was thinking of applications that process a big character
    set (e.g. the Mojikyo set) which is not covered by Unicode.
    I don't know exactly how many code points it has.  But I've
    heard it's pretty big, possibly consuming half of the
    surrogate space.  And they want to process them now.  I
    think they don't want to wait for the Unicode Consortium to
    assign code points for their characters.
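
To make the space point concrete, here's a minimal sketch (the sample
string is my own, and it assumes a Python whose codec registry knows
the Japanese encodings):

    s = u"\u65e5\u672c\u8a9e"          # sample text: "Nihongo", three kanji
    print(len(s.encode("euc-jp")))     # 6 bytes: 2 per character
    print(len(s.encode("utf-8")))      # 9 bytes: 3 per character

The JIS X 0208 kana and kanji cost 2 bytes each in the national
encodings but 3 bytes each in UTF-8 -- hence the recurring 50% storage
complaint on c.l.ruby.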

The first hit I found on Mojikyo was for a freely downloadable "Mojikyo Font
Set", containing about 50,000 Chinese glyphs beyond those covered by
Unicode, plus about 20,000 more from other Asian languages.  Python had
better move fast lest it lose the Oracle Bone market to Ruby <wink>.
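
For scale: the surrogate mechanism buys exactly 2**20 extra code
points.  A quick check, using the surrogate ranges from the Unicode
standard:

    high = 0xDBFF - 0xD800 + 1    # 1024 high (leading) surrogates
    low = 0xDFFF - 0xDC00 + 1     # 1024 low (trailing) surrogates
    print(high * low)             # 1048576 == 2**20 code points

so the ~70,000 glyphs above would consume roughly 7% of it.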

a-2-byte-encoding-space-was-too-small-the-day-unicode-was-conceived-
    and-20-bits-won't-last-either-ly y'rs  - tim