[I18n-sig] RE: [Python-Dev] Pre-PEP: Python Character Model

M.-A. Lemburg mal@lemburg.com
Mon, 12 Feb 2001 11:39:13 +0100


Tim Peters wrote:
> 
> [Neil Hodgson]
> >    Matz: "We don't believe there can be any single character-
> > encoding that encompasses all the world's languages.  We want
> > to handle multiple encodings at the same time (if you want to)."
> 
> [/F]
> > neither do the unicode designers, of course: the point
> > is that unicode only deals with glyphs, not languages.
> >
> > most existing japanese encodings also include language info,
> > and if you don't understand the difference, it's easy to think
> > that unicode sucks...
> 
> It would be helpful to read Matz's quote in context:
> 
>     http://www.deja.com/getdoc.xp?AN=705520466&fmt=text
> 
> The "encompasses all the world's languages" business was taken verbatim from
> the question to which he was replying.  His concerns for Unicoded Japanese
> are about time efficiency for conversions from ubiquitous national
> encodings; relative (lack of) space efficiency for UTF-8 storage of Unicoded
> Japanese (unclear why he's hung up on UTF-8, though -- but it's an ongoing
> theme in c.l.ruby); and that Unicode (including surrogates) is too small and
> too late for parts of his market:
> 
>     I was thinking of applications that process big character
>     set (e.g. Mojikyo set) which is not covered by Unicode.  I
>     don't know exactly how many code points it has.  But I've
>     heard it's pretty big, possibly consumes half of surrogate
>     space.  And they want to process them now.  I think they
>     don't want to wait Unicode consortium to assign code points
>     for their characters.
> 
> The first hit I found on Mojikyo was for a freely downloadable "Mojikyo Font
> Set", containing about 50,000 Chinese glyphs beyond those covered by
> Unicode, + about 20,000 more from other Asian languages.  Python better move
> fast lest it lose the Oracle Bone market to Ruby <wink>.
> 
> a-2-byte-encoding-space-was-too-small-the-day-unicode-was-conceived-
>     and-20-bits-won't-last-either-ly y'rs  - tim

Has anyone ever considered the problems this causes for type
designers? Who is going to design 2^20 character glyphs that
all match the same font design guidelines? Perhaps I'm missing
something here, but this sounds like Just is going to have a
bright future ;-)
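
For scale, the two figures mentioned in this thread (the size of the
surrogate space and the storage cost of UTF-8 for Japanese text) can be
checked with a short sketch. This is only an illustration in present-day
Python 3, not code from the discussion: the helper names
surrogate_space_size and encoded_sizes are hypothetical, and the codec
names assume the standard library's "utf-8", "utf-16-be" and "shift_jis"
codecs.

```python
# Surrogate code unit ranges as defined by Unicode:
# 1024 high surrogates combined with 1024 low surrogates.
HIGH_SURROGATES = range(0xD800, 0xDC00)
LOW_SURROGATES = range(0xDC00, 0xE000)


def surrogate_space_size():
    """Number of code points reachable through surrogate pairs."""
    return len(HIGH_SURROGATES) * len(LOW_SURROGATES)


def encoded_sizes(text, codecs=("utf-8", "utf-16-be", "shift_jis")):
    """Bytes needed to store `text` in each of the given encodings."""
    return {name: len(text.encode(name)) for name in codecs}


if __name__ == "__main__":
    # 1024 * 1024 pairs, i.e. the 20 bits referred to above.
    print(surrogate_space_size())
    # Three common kanji, written as escapes to keep this file ASCII-only.
    print(encoded_sizes("\u65e5\u672c\u8a9e"))
```

For common kanji such as these, UTF-8 needs three bytes per character
where Shift_JIS and UTF-16 need two, which is presumably the space
concern raised for Unicoded Japanese.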

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/