[I18n-sig] RE: [Python-Dev] Pre-PEP: Python Character Model
M.-A. Lemburg
mal@lemburg.com
Mon, 12 Feb 2001 11:39:13 +0100
Tim Peters wrote:
>
> [Neil Hodgson]
> > Matz: "We don't believe there can be any single character-
> > encoding that encompasses all the world's languages. We want
> > to handle multiple encodings at the same time (if you want to)."
>
> [/F]
> > neither do the unicode designers, of course: the point
> > is that unicode only deals with glyphs, not languages.
> >
> > most existing japanese encodings also include language info,
> > and if you don't understand the difference, it's easy to think
> > that unicode sucks...
>
> It would be helpful to read Matz's quote in context:
>
> http://www.deja.com/getdoc.xp?AN=705520466&fmt=text
>
> The "encompasses all the world's languages" business was taken verbatim from
> the question to which he was replying. His concerns for Unicoded Japanese
> are about time efficiency for conversions from ubiquitous national
> encodings; relative (lack of) space efficiency for UTF-8 storage of Unicoded
> Japanese (unclear why he's hung up on UTF-8, though -- but it's an ongoing
> theme in c.l.ruby); and that Unicode (including surrogates) is too small and
> too late for parts of his market:
>
>     I was thinking of applications that process big character
>     set (e.g. Mojikyo set) which is not covered by Unicode. I
>     don't know exactly how many code points it has. But I've
>     heard it's pretty big, possibly consumes half of surrogate
>     space. And they want to process them now. I think they
>     don't want to wait Unicode consortium to assign code points
>     for their characters.
>
> The first hit I found on Mojikyo was for a freely downloadable "Mojikyo Font
> Set", containing about 50,000 Chinese glyphs beyond those covered by
> Unicode, + about 20,000 more from other Asian languages. Python better move
> fast lest it lose the Oracle Bone market to Ruby <wink>.
>
> a-2-byte-encoding-space-was-too-small-the-day-unicode-was-conceived-
> and-20-bits-won't-last-either-ly y'rs - tim
Has anyone ever considered the problems this causes for type
designers? Who is going to do the job of designing 2^20 character
glyphs that all match the same font design guidelines? Perhaps
I'm missing something here, but this sounds like Just is going
to have a bright future ;-)
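
For a rough sense of the numbers in this thread, here is a minimal
sketch in present-day Python. It is not code from the discussion: the
names SURROGATE_SPACE, MOJIKYO_EXTRA_GLYPHS, surrogate_share and
encoded_sizes are made up for illustration, the glyph count is only the
approximate "50,000 + 20,000" figure quoted above, and the three-kanji
sample string is an arbitrary choice.

```python
# The surrogate mechanism pairs one of 1024 high surrogates
# (U+D800..U+DBFF) with one of 1024 low surrogates (U+DC00..U+DFFF).
HIGH_SURROGATES = 0xDBFF - 0xD800 + 1
LOW_SURROGATES = 0xDFFF - 0xDC00 + 1
SURROGATE_SPACE = HIGH_SURROGATES * LOW_SURROGATES  # 2**20 code points

# Assumption: approximate figure taken from the quoted description of
# the Mojikyo font set, not an authoritative count.
MOJIKYO_EXTRA_GLYPHS = 50_000 + 20_000


def surrogate_share(glyph_count):
    """Fraction of the surrogate-addressable space used by glyph_count."""
    return glyph_count / SURROGATE_SPACE


def encoded_sizes(text, encodings=("utf-8", "utf-16-le", "shift_jis")):
    """Byte length of text in each of the given encodings."""
    return {name: len(text.encode(name)) for name in encodings}


if __name__ == "__main__":
    print(f"surrogate space: {SURROGATE_SPACE} code points")
    print(f"share used: {surrogate_share(MOJIKYO_EXTRA_GLYPHS):.1%}")
    # Three common kanji, written as escapes to keep the source ASCII.
    sample = "\u65e5\u672c\u8a9e"
    for name, size in encoded_sizes(sample).items():
        print(f"{name}: {size} bytes")
```

Under these assumptions, each kanji in the sample takes three bytes in
UTF-8 but two in UTF-16 and Shift_JIS, which is the space concern
attributed to Matz above, and 70,000 glyphs come to well under a tenth
of the 2^20 code points reachable through surrogates.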
--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/