[Python-Dev] Some thoughts on the codecs...

Andy Robinson andy@robanal.demon.co.uk
Tue, 16 Nov 1999 04:18:19 -0800 (PST)


--- "M.-A. Lemburg" <mal@lemburg.com> wrote:
> So I can drop JIS ? [I won't be able to drop the escaped unicode
> codec because this is needed for u"" and ur"".]

Drop Japanese from the core language.  

JIS0208 is a big character set with three popular
encodings (Shift-JIS, EUC-JP and JIS), and a host of
slight variations; it has 6879 characters, and there
is a range of options a user might need to set for it
to be useful.  So let's assume for now this is a separate
package.  There's a good chance I'll do it but it is
not a small job.  If you start statically linking in
tables of 7000 characters for one Asian language,
you'll have to do the lot.
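
To make the "three popular encodings" point concrete, here is a
minimal sketch. It assumes a present-day Python 3, whose standard
library happens to ship codecs named shift_jis, euc_jp and
iso2022_jp; none of that is part of the proposal under discussion,
and the sample string is only an illustration.

```python
# Assumption: modern Python 3 with its bundled Japanese codecs.
# The same JIS X 0208 characters come out as different byte
# sequences under each of the three encodings.
text = "日本語"  # illustrative sample, three JIS X 0208 characters

encoded = {
    name: text.encode(name)
    for name in ("shift_jis", "euc_jp", "iso2022_jp")
}

for name, data in encoded.items():
    # Show the size and raw bytes, then check the round trip.
    print(f"{name:12} {len(data):2d} bytes  {data.hex()}")
    assert data.decode(name) == text
```

Shift-JIS and EUC-JP use two bytes per character here, while the JIS
(ISO-2022-JP) form wraps the text in escape sequences, so a Japanese
package has to handle all three byte layouts for one character set.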

As for the single-byte Latin ones, a prototype Python
module could be whipped up in a couple of evenings,
and a tiny C function which does single-byte to
double-byte mappings and vice versa could make it
fast.  We can have an extensible, data driven solution
in no time without having to build it into the core.
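
As a rough sketch of what "extensible, data driven" could mean for
the single-byte case: each encoding is nothing more than a table,
and one generic routine serves them all. The names below
(make_codec, the toy table) are hypothetical and written in modern
Python; this is not the prototype module itself, just an
illustration of the idea.

```python
def make_codec(high_half):
    """Build (encode, decode) functions from a 128-entry table.

    high_half lists the characters for bytes 0x80-0xFF; bytes
    0x00-0x7F are assumed to be plain ASCII (an assumption of
    this sketch, true for the ISO 8859 family).
    """
    if len(high_half) != 128:
        raise ValueError("table must have exactly 128 entries")
    decode_table = [chr(i) for i in range(128)] + list(high_half)
    encode_table = {ch: i for i, ch in enumerate(decode_table)}

    def decode(data):
        # One list lookup per byte.
        return "".join(decode_table[b] for b in data)

    def encode(text):
        # Raises KeyError for characters the table does not cover.
        return bytes(encode_table[ch] for ch in text)

    return encode, decode


# Latin-1 is the trivial table: byte n maps to code point n.
latin1_table = [chr(i) for i in range(0x80, 0x100)]
latin1_encode, latin1_decode = make_codec(latin1_table)

# A toy variant: Latin-1 with byte 0xA4 remapped to the euro sign,
# as ISO 8859-15 does (its other differences are left out here).
toy_table = list(latin1_table)
toy_table[0xA4 - 0x80] = "\u20ac"
toy_encode, toy_decode = make_codec(toy_table)

print(latin1_encode("caf\u00e9"))
print(toy_decode(b"\xa4 5"))
```

Adding another single-byte encoding then means supplying another
table, not more code, and the inner loops are the part a tiny C
function could speed up.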

The way I see it, to claim that Python has i18n, a
serious effort is needed to ensure every major
encoding in the world is available to Python users.  
But that's separate from the core language.  Your spec
should only cover what is going to be hard-coded into
Python.  

I'd like to see one paragraph in your spec stating
that our architecture separates the encodings
themselves from the core language changes, and that
getting them sorted is a logically separate (but
important) project.  Ideally, we could put together a
separate proposal for the encoding library itself and
run it by some world class experts in that field, but
after yours is done.


- Andy

 



=====
Andy Robinson
Robinson Analytics Ltd.
------------------
My opinions are the official policy of Robinson Analytics Ltd.
They just vary from day to day.
