[Python-Dev] Internationalization Toolkit

Guido van Rossum guido@CNRI.Reston.VA.US
Thu, 11 Nov 1999 07:03:51 -0500


> Let me tell you why you would want to have an encoding
> which can be set:
> 
> (1) Say I am on a Japanese Windows box, I have a
> string called 'address' and I do 'print address'.  If
> I see utf8, I see garbage.  If I see Shift-JIS, I see
> the correct Japanese address.  At this point in time,
> utf8 is an interchange format but 99% of the world's
> data is in various native encodings.  
> 
> Analogous problems occur on input.
> 
> (2) I'm using htmlgen, which 'prints' objects to
> standard output.  My web site is supposed to be
> encoded in Shift-JIS (or EUC, or Big 5 for Taiwan,
> etc.)  Yes, browsers CAN detect and display UTF8 but
> you just don't find UTF8 sites in the real world - and
> most users just don't know about the encoding menu,
> and will get pissed off if they have to reach for it.
> 
> Ditto for streaming output in some protocol.
> 
> Java solves this (and we could too by hacking stdout)
> using Writer classes which are created as wrappers
> around an output stream and can take an encoding, but
> you lose the flexibility to 'just print'.  
> 
> I think being able to change encoding would be useful.
>  What I do not want is to auto-detect it from the
> operating system when Python boots - that would be a
> portability nightmare. 

You almost convinced me there, but I think this can still be done
without changing the default encoding: simply reopen stdout with a
different encoding.  This is how Java does it.  I/O streams with an
encoding specified at open() are a very powerful feature.  You can
hide this in your $PYTHONSTARTUP.
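
As a rough illustration of "reopen stdout with a different encoding",
here is a minimal sketch in present-day Python 3.  It assumes the
io.TextIOWrapper class and the sys.stdout.buffer attribute, neither of
which existed when this message was written.  The helper names
reopen_with_encoding and install_stdout_encoding are hypothetical, and
Shift-JIS is just the example encoding from the quoted text.

```python
import io
import sys


def reopen_with_encoding(binary_stream, encoding, errors="strict"):
    """Wrap a binary stream in a text layer that encodes on write.

    Hypothetical helper: the default encoding is left untouched; only
    this one stream gets the requested encoding.
    """
    return io.TextIOWrapper(
        binary_stream, encoding=encoding, errors=errors, line_buffering=True
    )


def install_stdout_encoding(encoding):
    """Replace sys.stdout so that a plain 'print' emits `encoding`.

    Assumes sys.stdout still has a .buffer attribute, which may not
    hold if something else has already replaced sys.stdout.
    """
    sys.stdout = reopen_with_encoding(sys.stdout.buffer, encoding)


# Demonstrate on an in-memory byte buffer rather than the real stdout.
buf = io.BytesIO()
out = reopen_with_encoding(buf, "shift_jis")
print("東京都", file=out)  # text goes in, Shift-JIS bytes come out
out.flush()
encoded = buf.getvalue()
```

A line such as install_stdout_encoding("shift_jis") could then go in
the file named by $PYTHONSTARTUP, so that 'just print' keeps working
while the bytes written follow the chosen encoding.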

François Pinard might not like it though...

BTW, someone asked what HP asked for: I can't reveal what exactly they
asked for, basically because they don't seem to agree amongst
themselves.  The only firm statements I have are that they want i18n
and that they want it fast (before the end of the year).

The desire for Perl-compatible regexps comes from me, and the only
reason is compatibility with re.py.  (HP did ask for regexps, but they
wouldn't know the difference between POSIX and Perl if it poked them in
the eye.)

--Guido van Rossum (home page: http://www.python.org/~guido/)