[Python-Dev] Internationalization Toolkit
M.-A. Lemburg
mal@lemburg.com
Thu, 11 Nov 1999 15:47:49 +0100
Guido van Rossum wrote:
>
> > Let me tell you why you would want to have an encoding
> > which can be set:
> >
> > (1) Say I am on a Japanese Windows box, I have a
> > string called 'address' and I do 'print address'. If
> > I see utf8, I see garbage. If I see Shift-JIS, I see
> > the correct Japanese address. At this point in time,
> > utf8 is an interchange format but 99% of the world's
> > data is in various native encodings.
> >
> > Analogous problems occur on input.
> >
> > (2) I'm using htmlgen, which 'prints' objects to
> > standard output. My web site is supposed to be
> > encoded in Shift-JIS (or EUC, or Big 5 for Taiwan,
> > etc.) Yes, browsers CAN detect and display UTF8 but
> > you just don't find UTF8 sites in the real world - and
> > most users just don't know about the encoding menu,
> > and will get pissed off if they have to reach for it.
> >
> > Ditto for streaming output in some protocol.
> >
> > Java solves this (and we could too by hacking stdout)
> > using Writer classes which are created as wrappers
> > around an output stream and can take an encoding, but
> > you lose the flexibility to 'just print'.
> >
> > I think being able to change encoding would be useful.
> > What I do not want is to auto-detect it from the
> > operating system when Python boots - that would be a
> > portability nightmare.
>
> You almost convinced me there, but I think this can still be done
> without changing the default encoding: simply reopen stdout with a
> different encoding. This is how Java does it. I/O streams with an
> encoding specified at open() are a very powerful feature. You can
> hide this in your $PYTHONSTARTUP.
True, and it probably covers all cases where setting the
default encoding to something other than UTF-8 makes sense.
I guess you've convinced me there ;-)
The current proposal has wrappers around streams for this purpose:
For explicit handling of Unicode using files, the unicodec module
could provide stream wrappers which provide transparent
encoding/decoding for any open stream (file-like object):
  import unicodec
  file = open('mytext.txt','rb')
  ufile = unicodec.stream(file,'utf-16')
  u = ufile.read()
  ...
  ufile.close()
XXX unicodec.file(<filename>,<mode>,<encname>) could be provided as
short-hand for unicodec.stream(open(<filename>,<mode>),<encname>) which
also ensures that <mode> contains the 'b' character when needed.
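To make the idea concrete, here is a rough sketch of how the read side of such a wrapper could behave. It is not the proposal's implementation: the names DecodingStream and open_decoded are made up for illustration, the code assumes a Python in which byte strings have a decode() method, and an in-memory buffer stands in for a real file.

```python
import io


class DecodingStream:
    """Hypothetical stand-in for the proposed unicodec.stream (read side only)."""

    def __init__(self, stream, encoding):
        self.stream = stream        # any binary file-like object
        self.encoding = encoding    # name of the encoding used by the stream

    def read(self):
        # Read all remaining bytes and decode them in one step.
        return self.stream.read().decode(self.encoding)

    def close(self):
        self.stream.close()


def open_decoded(filename, mode, encoding):
    """Hypothetical shorthand in the spirit of unicodec.file(): force binary mode."""
    if 'b' not in mode:
        mode += 'b'
    return DecodingStream(open(filename, mode), encoding)


# An in-memory buffer stands in for open('mytext.txt', 'rb').
raw = io.BytesIO('h\u00e9llo'.encode('utf-16'))
ufile = DecodingStream(raw, 'utf-16')
u = ufile.read()
ufile.close()
```

Reading everything at once sidesteps the harder part of a real wrapper, which has to cope with multi-byte sequences that are split across partial reads.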
Reopening stdin and stdout with a different encoding, as you
suggest, can then be done using:
  import sys,unicodec
  sys.stdin = unicodec.stream(sys.stdin,'jis')
  sys.stdout = unicodec.stream(sys.stdout,'jis')
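The write side could look roughly like the sketch below, which shows that 'just print' keeps working once the output stream is wrapped. Again this is only an illustration under assumptions: EncodingWriter is a hypothetical name, an in-memory buffer stands in for the real stdout byte stream, and 'shift_jis' is used as the codec name in place of the 'jis' placeholder above.

```python
import io


class EncodingWriter:
    """Hypothetical write-side wrapper: encodes text before passing it on."""

    def __init__(self, stream, encoding):
        self.stream = stream        # any binary file-like object
        self.encoding = encoding    # target encoding for everything written

    def write(self, text):
        self.stream.write(text.encode(self.encoding))

    def flush(self):
        self.stream.flush()


buffer = io.BytesIO()                      # stands in for the stdout byte stream
out = EncodingWriter(buffer, 'shift_jis')  # assumed codec name, see lead-in
address = '\u6771\u4eac\u90fd'             # a short Japanese place name
print(address, file=out)                   # print only needs a write() method
data = buffer.getvalue()                   # the Shift-JIS bytes that were written
```

Assigning such an object to sys.stdout would give the effect described above without touching the default encoding.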
--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 50 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/