Tkinter + Unicode -- possible FIX

Wed Jan 29 04:26:24 EST 2003

In <3e35a3db_1 at dns.sd54.bc.ca>, Bob van der Poel wrote:
> Martin v. Loewis wrote:
> > Bob van der Poel wrote:
> > 
> >> And, I'm sort of
> >> convinced that having encoding enabled is the proper way to do things.
> > 
> > 
> > I'm convinced that this is evil. Your application will run on some 
> > systems and crash on others.
> >
> Well, I'm beginning to believe you are right. So, guess it's off to the 
> salt mines and I'll have to find all my widget.get()s and fix them.

And finding and fixing them all will be almost impracticable, won't it?

I guess there are two reasons why having encoding enabled is evil.

  1. Such applications will not run in the default ASCII setting.

  2. Some systems have default encodings (that locale.getdefaultlocale()
     returns) for which Python does not provide the codecs.

As for Reason 1, if users want to use their own national characters in
your application, they will use _their_ encoding.  It may not be ASCII,
nor the encoding you suppose (but might be UTF-8 etc).  Do NOT GUESS
which encoding your users use (and, of course, do not guess they use
ASCII, which is a _special_ encoding where all bytes are in 0x00-0x7F).
They will specify their encoding as they like it.

Practically speaking, it is handy for users if the encoding
specification appears in _one_ obvious place (e.g. site.py).  It is
worse if such specifications appear all over the application code:

    if type(result) != type(""):
        result = result.encode(encoding)

where it is not clear if all the 'encoding's are consistent.
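
If explicit conversions cannot be avoided altogether, one way to keep
them consistent is to route them through a single helper.  The following
is only a minimal sketch: the names APP_ENCODING and to_bytes are made up
for illustration, and taking the value from locale.getpreferredencoding()
is an assumption, not something the application above is known to do.

```python
import locale

# Hypothetical single place where the application's encoding is decided.
# Falling back to "ascii" when the locale reports nothing is an assumption.
APP_ENCODING = locale.getpreferredencoding(False) or "ascii"


def to_bytes(value, encoding=None):
    """Return value as a byte string.

    Byte strings pass through untouched; text is encoded with the one
    shared encoding unless the caller overrides it explicitly.
    """
    if isinstance(value, bytes):
        return value
    return value.encode(encoding or APP_ENCODING)
```

Every call site then says to_bytes(result) and the question of which
encoding is in effect has exactly one answer.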

You had better check if your application runs, at least, both in ASCII
encoding and your current encoding.  Ideally you should make your
application runnable in _every_ encoding compatible with ASCII.  But do
not scatter encoding specifications over the code; it is evil.
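
To decide which encodings are worth testing against, "compatible with
ASCII" can be checked mechanically.  This is a rough sketch under the
assumption that compatibility means bytes 0x00-0x7F decode to the same
128 characters as in ASCII; is_ascii_compatible and CANDIDATES are
hypothetical names.

```python
def is_ascii_compatible(encoding):
    """True if bytes 0x00-0x7F decode to the same 128 ASCII characters."""
    ascii_bytes = bytes(bytearray(range(128)))
    try:
        return ascii_bytes.decode(encoding) == ascii_bytes.decode("ascii")
    except (UnicodeError, LookupError):
        # Either the bytes are invalid in that encoding or no codec exists.
        return False


# Hypothetical list of encodings to exercise the application under.
CANDIDATES = ["ascii", "utf-8", "latin-1", "euc_jp", "utf-16"]
compatible = [enc for enc in CANDIDATES if is_ascii_compatible(enc)]
```

The application's test run can then be repeated once per entry in
compatible.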

Note that the encoding-enabling we discuss here does not affect Python's
parser.  Literals in the application code _safely_ keep the same values
as they have under the default setting.

Reason 2 applies to the locales of China, Japan etc., you know.  In such
cases, both implicit and _explicit_ conversions between Unicode and
string will raise a LookupError from codecs _anyway_ if users specify
the encoding of their own locale.
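
The failure is easy to reproduce with any name for which no codec is
installed.  A small sketch follows; the helper names and the codec name
"x_no_such_codec_demo" are invented for the demonstration.

```python
import codecs


def codec_available(name):
    """True if Python can find a codec registered under this name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


def encode_or_none(text, encoding):
    """Explicit conversion; returns None when the codec itself is missing."""
    try:
        return text.encode(encoding)
    except LookupError:
        # Raised before any character is looked at: the codec is unknown.
        return None
```

Calling encode_or_none(u"abc", "x_no_such_codec_demo") hits the
LookupError branch even though the text is pure ASCII, which is the
point: writing the conversion explicitly does not help when the codec
is absent.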

# Prohibit them from using encoding of their own locale?  Indeed the 
# application will run, but it might be _useless_ for them.
# At least they _need_ encodings in which they _can_ represent their
# characters---typically, either UTF-8 or the encoding of their locale.

Luckily there is a general solution.  For Windows, just put

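    # sys is already imported at the top of site.py.
    if sys.platform == 'win32':
        import locale, codecs
        enc = locale.getdefaultlocale()[1]      # may be None
        if enc and enc.startswith('cp'):        # "cp***" ?
            try:
                codecs.lookup(enc)
            except LookupError:
                # No codec for this code page: forget the cached failed
                # lookup and alias the name to the Windows 'mbcs' codec.
                import encodings
                encodings._cache[enc] = encodings._unknown
                encodings.aliases.aliases[enc] = 'mbcs'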

into site.py.  You do not need any third-party codecs.  I have tested
this patch in Python 2.3a1, but it should work in Python 2.2.* as well.
(In fact, the patch is mandatory for Python 2.3a1 IDLE on Windows in a
Japanese locale if you do not install additional codecs.)
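
The patch reaches into private names of the encodings package.  As an
alternative technique (not the patch itself), the same kind of aliasing
can be sketched with the public codecs.register() hook.  Everything named
here is hypothetical: make_alias_search is an illustrative helper,
"x_demo_alias" is an invented encoding name, and "utf-8" stands in for
'mbcs' so that the sketch also runs outside Windows.

```python
import codecs


def make_alias_search(alias, target):
    """Build a codec search function that resolves alias to target."""
    def search(name):
        # codecs.lookup() passes the normalized (lower-case) name.
        if name == alias:
            return codecs.lookup(target)
        return None  # let the other registered search functions try
    return search


# Hypothetical alias; on Windows the target would be 'mbcs' instead.
codecs.register(make_alias_search("x_demo_alias", "utf-8"))

encoded = u"abc".encode("x_demo_alias")
```

The registration is process-wide, so like the site.py patch it belongs
in one place that runs once at startup.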

For Unix, I heard that the Universal Unicode Codec for POSIX iconv by
Hye-Shik Chang was recently adopted in Python 2.3a.  With it, codecs for
system default encodings will be trivially available.

# And at the very least, they can use UTF-8.

Thus you just carefully make your application runnable in many encodings
compatible with ASCII, and do not scatter encoding specifications over
the code.  From my experience, I'd think it should be a trivial
requirement for typical applications.  Python users all over the world
will be happy with your code.

So I think you do not have to go to the salt mines after all.

-- SUZUKI Hisao