[Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons)

Peter Funk pf@artcom-gmbh.de
Wed, 5 Apr 2000 17:54:12 +0200 (MEST)


Guido van Rossum:
> > u"..." currently interprets the characters it finds as Latin-1
> > (this is by design, since the first 256 Unicode ordinals map to
> > the Latin-1 characters).
> 
> Nice, except that now we seem to be ambiguous about the source
> character encoding: it's Latin-1 for Unicode strings and UTF-8 for
> 8-bit strings...!

This is a bit difficult to understand and will make the task
of writing the upcoming 1.6 documentation even more challenging. ;-)
But I agree: changing this should go into 1.7.
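
To make the ambiguity concrete, here is a small sketch of how the same
umlaut string looks under the two encodings.  It assumes a Python with
separate bytes and text types (not the 1.6 string model), and the
variable names are only illustrative:

```python
# Assumption: a Python with distinct bytes and text types.
text = "\u00d6ffnen"                     # "Öffnen", written with an escape

latin1_bytes = text.encode("latin-1")    # one byte per character
utf8_bytes = text.encode("utf-8")        # the umlaut takes two bytes

# Reading UTF-8 bytes as Latin-1 never fails, but yields the wrong text.
misread = utf8_bytes.decode("latin-1")

# Reading Latin-1 bytes as UTF-8 fails outright for this string.
try:
    latin1_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

So a source file holding such a literal cannot be read correctly unless
both kinds of string literal agree on one source encoding.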

BTW: Our umlaut strings are sooner or later passed through one 
central function.  All modules usually contain something like this:

try:
    import fintl
    _ = fintl.gettext
except ImportError:
    # No translation module available: fall back to the untranslated text.
    def _(msg): return msg

...
    MenuEntry(_("Öffnen"), self.open),
    MenuEntry(_("Schließen"), self.close)
    ....
you get the picture.

It would be easy to change the implementation of 'fintl.gettext' to 
coerce the resulting strings into Unicode or do whatever is required.  
But we currently use GNU gettext to produce the message files that are 
translated into English, French and Italian.  AFAIK GNU gettext handles 
only 8-bit strings anyway.  Our customers in the Far East currently live 
with the English version, but this is for financial rather than technical 
reasons.

Regards, Peter
-- 
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen)