[I18n-sig] Format strings
Josef Spillner
2005 at kuarepoti-dju.net
Wed Nov 30 15:36:21 CET 2005
[I removed the CCs, since I think we're all subscribed.]
Written on Monday, 28 November 2005 12:55:
> Plain string literals do not have an encoding attached and
> are regarded as plain byte strings. As a result, they are
> passed through the decoding mechanism by reencoding them after
> first decoding them to Unicode (using the source code encoding).
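The decode-then-reencode round trip described in the quote can be emulated in Python 3, where bytes and str are separate types. This is a minimal sketch, assuming a source file that declares utf-8. The helper `compile_plain_literal` is hypothetical and is not the actual compiler code. It shows that a plain literal ends up holding the raw utf-8 bytes and carries no encoding information of its own.

```python
# Emulation of how a plain (non-u"...") literal in a source file that
# declares "# -*- coding: utf-8 -*-" becomes an 8-bit string.
# Assumption: SOURCE_ENCODING stands in for the coding declaration.
SOURCE_ENCODING = "utf-8"


def compile_plain_literal(raw_literal_bytes: bytes) -> bytes:
    """Hypothetical helper: decode with the source encoding, then reencode."""
    # Step 1: the source bytes are decoded to Unicode using the source encoding.
    as_unicode = raw_literal_bytes.decode(SOURCE_ENCODING)
    # Step 2: the literal is reencoded with the same encoding, so the
    # resulting byte string holds utf-8 data but no encoding label.
    return as_unicode.encode(SOURCE_ENCODING)


raw = "Grüße".encode(SOURCE_ENCODING)  # bytes as they appear in the file
literal = compile_plain_literal(raw)
print(literal)         # b'Gr\xc3\xbc\xc3\x9fe'
print(literal == raw)  # True: the round trip leaves the bytes unchanged
```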
But (my last remaining question, it seems) the default encoding of
unicode() is "ascii" rather than "utf-8", even for a source file that
declares utf-8 encoding.
Would changing this to match the source file encoding break applications as
well?
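The mismatch can be illustrated with a Python 3 emulation. The helper `py2_unicode` is hypothetical. It assumes that Python 2's unicode(s) with no encoding argument decodes with sys.getdefaultencoding(), which is "ascii" in a stock installation, and that it ignores the coding declaration of the file the literal came from.

```python
# Hypothetical stand-in for Python 2's unicode(s[, encoding]) applied to
# an 8-bit string. Assumption: the default mirrors sys.getdefaultencoding(),
# which is "ascii" in a stock Python 2 installation.
DEFAULT_ENCODING = "ascii"


def py2_unicode(s: bytes, encoding: str = DEFAULT_ENCODING) -> str:
    """Decode an 8-bit string the way unicode() would."""
    return s.decode(encoding)


# A plain literal from a utf-8 source file holds utf-8 bytes.
literal = "Grüße".encode("utf-8")

try:
    # The default "ascii" codec knows nothing about the source encoding.
    py2_unicode(literal)
except UnicodeDecodeError as exc:
    print("ascii failed:", exc.reason)

# Passing the source encoding explicitly works.
print(py2_unicode(literal, "utf-8"))  # Grüße
```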
Note that the documentation is not very helpful on this point. I'd even
advocate an i18n section in the tutorial, where these behaviours are
related to each other and explained in the context of modern (and legacy)
runtime environments.
Alternatively, it would help to link to the Unicode HOWTO from the tutorial or
module index. However, the two contradict each other slightly, e.g. in the
parameter description of unicode().
Compare:
[All of its arguments should be 8-bit strings]
vs.
[if object is a Unicode string or subclass it will return that Unicode string]
(Actually, the second quote should say "Unicode object", right?)
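One possible reading is that the two statements describe different call forms. The first applies when an encoding argument is given. The second applies when only the object is passed. The sketch below uses Python 3's str(object, encoding) as a stand-in for Python 2's unicode(object[, encoding[, errors]]). This is an assumption about how closely the two correspond, not a statement of what the Python 2 documentation intends.

```python
# str(object, encoding) in Python 3 plays the role of unicode() here.
raw = b"caf\xc3\xa9"  # an 8-bit (byte) string holding utf-8 data
text = "café"         # already a Unicode object

# With an encoding argument, the object must be 8-bit data to decode.
print(str(raw, "utf-8"))  # café

# A Unicode object combined with an encoding argument is rejected.
try:
    str(text, "utf-8")
except TypeError as exc:
    print("TypeError:", exc)

# Without an encoding argument, a Unicode object is returned unchanged.
print(str(text) == text)  # True
```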
>> Why all the hassle of using u"..." instead of making it the default?
> This will happen in Python 3.0.
Ah, nice to know.
>> There is a lot of python source code I maintain, and it would simplify
>> coding a lot if this could be made the default.
> Indeed, but it potentially also breaks a lot of code since Python
> and the many extensions for it are not yet fully Unicode compatible.
I just tested -U on my applications. The 'random' module seems to be a
major offender; otherwise things appear to work. There are some PyGame
oddities, but those are present without -U as well, and I'm going to look
into fixing the library.
Is anyone coordinating the work, i.e. is there a "unicode compatibility status
map" or anything similar?
Josef