[I18n-sig] Format strings

Josef Spillner 2005 at kuarepoti-dju.net
Wed Nov 30 15:36:21 CET 2005


[I removed the CCs since I think we're all subscribed.]

On Monday, 28 November 2005 12:55, wrote:
> Plain string literals do not have an encoding attached and
> are regarded as plain byte strings. As a result, they are
> passed through the decoding mechanism by reencoding them after
> first decoding them to Unicode (using the source code encoding).
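As a rough illustration of that round trip, here is a minimal sketch written for Python 3, where the 8-bit string type is `bytes`. The helper `reencode_literal` is hypothetical and not part of the interpreter; it only mimics what happens to a literal's bytes under a given source encoding:

```python
def reencode_literal(raw: bytes, source_encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: decode a literal's bytes, then re-encode them."""
    # Step 1: decode the bytes to Unicode using the source code encoding.
    text = raw.decode(source_encoding)
    # Step 2: re-encode the Unicode text with the same encoding, so the
    # resulting byte string matches what was written in the source file.
    return text.encode(source_encoding)


utf8_literal = "café".encode("utf-8")    # bytes as stored in a UTF-8 file
latin1_literal = "café".encode("latin-1")  # bytes as stored in a Latin-1 file

utf8_roundtrip = reencode_literal(utf8_literal)
latin1_roundtrip = reencode_literal(latin1_literal, "latin-1")
```

For text that is valid in the declared encoding the round trip is the identity, which is why plain literals keep their original bytes.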

But (my last remaining question, it seems): the default encoding of 
unicode() is "ascii" rather than "utf-8", even in a source file that 
declares utf-8 encoding.
Would changing the default to match the source file encoding break 
applications as well?
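To make the difference concrete: in Python 2, `unicode(s)` without an encoding argument decodes with `sys.getdefaultencoding()`, which is "ascii", and a `# -*- coding: utf-8 -*-` declaration does not change that. The sketch below is written for Python 3 and imitates that default by decoding `bytes` explicitly; the helper name `try_decode` is hypothetical:

```python
def try_decode(data: bytes, encoding: str):
    """Hypothetical helper: return decoded text, or None if invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        # Any byte >= 0x80 is outside the ASCII range and ends up here.
        return None


data = "café".encode("utf-8")  # b'caf\xc3\xa9', as in a UTF-8 source file

# Mimics Python 2's unicode(data): fails on the non-ASCII bytes.
ascii_result = try_decode(data, "ascii")
# Naming the source encoding explicitly works regardless of the default.
utf8_result = try_decode(data, "utf-8")
```

Passing the encoding explicitly (in Python 2, `unicode(s, "utf-8")`) sidesteps the question of what the interpreter-wide default is.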

Note that the documentation is not very helpful on this point. I'd even 
like to advocate for an i18n section in the tutorial, where such 
behaviours are related to each other and explained in the context of 
modern (and legacy) runtime environments.

Alternatively, it would help to link to the Unicode HOWTO from the 
tutorial and the module index. However, the two contradict each other 
slightly, e.g. in the description of the parameters of unicode().

Compare:
[All of its arguments should be 8-bit strings]
vs.
[if object is a Unicode string or subclass it will return that Unicode string]
(shouldn't the latter actually say "Unicode object"?)

>> Why all the hassle of using u"..." instead of making it the default?
> This will happen in Python 3.0.

Ah, nice to know.
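For reference, this is how it turned out: in Python 3 a plain literal is Unicode text (`str`), and 8-bit data needs an explicit `b"..."` prefix. The `u"..."` prefix is accepted again as a no-op since Python 3.3 (PEP 414). A quick check, assuming Python 3.3 or later:

```python
plain = "café"         # plain literal: Unicode text (str) in Python 3
prefixed = u"café"     # u prefix is a no-op since Python 3.3
raw = b"caf\xc3\xa9"   # 8-bit data now needs the explicit b prefix

# Converting between the two is always an explicit decode/encode step.
decoded = raw.decode("utf-8")
```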

>> There is a lot of python source code I maintain, and it would simplify
>> coding a lot if this could be made the default.

> Indeed, but it potentially also breaks a lot of code since Python
> and the many extensions for it are not yet fully Unicode compatible.

I just tested -U on my applications. The 'random' module seems to be a 
major offender; otherwise, things seem to work. There are some PyGame 
oddities, but those occur without -U as well, and I'm going to look into 
fixing the library.

Is anyone coordinating the work, i.e. is there a "unicode compatibility status 
map" or anything similar?

Josef

