[I18n-sig] Format strings

"Martin v. Löwis" martin at v.loewis.de
Wed Nov 30 23:52:50 CET 2005


Josef Spillner wrote:
> But (my last remaining question, as it seems), the default encoding of 
> unicode() is "ascii" instead of "utf-8" even for this particular source file 
> which specifies utf-8 encoding.
> Would changing this to match the source file encoding break applications as 
> well?

No. *That* would not be implementable (or, if somehow implemented, would
break applications). In general, if you convert a Unicode string into
a byte string, you cannot even be sure it originally came from source 
code. Say you do

a = u"Martin "
b = u"v. "
c = u"Löwis"
mvl = a+b+c

Now, the object mvl does not have any source code: so which encoding 
should be used to encode it? If you have an answer: how does that change
if I have

mvl = mod1.a+mod2.b+mod3.c
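
To make the point concrete, here is a minimal sketch (an illustration, not part of the original exchange). The variable names utf8_bytes, latin1_bytes and ascii_ok are made up for this example, and the ö is written as an escape so the snippet does not depend on the encoding of its own source file. The concatenated object carries no record of any source encoding, so the encoding has to be named when converting to bytes:

```python
# Three Unicode strings, as in the example above. The escape \u00f6 is
# used instead of a literal "ö" so the source file encoding is irrelevant.
a = u"Martin "
b = u"v. "
c = u"L\u00f6wis"
mvl = a + b + c

# The same Unicode string yields different byte strings depending on
# which encoding the caller asks for.
utf8_bytes = mvl.encode("utf-8")         # "ö" becomes two bytes
latin1_bytes = mvl.encode("iso-8859-1")  # "ö" becomes one byte

# ASCII cannot represent "ö" at all, so encoding with it fails.
try:
    mvl.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

Nothing in mvl says which of these results would be "right"; that choice belongs to whoever writes the bytes out, not to the file the literals happened to come from.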

> Note that the documentation is not really helpful about this aspect. I'd like 
> to advocate for an i18n paragraph in the tutorial even, where such 
> behavioural aspects are put into relation with each other, and explained in 
> the concept of modern (and legacy) runtime environment concepts.

Contributions to the documentation are welcome.

> Compare:
> [All of its arguments should be 8-bit strings]
> vs.
> [if object is a Unicode string or subclass it will return that Unicode string]
> (actually it should say "Unicode object" below, right?)

I personally use "Unicode string" (type unicode) vs. "byte string" (type
str). Both are strings.
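
As a small illustration of that terminology (a sketch only; the names text, data and the derived variables are chosen for this example), the two kinds of string are distinct types with different lengths for the same content. In Python 2 the types are called unicode and str; in Python 3 the corresponding types are named str and bytes, and the snippet runs under either:

```python
text = u"L\u00f6wis"         # Unicode string: a sequence of characters
data = text.encode("utf-8")  # byte string: a sequence of bytes

text_len = len(text)  # counts characters
data_len = len(data)  # counts bytes; "ö" takes two bytes in UTF-8

# Decoding with the same encoding recovers the original Unicode string.
round_trip_ok = data.decode("utf-8") == text

# The two objects are instances of different string types.
different_types = type(text) is not type(data)
```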

> Is anyone coordinating the work, i.e. is there a "unicode compatibility status 
> map" or anything similar?

No. It is so far from actually working that nobody bothers to fix it. 
However, if you have specific contributions which improve the state
(i.e. have no behaviour change if -U is not specified, but fix a bug
when it is), those are appreciated.

Regards,
Martin

