[EuroPython] i18n and related issues

Magnus Lyckå magnus@thinkware.se
Wed, 30 Apr 2003 00:22:35 +0200


At 18:15 2003-04-29 +0200, M.-A. Lemburg wrote:
>We can discuss many things, but would that make any difference ?
>Getting this right is a lot of work and I don't see any funding
>or interest from volunteers to get any of it done.

I'm curious about how important people think this is.
Does it have any significant impact on the viability of
Python in business? It would surprise me if French, German,
Lithuanian and other European Python programmers didn't
care about this. But if no one else cares, I'll shut up
until this really starts to cause me pain.

I think i18n issues matter more for administrative systems
aimed at end users, particularly in large companies where
the same installation might serve people working in
different languages, than in science and programming,
where Python plays a bigger role today.

>>  >>> locale.getdefaultlocale()
>>('sv_SE', 'cp1252')
>>  >>> locale.setlocale(locale.LC_ALL, '')
>>'Swedish_Sweden.1252'
>>  >>> locale.getlocale()
>>['Swedish_Sweden', '1252']
>>You see, they aren't the same! This leads to:
>
>You're mixing character sets with locales here.

Read again. getdefaultlocale() and setlocale(LC_ALL, '')
followed by getlocale() give different results on Windows
2000 and XP. The locale returned by getdefaultlocale() is
not recognized by the system, which causes resetlocale()
to barf. Do you mean that a plain call to resetlocale()
raises an exception on Windows 2000 and XP systems with
vanilla Swedish settings because I misunderstood something? :)

Is this a purely Swedish problem or are we "in good company"?
It's easy to test:

 >>> import locale
 >>> locale.resetlocale()

Does this work anywhere outside the U.S. of A?
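A slightly more defensive version of that test prints the three values side by side instead of relying on resetlocale(). This is a sketch for current Python 3, where getdefaultlocale() is deprecated and resetlocale() (deprecated in 3.11) has since been removed, so the code avoids resetlocale() and tolerates getdefaultlocale() being absent. The name locale_report() is made up for illustration.

```python
import locale
import warnings

def locale_report():
    """Compare the guessed default locale with what setlocale(LC_ALL, '') yields.

    Hypothetical helper for illustration; returns (guessed, actual, parsed).
    """
    # getdefaultlocale() is deprecated in newer Pythons: silence the warning
    # and cope with the function being missing or failing to parse.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        getdefault = getattr(locale, "getdefaultlocale", None)
        try:
            guessed = getdefault() if getdefault else None
        except ValueError:
            guessed = None

    saved = locale.setlocale(locale.LC_ALL)  # query only, changes nothing
    try:
        try:
            # Let the C library adopt the user's environment settings.
            actual = locale.setlocale(locale.LC_ALL, "")
        except locale.Error:
            actual = None
        try:
            parsed = locale.getlocale()
        except ValueError:
            # getlocale() cannot parse some platform-specific names.
            parsed = None
    finally:
        locale.setlocale(locale.LC_ALL, saved)  # restore the original setting
    return guessed, actual, parsed

if __name__ == "__main__":
    guessed, actual, parsed = locale_report()
    print("getdefaultlocale():     ", guessed)
    print("setlocale(LC_ALL, ''):  ", actual)
    print("getlocale() afterwards: ", parsed)
```

If the first and third lines disagree on your system, you are in the same boat.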

Surely the second item in the locale tuple (which
*should* be 'cp1252' on my system) is a character set!
But what I get from setlocale(LC_ALL, ''); getlocale()
is just '1252'.

I'm assuming it's our friends in Redmond who caused this
confusion. If they had been good at following standards,
it would say ['sv_SE', 'ISO8859-1'] instead, but this *is*
one of the most important platforms today. I can't wait
for Linux world domination... :)

>That's probably because your Windows version doesn't support the
>German locale. No surprise here :-) It should work on Linux which
>usually comes with all sorts of locale information.

Sure. Try it on a German Windows 2000 or XP system... At least here,
I get...

 >>> locale.setlocale(locale.LC_ALL, 'sv')
Error: locale setting not supported

...and it *is* a Swedish system. (Don't try 'sw', that's
Swahili. I made a little mistake setting the menu language
for my DVD player, and now I will always remember. :)
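Since the accepted spelling of a locale name differs between platforms, a portable program more or less has to probe. A minimal sketch, assuming a hypothetical candidate list: 'Swedish_Sweden.1252' is the name Windows reported above, while 'sv_SE.UTF-8' and 'sv_SE.ISO8859-1' are typical Unix-style spellings that may or may not be installed. first_working_locale() is a made-up helper name.

```python
import locale

# Hypothetical candidate spellings for Swedish; which ones work depends on the
# platform and on which locale data happens to be installed.
SWEDISH_CANDIDATES = ["sv_SE.UTF-8", "sv_SE.ISO8859-1", "sv_SE",
                      "Swedish_Sweden.1252", "sv"]

def first_working_locale(candidates, category=locale.LC_ALL):
    """Return the name the C library reports for the first accepted candidate.

    Returns None if no candidate is accepted. The previous setting for
    `category` is always restored before returning.
    """
    saved = locale.setlocale(category)  # query only
    try:
        for name in candidates:
            try:
                return locale.setlocale(category, name)
            except locale.Error:
                continue  # not supported here; try the next spelling
        return None
    finally:
        locale.setlocale(category, saved)
```

Calling first_working_locale(SWEDISH_CANDIDATES) on the Windows system above would be expected to skip 'sv' and succeed on 'Swedish_Sweden.1252', but that is exactly the kind of platform detail one shouldn't have to know.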

>The codec registry only knows about "cp1252" because that's
>the standard name. We can't go about and add all possible
>aliases for each and every encoding out there.

Yes, I know. So why does setlocale(LC_ALL, ''); getlocale()[1]
return '1252' and not 'cp1252', which is what the dysfunctional
default locale says?
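One way to paper over the '1252' versus 'cp1252' mismatch is to prefix a bare number with 'cp' before asking the codec registry. This is a sketch under the assumption that a purely numeric encoding from getlocale() is always a Windows code page; codec_for_locale_encoding() is a hypothetical helper, not part of the locale module.

```python
import codecs

def codec_for_locale_encoding(enc):
    """Map the encoding part of getlocale() to a Python codec name, or None."""
    if enc is None:
        return None
    candidate = enc
    if enc.isdigit():
        # Assumption: Windows reports bare code page numbers such as '1252',
        # which correspond to Python's 'cpNNNN' codecs.
        candidate = "cp" + enc
    try:
        # CodecInfo.name gives the registry's canonical name for the codec.
        return codecs.lookup(candidate).name
    except LookupError:
        return None  # no codec known under this name
```

With this, codec_for_locale_encoding('1252') gives 'cp1252', which is what one would have expected getlocale() to say in the first place.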

>>How do we display dates and times according to locale? Does the new
>>date module handle that? locale.atof and locale.format can at least
>>display floats right. (I think.)
>>To the extent that the code is there, it's just briefly described in the
>>docs, and very little in all the Python books out there. Is this really
>>such a peripheral issue?
>
>Probably not too interesting to the US folks :-) Everybody
>else seems to be using their own little tool sets for this.

But this isn't Perl, it's Python. :) We are supposed to have *one*
way to do things! I think standardizing things like this properly
will be important to make it easier to develop and share business
code. I certainly think this is the kind of thing where one way
of doing it would be helpful. We want support for many locales,
not many ways to support a few locales!
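For reference, a minimal sketch of the locale-aware pieces the standard library does provide, assuming a current Python 3 interpreter: locale.format_string() (which has replaced the locale.format() mentioned above) for grouped numbers, locale.atof() to parse them back, and time.strftime('%x') for the locale's preferred date format. format_localized() is a hypothetical helper name.

```python
import locale
import time

def format_localized(value, when, name=""):
    """Format a float and a date under locale `name` ('' = user environment).

    Hypothetical helper; returns (formatted_number, parsed_back, date_string).
    """
    saved = locale.setlocale(locale.LC_ALL)  # remember the current setting
    try:
        locale.setlocale(locale.LC_ALL, name)
        # Decimal point and thousands separator come from LC_NUMERIC.
        number = locale.format_string("%.2f", value, grouping=True)
        # atof() strips the thousands separator and honours the decimal point.
        parsed = locale.atof(number)
        # %x is the locale's preferred date representation (LC_TIME).
        date = time.strftime("%x", when)
    finally:
        locale.setlocale(locale.LC_ALL, saved)
    return number, parsed, date
```

Calling format_localized(1234.5678, time.localtime()) picks up the user's environment; under a Swedish locale one would expect a decimal comma, though the exact separators and date layout depend on the installed locale data. Note that this all works by switching process-global state, which is part of why it is awkward in multi-user servers.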

It seems to me that when a new package or module is added to the
standard library, most people will use it, and there will be fewer
divided efforts and less confusing code. Sometimes it's useful to
have different versions of things, because they have different
goals and priorities, but often the pluralism exists just because
of ignorance and lack of coordination. Oops, was there such a
module already? Is there a standard for this?

>Using the collation support defined in the Unicode
>standard (provided that someone writes the support code
>needed for the Python implementation).

Great. There *is* a standard for this. Thanks for enlightening me.
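Until such support exists, what the standard library offers is the C library's locale collation through locale.strxfrm(). A minimal sketch follows; note that this is POSIX LC_COLLATE ordering, not the Unicode Collation Algorithm referred to above, and sort_for_locale() is a hypothetical helper name.

```python
import locale

def sort_for_locale(words, name=""):
    """Sort words by the C library's collation rules for locale `name`.

    Uses LC_COLLATE only; '' means the user's environment. The previous
    LC_COLLATE setting is restored afterwards.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query only
    try:
        locale.setlocale(locale.LC_COLLATE, name)
        # strxfrm() turns each string into a key that compares per LC_COLLATE.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)
```

Plain sorted() orders by code point, which puts 'ä' (U+00E4) before 'å' (U+00E5); Swedish puts them in the order å, ä, ö. Assuming a Swedish locale such as 'sv_SE.UTF-8' is installed, sort_for_locale(words, 'sv_SE.UTF-8') should give the Swedish order, but whether it does depends on the platform's locale data.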


--
Magnus Lycka (It's really Lyckå), magnus@thinkware.se
Thinkware AB, Sweden, www.thinkware.se
I code Python ~ The shortest path from thought to working program