String methods understanding anything but ASCII?

"Martin v. Löwis" martin at v.loewis.de
Sun Jan 19 17:42:20 EST 2003


Magnus Lie Hetland wrote:
> I just wondered -- is there hope that string methods such as upper()
> or capitalize() will ever understand anything other than ascii? 

They already do that, after you invoke locale.setlocale:

>>> import locale
>>> locale.setlocale(locale.LC_ALL, "")
'German_Germany.1252'
>>> print "ö".upper()
Ö

> How about, e.g. iso8859-1 (which does seem to be
> the default encoding)?

It's not. Python doesn't really have a "default encoding"; the 
traditional string type really represents byte strings, not character 
strings. For purposes of character classification, its encoding is the 
one that the platform's "C" locale uses, i.e. typically ASCII; this 
changes with the current locale. For purposes of conversion to Unicode, 
the "system default encoding" is also "ascii", unless overridden by the 
administrator.
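
To make the byte-string point concrete, here is a minimal sketch of decoding
a byte string with an explicitly named encoding instead of relying on the
system default. The names raw, text and ascii_ok are only illustrative, and
ISO 8859-1 is an assumption about how the data was encoded. The b"" and u""
prefixes are used so that the snippet also runs on later Python versions.

```python
import sys

# Encoding used for implicit byte-string/Unicode conversion:
# "ascii" on Python 2 unless overridden ("utf-8" on Python 3).
default_encoding = sys.getdefaultencoding()

# A byte string is just bytes. 0xF6 only means "ö" once we state
# that the data is ISO 8859-1 (an assumption made for this example).
raw = b"\xf6"

# Decoding with an explicit encoding yields a character string.
text = raw.decode("iso8859-1")

# The "ascii" codec refuses any byte above 0x7F.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

After this runs, text holds the single character U+00F6 and ascii_ok is
False, which is why mixing non-ASCII byte strings with Unicode strings
fails unless you name the encoding yourself.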

> So -- is there any hope of this? Or are there any convenient ways of
> dealing with it, or perhaps a library somewhere?

As Irmen explains, you really should use Unicode strings for that - they 
support uppercasing for all languages of the world, simultaneously.
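
As a rough illustration (the sample words are my own choice, not taken from
the thread), upper() on Unicode strings works across scripts in one program,
with no call to locale.setlocale:

```python
# Unicode strings hold characters, not bytes, so case mapping
# does not depend on the current locale.
samples = [
    u"\u00f6",              # ö   (German)
    u"\u03b1\u03b2\u03b3",  # αβγ (Greek)
    u"\u0436\u0443\u043a",  # жук (Russian)
]

uppercased = [s.upper() for s in samples]

# Show the code points rather than printing the characters, so the
# output does not depend on the terminal's encoding.
for original, upper in zip(samples, uppercased):
    print("%r -> %r" % (original, upper))
```

The three results are U+00D6, U+0391 U+0392 U+0393 and U+0416 U+0423 U+041A,
i.e. the uppercase forms of each sample.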

> Opinions? Does this warrant a PEP? Are there applications and/or
> problems I haven't thought of?

Please, no. Everything that needs to be there already is, and yes, there 
are problems you haven't thought of.

Regards,
Martin





