[ python-Bugs-1193061 ] Python and Turkish Locale

Mon May 2 10:00:58 CEST 2005

Bugs item #1193061, was opened at 2005-04-30 19:37
Message generated for change (Comment added) made by lemburg
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1193061&group_id=5470

Category: Unicode
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: S.Ça&#287;lar Onur (caglar)
Assigned to: M.-A. Lemburg (lemburg)
Summary: Python and Turkish Locale

Initial Comment:
On behalf of this thread;

http://mail.python.org/pipermail/python-dev/2005-April/052968.html

As described in
http://www.i18nguy.com/unicode/turkish-i18n.html [ How
Applications Fail With Turkish Language
] , Turkish has 4 "i" in their alphabet. 

Without --with-wctype-functions support Python convert
these characters locare-independent manner in
tr_TR.UTF-8 locale. So all conversitons maps to "i" or
"I" which is wrong in Turkish locale. 

So if Python Developers will remove the wctype
functions from Python, then there must be a
locale-dependent upper/lower funtion to handle these
characters properly.

----------------------------------------------------------------------

>Comment By: M.-A. Lemburg (lemburg)
Date: 2005-05-02 10:00

Message:
Logged In: YES 
user_id=38388

I'm not sure I understand: are you saying that the Unicode
mappings for upper and lower case are wrong in the standard ?

Note that removing the wctype functions will only remove the
possibility to use these functions for case mapping of
Unicode characters instead of using the builtin Unicode
character database. This was originally meant as
optimization to avoid having to load the Unicode database -
nowadays the database is always included, so the
optimization is no longer needed. Even worse: the wctype
functions sometimes behave differently than the mappings in
the Unicode database (due to differences in the Unicode
database version or implementation s).

Now, since the string .lower() and .upper() methods are
locale dependent (due to their reliance on the C functions
toupper() and tolower() - not by intent), while the Unicode
versions are not, we have a rather annoying situation where
switching from strings to Unicode cause semantic differences.

Ideally, both string and Unicode methods should do case
mapping in an locale independent way. The support for
differences in locale dependent case mapping, collation,
etc. should be moved to an external module, e.g. the locale
module.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1193061&group_id=5470