[ python-Bugs-1193061 ] Python and Turkish Locale

SourceForge.net noreply at sourceforge.net
Tue Oct 11 23:36:55 CEST 2005


Bugs item #1193061, was opened at 2005-04-30 17:37
Message generated for change (Comment added) made by exa
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1193061&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Unicode
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: S.Çağlar Onur (caglar)
Assigned to: M.-A. Lemburg (lemburg)
Summary: Python and Turkish Locale

Initial Comment:
Following up on this thread:

http://mail.python.org/pipermail/python-dev/2005-April/052968.html

As described in
http://www.i18nguy.com/unicode/turkish-i18n.html [How
Applications Fail With Turkish Language], Turkish has four
"i" letters in its alphabet: dotted and dotless, each in
upper and lower case.

Without --with-wctype-functions support, Python converts
these characters in a locale-independent manner in the
tr_TR.UTF-8 locale. So all conversions map to "i" or
"I", which is wrong in the Turkish locale.

So if the Python developers remove the wctype
functions from Python, there must be a
locale-dependent upper/lower function to handle these
characters properly.
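
For illustration only, here is a minimal sketch of the locale-independent behaviour described above. It assumes a current Python 3 interpreter rather than the 2005 builds discussed in this report, and the helper name default_case_pairs is made up for the example.

```python
# The four Turkish "i" letters, written as escapes to avoid encoding issues.
DOTLESS_LOWER = "\u0131"  # LATIN SMALL LETTER DOTLESS I
DOTTED_LOWER = "i"        # LATIN SMALL LETTER I
DOTLESS_UPPER = "I"       # LATIN CAPITAL LETTER I
DOTTED_UPPER = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE


def default_case_pairs():
    """Return (letter, upper, lower) using the built-in str methods,
    which do not consult the process locale."""
    letters = [DOTLESS_LOWER, DOTTED_LOWER, DOTLESS_UPPER, DOTTED_UPPER]
    return [(ch, ch.upper(), ch.lower()) for ch in letters]


for ch, up, low in default_case_pairs():
    # ascii() shows the exact code points instead of look-alike glyphs.
    print(ascii(ch), "upper ->", ascii(up), "lower ->", ascii(low))
```

Both lowercase letters uppercase to plain "I", and plain "I" lowercases to dotted "i", so the Turkish pairing (i/İ and ı/I) is lost.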


----------------------------------------------------------------------

Comment By: Eray Ozkural (exa)
Date: 2005-10-11 21:36

Message:
Logged In: YES 
user_id=1454

The better solution is to use an optional locale argument for 
upper/lower functions and other language-dependent text 
processing functions. 
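
One way such an optional locale argument might look is sketched below. This is only an assumption about the shape of the API: upper(), lower() and _is_turkic() are hypothetical names, not functions in the standard library, and only the Turkic i/I mappings are handled.

```python
# Hypothetical API sketch; nothing here exists in the standard library.
_TURKIC_LANGUAGES = {"tr", "az"}

# Turkic-specific pairs: i <-> U+0130 and U+0131 <-> I.
_TURKIC_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})
_TURKIC_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})


def _is_turkic(locale):
    """Accept names such as "tr", "tr_TR" or "tr_TR.UTF-8"."""
    if not locale:
        return False
    language = locale.replace("-", "_").split("_")[0].split(".")[0].lower()
    return language in _TURKIC_LANGUAGES


def upper(text, locale=None):
    """Uppercase text; apply the Turkic mappings first when asked to."""
    if _is_turkic(locale):
        text = text.translate(_TURKIC_UPPER)
    return text.upper()


def lower(text, locale=None):
    """Lowercase text; apply the Turkic mappings first when asked to."""
    if _is_turkic(locale):
        text = text.translate(_TURKIC_LOWER)
    return text.lower()


print(ascii(upper("istanbul")))                 # default mapping
print(ascii(upper("istanbul", "tr_TR.UTF-8")))  # Turkic mapping
```

With no locale given the functions fall back to the default mappings, so existing callers would see no change. Decomposed input (I followed by U+0307) is not handled in this sketch.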
 

----------------------------------------------------------------------

Comment By: S.Çağlar Onur (caglar)
Date: 2005-05-02 08:45

Message:
Logged In: YES 
user_id=858447

No, I'm not. These rules are defined in
http://www.unicode.org/Public/UNIDATA/CaseFolding.txt and
http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt.
Note that there is a comment that says:

# T: special case for uppercase I and dotted uppercase I
#    - For non-Turkic languages, this mapping is normally not used.
#    - For Turkic languages (tr, az), this mapping can be used instead of the normal mapping for these characters.
#      Note that the Turkic mappings do not maintain canonical equivalence without additional processing.
#      See the discussions of case mapping in the Unicode Standard for more information.
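
As a rough illustration of the note about canonical equivalence, the sketch below applies the two T mappings (I to U+0131, U+0130 to i) on top of Python 3's str.casefold(). The function turkic_casefold is a hypothetical helper, and normalizing to NFC first is one assumed form of the "additional processing", not something the data file prescribes.

```python
import unicodedata

# The two "T" mappings: I -> dotless i (U+0131), dotted I (U+0130) -> i.
_TURKIC_FOLD = str.maketrans({"I": "\u0131", "\u0130": "i"})


def turkic_casefold(text, normalize=True):
    """Hypothetical helper: apply the T mappings, then the default folding.

    With normalize=True the text is first composed to NFC, so that
    "I" + U+0307 (combining dot above) is treated like U+0130.
    """
    if normalize:
        text = unicodedata.normalize("NFC", text)
    return text.translate(_TURKIC_FOLD).casefold()


composed = "\u0130stanbul"     # precomposed dotted capital I
decomposed = "I\u0307stanbul"  # canonically equivalent spelling

print(turkic_casefold(composed) == turkic_casefold(decomposed))
print(turkic_casefold(composed, normalize=False)
      == turkic_casefold(decomposed, normalize=False))
```

Without the normalization step the two canonically equivalent spellings fold to different strings, which is the hazard the comment warns about.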

So without wctype function support, Python can't convert
these characters. This _is_ the problem. As a side effect,
another huge problem occurs: keywords can't be locale
dependent. If Python is compiled with wctype function support,
every "i".upper() turns into "İ", which is wrong for keyword
comparison (like quit vs. QUİT).

So I suggest implementing two new functions, such as
localeAwareLower()/localeAwareUpper(), and leaving
lower()/upper() locale independent. And as you wrote, the
locale module may be a perfect home for these :)
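
To make the split concrete, here is a small sketch, assuming Python 3, of why keyword matching should stay on the locale-independent upper() while user-facing text goes through a separate function. locale_aware_upper, is_keyword and the KEYWORDS set are invented for this example, and the helper covers Turkish only.

```python
def locale_aware_upper(text):
    """Hypothetical Turkish-only stand-in for localeAwareUpper()."""
    # Map i -> U+0130 and U+0131 -> I, then let str.upper() do the rest.
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


KEYWORDS = {"QUIT", "HELP"}  # illustrative keyword table


def is_keyword(word):
    """Match keywords with the locale-independent mapping only."""
    return word.upper() in KEYWORDS


print(is_keyword("quit"))                       # locale-independent match
print(locale_aware_upper("quit") in KEYWORDS)   # Turkish mapping breaks it
print(ascii(locale_aware_upper("quit")))
```

Keeping the two functions separate lets programs pick the Turkish mapping for display text without changing how identifiers and keywords compare.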



----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2005-05-02 08:00

Message:
Logged In: YES 
user_id=38388

I'm not sure I understand: are you saying that the Unicode
mappings for upper and lower case are wrong in the standard?

Note that removing the wctype functions will only remove the
possibility to use these functions for case mapping of
Unicode characters instead of using the builtin Unicode
character database. This was originally meant as an
optimization to avoid having to load the Unicode database -
nowadays the database is always included, so the
optimization is no longer needed. Even worse: the wctype
functions sometimes behave differently than the mappings in
the Unicode database (due to differences in the Unicode
database version or implementations).

Now, since the string .lower() and .upper() methods are
locale dependent (due to their reliance on the C functions
toupper() and tolower() - not by intent), while the Unicode
versions are not, we have a rather annoying situation where
switching from strings to Unicode causes semantic differences.

Ideally, both string and Unicode methods should do case
mapping in a locale-independent way. The support for
differences in locale dependent case mapping, collation,
etc. should be moved to an external module, e.g. the locale
module.


----------------------------------------------------------------------
