[ python-Bugs-1634774 ] locale 1251 does not convert to upper case properly
SourceForge.net
noreply at sourceforge.net
Thu Jan 18 22:59:41 CET 2007
Bugs item #1634774, was opened at 2007-01-13 18:30
Message generated for change (Comment added) made by dobrokot
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1634774&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Ivan Dobrokotov (dobrokot)
Assigned to: Nobody/Anonymous (nobody)
Summary: locale 1251 does not convert to upper case properly
Initial Comment:
<pre>
# -*- coding: 1251 -*-
import locale
locale.setlocale(locale.LC_ALL, ".1251") #locale name may be Windows specific?
#-----------------------------------------------
print chr(184), chr(168)
assert chr(255).upper() == chr(223) #OK
assert chr(184).upper() == chr(168) #fail
#-----------------------------------------------
assert 'q'.upper() == 'Q' #OK
assert 'ж'.upper() == 'Ж' #OK
assert 'Ж'.upper() == 'Ж' #OK
assert 'я'.upper() == 'Я' #OK
assert u'ё'.upper() == u'Ё' #OK (locale independent)
assert 'ё'.upper() == 'Ё' #fail
</pre>
I suspect that upper() is implemented incorrectly, something like:
<pre>
if ('a' <= c && c <= 'я')
return c+'Я'-'я'
</pre>
The character 'ё' (184 in cp1251) lies outside the range 'a'-'я'.
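As a cross-check of the expected values, here is a minimal Python 3 sketch (not part of the original report, which targets Python 2.5). It assumes Python's built-in cp1251 codec and decodes each byte to Unicode before upper-casing, so the result does not depend on the C locale. The helper name upper_cp1251_byte is hypothetical.

```python
def upper_cp1251_byte(code):
    """Upper-case one cp1251 byte value by round-tripping through Unicode."""
    char = bytes([code]).decode("cp1251")
    return char.upper().encode("cp1251")[0]


print(upper_cp1251_byte(255))  # 223: 'я' -> 'Я'
print(upper_cp1251_byte(184))  # 168: 'ё' -> 'Ё'
```

These are the byte values that the two chr() assertions above expect.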
----------------------------------------------------------------------
>Comment By: Ivan Dobrokotov (dobrokot)
Date: 2007-01-18 22:59
Message:
Logged In: YES
user_id=1538986
Originator: YES
----------------------------------------------
standard header ctype.h:
#define _toupper(_c) ( (_c)-'a'+'A' )
----------------------------------------------
CRT file toupper.c:
/* define function-like macro equivalent to _toupper()
*/
#define mkupper(c) ( (c)-'a'+'A' )
int __cdecl _toupper (
int c
)
{
return(mkupper(c));
}
( http://www.everfall.com/paste/id.php?j13ernl40i9e )
Suggestion: replace _toupper with toupper. Performance may degrade (a lot of
thread locks, MultiByteToWideChar calls, and other code for every non-ASCII
lowercase symbol). Suggestion for optimization: set up "int toupper_table[256]"
(and other tables) on every call to setlocale.
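The table idea can be sketched in Python 3 as follows. This is only an illustration under assumptions: it builds the table from Python's cp1251 codec rather than from the C runtime's locale data, and the names build_upper_table and toupper_table are hypothetical. Bytes that are undefined in the code page, or whose upper-case form cannot be encoded as a single byte, map to themselves.

```python
def build_upper_table(encoding="cp1251"):
    """Return a 256-entry list mapping each byte value to its upper-case byte."""
    table = list(range(256))  # default: every byte maps to itself
    for code in range(256):
        try:
            upper = bytes([code]).decode(encoding).upper().encode(encoding)
        except UnicodeError:
            # Byte is undefined in the code page, or the upper-case
            # character has no representation in it: keep the identity.
            continue
        if len(upper) == 1:
            table[code] = upper[0]
    return table


# Built once (for example, when the locale changes); lookups are then O(1).
toupper_table = build_upper_table()
print(toupper_table[184], toupper_table[255], toupper_table[ord("q")])
```

With such a table, upper-casing a byte string is a plain lookup per byte, with no locking or wide-character conversion on each call.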
----------------------------------------------------------------------
Comment By: Ivan Dobrokotov (dobrokot)
Date: 2007-01-18 22:18
Message:
Logged In: YES
user_id=1538986
Originator: YES
well, C:
----------------------------
#include <ctype.h>
#include <locale.h>
#include <stdio.h>
#include <assert.h>
int main()
{
    int i = 184;
    char *old = setlocale(LC_CTYPE, ".1251");
    assert(old);
    printf("%d -> %d\n", i, _toupper(i));
    printf("%d -> %d\n", i, toupper(i));
    return 0;
}
----------------------------
C output:
184 -> 152
184 -> 168
so _toupper and toupper are different functions. MSDN does not mention
any difference, except that 'toupper' is "ANSI compatible" :(
File Added: toupper.zip
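The value 152 is what plain ASCII arithmetic would give if _toupper computes (c) - 'a' + 'A' without consulting the locale. A small Python 3 sketch of that arithmetic, where the function name ascii_only_toupper is hypothetical:

```python
def ascii_only_toupper(c):
    """Mimic the unchecked expression (c) - 'a' + 'A' on an integer code."""
    return c - ord("a") + ord("A")


print(ascii_only_toupper(ord("q")))  # 81, i.e. 'Q'
print(ascii_only_toupper(184))       # 152, matching the first line of the C output
```

The expression is only meaningful for ASCII lowercase letters; for any other input it simply subtracts 32.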
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2007-01-18 21:08
Message:
Logged In: YES
user_id=21627
Originator: NO
You can see the implementation of .upper in
http://svn.python.org/projects/python/tags/r25/Objects/stringobject.c
(function string_upper)
Off-hand, I cannot see anything wrong in that code. It definitely does
*not* use c+'Я'-'я'.
----------------------------------------------------------------------
Comment By: Ivan Dobrokotov (dobrokot)
Date: 2007-01-13 22:08
Message:
Logged In: YES
user_id=1538986
Originator: YES
I forgot to mention the Python version used:
http://www.python.org/ftp/python/2.5/python-2.5.msi
----------------------------------------------------------------------
Comment By: Ivan Dobrokotov (dobrokot)
Date: 2007-01-13 18:51
Message:
Logged In: YES
user_id=1538986
Originator: YES
Sorry, I meant
toupper((int)(unsigned char)'ё')
not just toupper('ё')
----------------------------------------------------------------------
Comment By: Ivan Dobrokotov (dobrokot)
Date: 2007-01-13 18:49
Message:
Logged In: YES
user_id=1538986
Originator: YES
The C CRT library function toupper('ё') works properly if I call
setlocale(LC_ALL, ".1251")
----------------------------------------------------------------------