[ python-Bugs-1634774 ] locale 1251 does not convert to upper case properly

SourceForge.net noreply at sourceforge.net
Thu Jan 18 22:59:41 CET 2007


Bugs item #1634774, was opened at 2007-01-13 18:30
Message generated for change (Comment added) made by dobrokot
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1634774&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Ivan Dobrokotov (dobrokot)
Assigned to: Nobody/Anonymous (nobody)
Summary: locale 1251 does not convert to upper case properly

Initial Comment:
<pre>
 # -*- coding: 1251 -*-

import locale

locale.setlocale(locale.LC_ALL, ".1251")  # this locale name may be Windows-specific

#-----------------------------------------------
print chr(184), chr(168)             # 'ё' and 'Ё' in cp1251
assert  chr(255).upper() == chr(223) #OK: 'я' -> 'Я'
assert  chr(184).upper() == chr(168) #fail: 'ё' -> 'Ё'
#-----------------------------------------------
assert  'q'.upper() == 'Q' #OK 
assert  'ж'.upper() == 'Ж' #OK
assert  'Ж'.upper() == 'Ж' #OK
assert  'я'.upper() == 'Я' #OK
assert  u'ё'.upper() == u'Ё' #OK (locale independent)
assert  'ё'.upper() == 'Ё' #fail
</pre>

I suspect an incorrect implementation of uppercasing, something like:

<pre>
if ('а' <= c && c <= 'я')
  return c+'Я'-'я'
</pre>

The character 'ё' (184 in cp1251) is not in the range 'а'-'я' (224-255 in cp1251).
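
To illustrate the suspected failure mode, here is a minimal C sketch that works on raw cp1251 byte values only, so it does not depend on any installed locale. The function names range_upper and range_upper_with_yo are hypothetical, and the range-based logic is only the guess made above, not code taken from Python or the CRT.

```c
#include <assert.h>

/* Hypothetical range-based uppercasing over cp1251 byte values.
 * In cp1251, 'а'..'я' are 224..255 and 'А'..'Я' are 192..223.
 * Latin letters are deliberately ignored to keep the sketch short. */
int range_upper(int c)
{
    assert(c >= 0 && c <= 255);
    if (c >= 224 && c <= 255)   /* 'а' <= c <= 'я' */
        return c + 223 - 255;   /* c + 'Я' - 'я' */
    return c;                   /* everything else is left unchanged */
}

/* Same idea plus an explicit special case: 'ё' (184) -> 'Ё' (168). */
int range_upper_with_yo(int c)
{
    if (c == 184)
        return 168;
    return range_upper(c);
}
```

With these definitions, range_upper(255) gives 223, but range_upper(184) leaves 184 unchanged, which is the symptom reported here; only the variant with the special case maps 184 to 168.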

----------------------------------------------------------------------

>Comment By: Ivan Dobrokotov (dobrokot)
Date: 2007-01-18 22:59

Message:
Logged In: YES 
user_id=1538986
Originator: YES


----------------------------------------------
standard header ctype.h:

#define _toupper(_c)    ( (_c)-'a'+'A' )


----------------------------------------------
CRT file toupper.c:



/* define function-like macro equivalent to _toupper()
 */
#define mkupper(c)  ( (c)-'a'+'A' )



int __cdecl _toupper (
        int c
        )
{
        return(mkupper(c));
}

( http://www.everfall.com/paste/id.php?j13ernl40i9e )

Suggestion: replace _toupper with toupper. Performance may degrade (many
thread locks, MultiByteToWideChar calls, and other code executed for every
non-ASCII lowercase character). Suggestion for optimization: set up
"int toupper_table[256]" (and other tables) on every call to setlocale.
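
The table idea could be sketched roughly as follows. This is only an illustration under stated assumptions: toupper_table is the name proposed above, while rebuild_toupper_table, setlocale_and_rebuild and cached_toupper are hypothetical helpers, not existing Python or CRT functions.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Cache with one entry per possible unsigned char value. */
int toupper_table[256];

/* Refill the cache from the locale-aware toupper(). */
void rebuild_toupper_table(void)
{
    int i;
    for (i = 0; i < 256; i++)
        toupper_table[i] = toupper(i);
}

/* Hypothetical wrapper: change the locale, then refresh the cache.
 * Returns setlocale()'s result; NULL means the locale was rejected
 * and the table is left untouched. */
char *setlocale_and_rebuild(int category, const char *name)
{
    char *result;
    assert(name != NULL);   /* a NULL name would only query the locale */
    result = setlocale(category, name);
    if (result != NULL)
        rebuild_toupper_table();
    return result;
}

/* Fast path for uppercasing one byte: a single array lookup. */
int cached_toupper(unsigned char c)
{
    return toupper_table[c];
}
```

The cost of the locale-aware conversion is then paid 256 times per setlocale call instead of once per character converted. The sketch ignores thread safety and any code that changes the locale without going through the wrapper.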




----------------------------------------------------------------------

Comment By: Ivan Dobrokotov (dobrokot)
Date: 2007-01-18 22:18

Message:
Logged In: YES 
user_id=1538986
Originator: YES

Well, here is the same test in C:
----------------------------

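#include <ctype.h>   /* toupper, _toupper (Microsoft CRT) */
#include <locale.h>
#include <stdio.h>
#include <assert.h>

int main()
{
  int i = 184;  /* 'ё' in cp1251 */
  /* ".1251" is a Windows-specific locale name; setlocale returns NULL on failure */
  char *loc = setlocale(LC_CTYPE, ".1251");
  assert(loc);
  printf("%d -> %d\n", i, _toupper(i));  /* ASCII-only macro */
  printf("%d -> %d\n", i, toupper(i));   /* locale-aware function */
  return 0;
}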

----------------------------
C output:
184 -> 152
184 -> 168

So _toupper and toupper are different functions. MSDN says nothing about
the difference, except that 'toupper' is "ANSI compatible" :(
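
The 152 in the first line of output follows directly from the macro arithmetic: 184 - 'a' + 'A' = 184 - 97 + 65 = 152. A minimal sketch, assuming an ASCII execution character set; the macro name ASCII_TOUPPER and the function guarded_toupper are hypothetical, chosen so that nothing clashes with the CRT's own _toupper:

```c
#include <assert.h>
#include <ctype.h>

/* Same arithmetic as the ASCII-only CRT macro: valid only for 'a'..'z'. */
#define ASCII_TOUPPER(c) ((c) - 'a' + 'A')

/* Hypothetical guarded wrapper: apply the arithmetic only when the
 * argument really is an ASCII lowercase letter, otherwise defer to
 * the locale-aware toupper(). */
int guarded_toupper(int c)
{
    assert(c >= 0 && c <= 255);
    if (c >= 'a' && c <= 'z')
        return ASCII_TOUPPER(c);
    return toupper(c);
}
```

Applied to 184 the bare macro yields 152, which is not a letter in cp1251 at all, while the guarded version hands such values to toupper and so respects the current locale.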



File Added: toupper.zip

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2007-01-18 21:08

Message:
Logged In: YES 
user_id=21627
Originator: NO

You can see the implementation of .upper in

http://svn.python.org/projects/python/tags/r25/Objects/stringobject.c
(function string_upper)

Off-hand, I cannot see anything wrong in that code. It definitely does
*not* use c+'Я'-'я'.

----------------------------------------------------------------------

Comment By: Ivan Dobrokotov (dobrokot)
Date: 2007-01-13 22:08

Message:
Logged In: YES 
user_id=1538986
Originator: YES

I forgot to mention the Python version used:
http://www.python.org/ftp/python/2.5/python-2.5.msi

----------------------------------------------------------------------

Comment By: Ivan Dobrokotov (dobrokot)
Date: 2007-01-13 18:51

Message:
Logged In: YES 
user_id=1538986
Originator: YES

Sorry, I meant
toupper((int)(unsigned char)'ё'),
not just toupper('ё').
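
The cast matters because plain char may be signed: byte 184 then becomes a negative value, and passing a negative value other than EOF to toupper is undefined behavior in C. A minimal sketch, with hypothetical helper names ctype_arg and safe_toupper, and assuming 8-bit chars:

```c
#include <assert.h>
#include <ctype.h>

/* Convert a plain char to the int value the <ctype.h> functions expect:
 * always in 0..255, never negative. */
int ctype_arg(char ch)
{
    return (int)(unsigned char)ch;
}

/* Hypothetical helper: locale-aware uppercasing that is safe for any char. */
int safe_toupper(char ch)
{
    int c = ctype_arg(ch);
    assert(c >= 0 && c <= 255);
    return toupper(c);
}
```

What safe_toupper returns for byte 184 still depends on the current LC_CTYPE locale; the helper only guarantees that the argument passed to toupper is in the valid range.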

----------------------------------------------------------------------

Comment By: Ivan Dobrokotov (dobrokot)
Date: 2007-01-13 18:49

Message:
Logged In: YES 
user_id=1538986
Originator: YES

The C runtime (CRT) library function toupper('ё') works properly if I call
setlocale(LC_ALL, ".1251")

----------------------------------------------------------------------
