[issue20049] string.lowercase and string.uppercase can contain garbage
Alexander Pyhalov
report at bugs.python.org
Sat Dec 21 22:38:37 CET 2013
New submission from Alexander Pyhalov:
When Python 2.6 (or 2.7) is compiled with _XOPEN_SOURCE=600 on illumos, string.lowercase and string.uppercase contain garbage when a UTF-8 locale is used.
(OpenIndiana bug report - https://www.illumos.org/issues/4411 ).
The reason is that in a UTF-8 locale, islower(), isupper() and similar functions are not expected to work with non-ASCII characters.
So, code like
n = 0;
for (c = 0; c < 256; c++) {
    if (islower(c))
        buf[n++] = c;
}
is expected to fail, because it calls islower() on values 128-255, which are not valid UTF-8 characters on their own. It should be changed to something like
n = 0;
for (c = 0; c < 256; c++) {
    if (isascii(c) && islower(c))
        buf[n++] = c;
}
or to
n = 0;
for (c = 0; c < 128; c++) {
    if (islower(c))
        buf[n++] = c;
}
Before doing this, you should check whether the locale is UTF-8. However, almost all non-C locales on illumos are UTF-8.
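A rough sketch of such a locale check, written in Python for illustration only. The helper names is_utf8_codeset and current_locale_is_utf8 are hypothetical and not part of any proposed patch. It assumes a Unix platform where locale.nl_langinfo() is available; a C fix could presumably query the POSIX nl_langinfo(CODESET) call in a similar way.

```python
import codecs
import locale


def is_utf8_codeset(codeset):
    """Return True if `codeset` names UTF-8 under any spelling Python knows."""
    try:
        # codecs.lookup() normalizes aliases such as "UTF-8" and "utf8".
        return codecs.lookup(codeset).name == "utf-8"
    except LookupError:
        # Unknown codeset name: treat it as "not UTF-8".
        return False


def current_locale_is_utf8():
    """Check the codeset of the active LC_CTYPE locale (Unix only)."""
    # nl_langinfo(CODESET) reports the codeset of the locale that was
    # selected with locale.setlocale().
    return is_utf8_codeset(locale.nl_langinfo(locale.CODESET))
```

Comparing the normalized codec name avoids depending on how a given platform spells the codeset.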
Example of incorrect behavior:
Python 2.6.9 (unknown, Nov 12 2013, 13:54:48)
[GCC 4.7.3] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import string
>>> string.lowercase
'abcdefghijklmnopqrstuvwxyz\xaa\xb5\xba\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> string.uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xde'
>>>
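For comparison, a small Python sketch of the result the isascii() guard would be expected to give. The helper ascii_only is hypothetical and only mirrors the proposed check at the Python level; it is not the fix itself. The sample value is a shortened copy of the string.lowercase output shown in this report.

```python
import string


def ascii_only(chars):
    """Drop every character whose code is 128 or above."""
    return "".join(ch for ch in chars if ord(ch) < 128)


# Shortened copy of the reported string.lowercase value.
reported = "abcdefghijklmnopqrstuvwxyz\xaa\xb5\xba\xdf"
cleaned = ascii_only(reported)
```

Code that only needs the ASCII letters can also use string.ascii_lowercase and string.ascii_uppercase, which are not locale-dependent.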
----------
components: Unicode
messages: 206786
nosy: Alexander.Pyhalov, ezio.melotti, haypo
priority: normal
severity: normal
status: open
title: string.lowercase and string.uppercase can contain garbage
type: behavior
versions: Python 2.7
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue20049>
_______________________________________