[issue9335] LC_CTYPE system setting not respected by setlocale()
Alexander Belopolsky
report at bugs.python.org
Fri Jul 23 16:13:28 CEST 2010
Alexander Belopolsky <belopolsky at users.sourceforge.net> added the comment:
On Fri, Jul 23, 2010 at 4:03 AM, Martin v. Löwis <report at bugs.python.org> wrote:
..
> I fail to see the bug in this report. '\xff' is a letter because the C library says it is.
This does not explain the difference between 2.6 and 2.7. With the
attached issue9335-test.py:
$ cat issue9335-test.py
import locale
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
print(chr(255).isalpha())
$ python2.7 issue9335-test.py
False
$ python2.6 issue9335-test.py
True
$ python2.5 issue9335-test.py
True
Since chr(255) = '\xff' is not a valid UTF-8 byte sequence, it makes
little sense to ask whether it is a letter or not in a locale that
uses UTF-8 encoding. Nevertheless, the behavior changed between 2.6
and 2.7, and the change is not mentioned in "What's New in 2.7". (I
suspect this was introduced in issue5793 (r72040), but I have not
verified.)
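To illustrate the point about '\xff' not being valid UTF-8, here is a
minimal sketch. The helper name is_valid_utf8 is made up for this
example and is not part of any library; the snippet only relies on
bytes.decode() and should run on both Python 2.6+ and 3.3+.

```python
def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8.

    Hypothetical helper for illustration only.
    """
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


lone_byte = b'\xff'                  # the single byte chr(255) yields on Python 2
encoded_y = u'\xff'.encode('utf-8')  # U+00FF encoded as UTF-8 is two bytes

print(is_valid_utf8(lone_byte))      # False: 0xFF never appears in UTF-8
print(is_valid_utf8(encoded_y))      # True
print(repr(encoded_y))
```

In other words, the letter U+00FF is represented in a UTF-8 locale by
the two bytes 0xC3 0xBF, so a lone 0xFF byte does not correspond to any
character there.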
There are two possible action items here:
1. The new behavior needs to be documented. I believe 2.7 is correct
because when isalpha is used to sanitize untrusted input, it is better
to reject in the case of uncertainty.
2. Arguably, this is a security issue and thus eligible for backporting to 2.6.
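As a rough sketch of the "reject when uncertain" idea in item 1, code
that sanitizes untrusted bytes could decode first and classify
afterwards, so the result does not depend on the C library's locale
tables. The function is_alpha_text below is a hypothetical helper, not
what CPython does internally, and the default encoding is an assumption.

```python
def is_alpha_text(data, encoding='utf-8'):
    """Return True only if `data` decodes and consists solely of letters.

    Hypothetical helper for illustration; undecodable input is rejected.
    """
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError:
        return False  # not valid in this encoding: reject rather than guess
    return text.isalpha()


print(is_alpha_text(b'abc'))       # True
print(is_alpha_text(b'\xff'))      # False: not decodable as UTF-8
print(is_alpha_text(b'\xc3\xbf'))  # True: U+00FF is a letter
```

With this approach the answer for b'\xff' is the same on every Python
version, since it never reaches a locale-dependent isalpha call.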
----------
Added file: http://bugs.python.org/file18146/issue9335-test.py
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9335>
_______________________________________