[New-bugs-announce] [issue9819] TESTFN_UNICODE and TESTFN_UNDECODABLE
Hirokazu Yamamoto
report at bugs.python.org
Fri Sep 10 11:39:53 CEST 2010
New submission from Hirokazu Yamamoto <ocean-city at m2.ccsnet.ne.jp>:
Hello. I noticed that the test suite reports a WARNING on every run.
///////////////////////////////////////////////////
E:\python-dev>py3k -m test.regrtest test_os
WARNING: The filename '@test_464_tmp-共有される' CAN be encoded by the filesystem encoding (mbcs). Unicode filename tests may not be effective
(snip)
///////////////////////////////////////////////////
This happens because TESTFN_UNENCODABLE in Lib/test/support.py
*is* encodable in a Japanese environment (cp932).
It is easy to make it truly unencodable in Japanese by adding
characters such as "\u2661" or "\u2668" (the former is a heart mark,
the latter is the "Onsen" hot spring mark). With this change the warning went away:
TESTFN_UNENCODABLE = TESTFN + "-\u5171\u6709\u3055\u308c\u308b\u2661\u2668"
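A candidate name can be checked without a Japanese Windows machine by trying to encode it with Python's own cp932 codec. This is only a rough sketch: `can_encode` is a hypothetical helper, not something from Lib/test/support.py, and the cp932 codec is assumed here to behave like the Windows mbcs codec for these characters.

```python
def can_encode(name, encoding):
    """Return True if `name` can be encoded with `encoding` in strict mode."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# The suffix currently used and the suffix proposed in this report.
CURRENT_SUFFIX = "-\u5171\u6709\u3055\u308c\u308b"
PROPOSED_SUFFIX = CURRENT_SUFFIX + "\u2661\u2668"

for label, suffix in [("current", CURRENT_SUFFIX), ("proposed", PROPOSED_SUFFIX)]:
    print(label, "encodable in cp932:", can_encode(suffix, "cp932"))
```

A name is only useful as TESTFN_UNENCODABLE if this check fails for the filesystem encoding in use.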
///////////////////////////////////////////////////
There is another issue, which happens only with test_unicode_file:
///////////////////////////////////////////////////
E:\python-dev>py3k -m test.test_unicode_file
Traceback (most recent call last):
  File "e:\python-dev\py3k\lib\test\test_unicode_file.py", line 12, in <module>
    TESTFN_UNICODE.encode(TESTFN_ENCODING)
UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "e:\python-dev\py3k\lib\runpy.py", line 160, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "e:\python-dev\py3k\lib\runpy.py", line 73, in _run_code
    exec(code, run_globals)
  File "e:\python-dev\py3k\lib\test\test_unicode_file.py", line 16, in <module>
    raise unittest.SkipTest("No Unicode filesystem semantics on this platform.")
unittest.case.SkipTest: No Unicode filesystem semantics on this platform.
///////////////////////////////////////////////////
This happens because TESTFN_UNICODE cannot be encoded in the Japanese code page (cp932):
E:\python-dev>py3k
Python 3.2a2+ (py3k:84663M, Sep 10 2010, 13:24:41) [MSC v.1400 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print("-\xe0\xf2")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'cp932' codec can't encode character '\xe0' in position 1: illegal multibyte sequence
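The chained traceback from test_unicode_file can be reproduced in isolation. The following is a minimal sketch, not the actual code of test_unicode_file.py: `require_encodable` is a hypothetical helper, the name "@test-\xe0\xf2" is only illustrative, and "cp932" stands in for the mbcs filesystem encoding of a Japanese Windows.

```python
import unittest


def require_encodable(name, encoding):
    """Raise unittest.SkipTest if `name` cannot be encoded with `encoding`."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        # Raising inside the except block chains the two exceptions, which is
        # what produces "During handling of the above exception, ...".
        raise unittest.SkipTest(
            "No Unicode filesystem semantics on this platform.")


try:
    require_encodable("@test-\xe0\xf2", "cp932")
except unittest.SkipTest as exc:
    print("skipped:", exc)
    print("caused by:", type(exc.__context__).__name__)
```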
Interestingly, though, the byte sequence "\xe0\xf2" can be read as
a cp932 multibyte character:
E:\python-dev>python
Python 2.6.6 (r266:84297, Aug 24 2010, 18:46:32) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print "\xe0\xf2"
瑣
>>> "\xe0\xf2".decode("cp932")
u'\u7463'
E:\python-dev>py3k
Python 3.2a2+ (py3k:84663M, Sep 10 2010, 13:24:41) [MSC v.1400 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\u7463')
瑣
I believe the value "\xe0\xf2" was carried over from Python 2.x; maybe
"\u7463" should be used here instead? I'm not sure it can be encoded
under every other filesystem encoding, though.
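The difference between the two readings of "\xe0\xf2" can be shown on any platform with Python 3. This is a small sketch using Python's cp932 and latin-1 codecs rather than the Windows mbcs codec; the variable names are only illustrative.

```python
raw = b"\xe0\xf2"   # the two bytes a Python 2 str literal "\xe0\xf2" holds
text = "\xe0\xf2"   # in Python 3: the two characters U+00E0 and U+00F2

# Read as cp932 bytes, the pair is a single multibyte character.
print(ascii(raw.decode("cp932")))

# Read as Latin-1 bytes, the pair is the two characters of the Python 3 literal.
print(ascii(raw.decode("latin-1")))

# The Python 3 literal itself has no representation in cp932.
try:
    text.encode("cp932")
except UnicodeEncodeError as exc:
    print("not encodable in cp932:", exc.reason)
```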
----------
components: Tests, Unicode
messages: 115989
nosy: ocean-city
priority: normal
severity: normal
status: open
title: TESTFN_UNICODE and TESTFN_UNDECODABLE
versions: Python 3.1, Python 3.2
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9819>
_______________________________________