[New-bugs-announce] [issue9819] TESTFN_UNICODE and TESTFN_UNDECODABLE

Fri Sep 10 11:39:53 CEST 2010

New submission from Hirokazu Yamamoto <ocean-city at m2.ccsnet.ne.jp>:

Hello. I noticed that the test suite reports a WARNING on every run.

///////////////////////////////////////////////////

E:\python-dev>py3k -m test.regrtest test_os
WARNING: The filename '@test_464_tmp-共有される' CAN be encoded by the filesystem encoding (mbcs). Unicode filename tests may not be effective
(snip)

///////////////////////////////////////////////////

This happens because TESTFN_UNENCODABLE in Lib/test/support.py
*is* encodable in a Japanese environment (cp932).

It is easy to make this name really unencodable in Japanese by adding
characters such as "\u2661" or "\u2668" (the former is a heart mark, the
latter is the "Onsen", or hot spring, mark). With this change the warning
goes away:
    TESTFN_UNENCODABLE = TESTFN + "-\u5171\u6709\u3055\u308c\u308b\u2661\u2668"
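The effect of the two extra characters can be checked without a Japanese Windows machine. The following is only a minimal sketch: `can_encode` is a hypothetical helper, not something in Lib/test/support.py, and it uses Python's "cp932" codec directly as a stand-in for what "mbcs" resolves to on a Japanese system.

```python
# Hypothetical helper for illustration; not part of Lib/test/support.py.
def can_encode(name, encoding):
    """Return True if `name` encodes with `encoding` in strict mode."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Suffix currently used for the "unencodable" test filename.
current = "-\u5171\u6709\u3055\u308c\u308b"
# Same suffix with the heart mark and hot spring mark appended.
proposed = current + "\u2661\u2668"

for label, suffix in (("current", current), ("proposed", proposed)):
    # Assumption: cp932 stands in for the Japanese filesystem encoding.
    print(label, "encodable in cp932:", can_encode(suffix, "cp932"))
```

If the proposed suffix is reported as not encodable, the name is unencodable in cp932 as intended. This says nothing about other code pages, which would need the same check.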

///////////////////////////////////////////////////

There is another issue, which occurs only in test_unicode_file:

///////////////////////////////////////////////////

E:\python-dev>py3k -m test.test_unicode_file
Traceback (most recent call last):
  File "e:\python-dev\py3k\lib\test\test_unicode_file.py", line 12, in <module>
    TESTFN_UNICODE.encode(TESTFN_ENCODING)
UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "e:\python-dev\py3k\lib\runpy.py", line 160, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "e:\python-dev\py3k\lib\runpy.py", line 73, in _run_code
    exec(code, run_globals)
  File "e:\python-dev\py3k\lib\test\test_unicode_file.py", line 16, in <module>
    raise unittest.SkipTest("No Unicode filesystem semantics on this platform.")

unittest.case.SkipTest: No Unicode filesystem semantics on this platform.

///////////////////////////////////////////////////

This happens because TESTFN_UNICODE cannot be encoded with the Japanese
code page (cp932):

E:\python-dev>py3k
Python 3.2a2+ (py3k:84663M, Sep 10 2010, 13:24:41) [MSC v.1400 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print("-\xe0\xf2")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'cp932' codec can't encode character '\xe0' in position 1: illegal multibyte sequence

Interestingly, though, the byte sequence "\xe0\xf2" can be read as a
single cp932 multibyte character.

E:\python-dev>python
Python 2.6.6 (r266:84297, Aug 24 2010, 18:46:32) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print "\xe0\xf2"
瑣
>>> "\xe0\xf2".decode("cp932")
u'\u7463'

E:\python-dev>py3k
Python 3.2a2+ (py3k:84663M, Sep 10 2010, 13:24:41) [MSC v.1400 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\u7463')
瑣

I believe the value "\xe0\xf2" was carried over from Python 2.x, where it
was a byte string. Maybe "\u7463" should be used here instead? I'm not sure
it can be encoded everywhere with other encodings, though.
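The difference between the two readings of "\xe0\xf2" can be reproduced on any platform. This is a rough sketch, not a proposed patch: `encodable_in` is a hypothetical helper, and the three encodings are just an illustrative sample, not a list of every filesystem encoding the test suite has to support.

```python
raw = b"\xe0\xf2"

# Python 2 reading: two bytes, which cp932 decodes as one character.
as_cp932 = raw.decode("cp932")

# Python 3 reading of the str literal "\xe0\xf2": two code points,
# U+00E0 and U+00F2, which is what latin-1 decoding of the bytes gives.
as_code_points = raw.decode("latin-1")


# Hypothetical helper for illustration only.
def encodable_in(text, encodings):
    """Map each encoding name to True/False for strict encodability."""
    result = {}
    for enc in encodings:
        try:
            text.encode(enc)
            result[enc] = True
        except UnicodeEncodeError:
            result[enc] = False
    return result


# Assumption: a small sample of encodings, chosen only for illustration.
sample = ("cp932", "latin-1", "utf-8")
print(ascii(as_code_points), encodable_in(as_code_points, sample))
print(ascii(as_cp932), encodable_in(as_cp932, sample))
```

In this sample neither candidate is encodable everywhere, so replacing the value with "\u7463" would fix cp932 but break any single-byte Western encoding. A per-platform choice may be needed.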

----------
components: Tests, Unicode
messages: 115989
nosy: ocean-city
priority: normal
severity: normal
status: open
title: TESTFN_UNICODE and TESTFN_UNDECODABLE
versions: Python 3.1, Python 3.2

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9819>
_______________________________________