[issue22555] Tracking issue for adjustments to binary/text boundary handling

Mon Nov 16 13:27:01 EST 2015

Steve Dower added the comment:

Right now all of the tests fail on Windows by default (cp437 for me).

If I change the default IO encoding to utf-8 (hacked into pylifecycle.c, since PYTHONIOENCODING is ignored by subprocesses using -E), the four "Misconfigured" tests crash at the os.fsencode() call (as "mbcs:strict" cannot encode the characters - this may be a real issue, haven't dug into it yet).

Adding more hacks to get past this point brings me back into the ASCII encoding performed by the test, and I'm not sure whether that's just an incorrect assumption for Windows or not.

Separate issue: if I run "chcp 437" before the tests, the output is garbage. If I run "chcp 65001" then it shows the characters in the font correctly. The std streams encoding is taken from this value, but it doesn't map back to UTF-8, which is probably another issue. If I add a separate check in fileutils.c at _Py_device_encoding then I get UTF-8 enabled streams when the console is set for cp65001.

However, there are still a number of places that use GetACP() to determine the locale and encoding to use, which is incorrect for Unicode-aware programs. In particular, this should not happen:

>>> f=open('test.txt', 'w')
>>> f.encoding
'cp1252'

There's no good reason for the default encoding to not be UTF-8 these days, but this is a much bigger change. It's probably worth doing for 3.6, but may need more discussion...

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue22555>
_______________________________________