[issue26226] Various test suite failures on Windows

Wed Jun 29 00:00:10 EDT 2016

Eryk Sun added the comment:

time.strftime calls the CRT's strftime function, which the Windows universal CRT implements by calling wcsftime and encoding the result. The timezone name is actually stored as a char string (tzname), so wcsftime has to decode it via mbstowcs. 

The problem is that in the C locale tzname is an ANSI (code page 1252) string, while mbstowcs simply casts each byte to wchar_t, which is equivalent to decoding as Latin-1. This works fine for "é" (U+00E9), which has the same value in both encodings. But the right single quotation mark (U+2019) is "\x92" in 1252, and a simple cast maps it to the C1 control character U+0092.
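The decoding mismatch can be reproduced without the CRT. The following is only a sketch: it assumes Python's "cp1252" and "latin-1" codecs are fair stand-ins for the ANSI code page and for the byte-to-wchar_t cast, respectively.

```python
# The French daylight-time zone name as ANSI (code page 1252) bytes.
raw = "Est (heure d\u2019\u00e9t\u00e9)".encode("cp1252")

# What the bytes actually mean when decoded with the ANSI code page.
as_ansi = raw.decode("cp1252")

# What a plain byte-to-wchar_t cast yields (same as decoding as Latin-1).
as_cast = raw.decode("latin-1")

# "é" is 0xE9 in both encodings, so it survives either way.
assert "\u00e9" in as_ansi and "\u00e9" in as_cast

# The quote is 0x92 in 1252; the cast turns it into U+0092 instead of U+2019.
assert "\u2019" in as_ansi
assert "\u2019" not in as_cast and "\x92" in as_cast
```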

When the CRT's strftime encodes this back as an ANSI string, it maps U+0092 to the replacement character for 1252, a question mark. Similarly, time.tzname decodes the tzname ANSI strings using mbstowcs, with the same mismatch between LC_CTYPE and LC_TIME, resulting in the string "Est (heure d\x92été)".
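The round trip back to ANSI can be approximated the same way. This sketch assumes that Python's errors="replace" handler behaves like the CRT's fallback to a question mark; it does not call the CRT itself.

```python
# Step 1: the cast (Latin-1 decode) turns the 1252 quote byte into U+0092.
cast_quote = b"\x92".decode("latin-1")

# Step 2: U+0092 has no code page 1252 encoding, so it is replaced by "?".
back_quote = cast_quote.encode("cp1252", errors="replace")

# "é" (0xE9) round-trips unchanged because both encodings agree on it.
back_e_acute = b"\xe9".decode("latin-1").encode("cp1252", errors="replace")

print(back_quote, back_e_acute)
```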

In summary, the problem is that LC_TIME uses ANSI in the C locale, while LC_CTYPE uses Latin-1. A workaround (in most cases) is to delay importing the time module until after setting LC_CTYPE; setting LC_TIME as well should cover all cases. For example:

    >>> import sys, locale
    >>> 'time' in sys.modules
    False
    >>> locale.setlocale(locale.LC_CTYPE, '')
    'French_France.1252'
    >>> import time
    >>> time.tzname
    ('Est', 'Est (heure d’été)')
    >>> time.strftime('%Z')
    'Est (heure d’été)'

Note that Unix Python 3 sets LC_CTYPE at startup, so doing the same on Windows would actually improve cross-platform consistency.
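For a script rather than an interactive session, the workaround might be wrapped as below. This is a sketch only: the helper name import_time_after_setlocale is hypothetical, and the fallback on locale.Error is an assumption added so the snippet still runs where the environment's locale is unavailable.

```python
import locale
import sys


def import_time_after_setlocale():
    """Set LC_CTYPE and LC_TIME from the environment, then import time.

    Returns the time module and a flag telling whether time had already
    been imported, in which case the workaround may come too late.
    """
    was_loaded = "time" in sys.modules
    for category in (locale.LC_CTYPE, locale.LC_TIME):
        try:
            # An empty string selects the user's default locale.
            locale.setlocale(category, "")
        except locale.Error:
            # Assumption: keep the current locale if the default is unusable.
            pass
    import time
    return time, was_loaded


time_mod, was_loaded = import_time_after_setlocale()
print(time_mod.tzname, time_mod.strftime("%Z"), was_loaded)
```

Note that locale.setlocale changes process-wide state, so a call like this belongs at program startup, before other threads or modules depend on the locale.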

----------
nosy: +eryksun

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue26226>
_______________________________________