[ python-Bugs-836035 ] strftime month name is encoded somehow

Thu Sep 22 10:39:38 CEST 2005

Bugs item #836035, was opened at 2003-11-04 21:49
Message generated for change (Settings changed) made by birkenfeld
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=836035&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Unicode
>Group: Python 2.4
Status: Open
>Resolution: None
Priority: 5
Submitted By: Tim Evans (tim_evans)
Assigned to: Nobody/Anonymous (nobody)
Summary: strftime month name is encoded somehow

Initial Comment:
On Windows XP, with some locales the month name
returned by time.strftime('%B') is encoded somehow. 
For example:

>>> import time, locale
>>> locale.setlocale(locale.LC_ALL, '')
"Chinese_People's Republic of China.936"
>>> time.strftime('%B')
'\xca\xae\xd2\xbb\xd4\xc2'
>>> time.strftime('%d %B %Y')
'05 \xca\xae\xd2\xbb\xd4\xc2 2003'

>>> locale.setlocale(locale.LC_ALL, '')
'French_France.1252'
>>> time.strftime('%B', (2003,12,1,0,0,0,0,0,0))
'd\xe9cembre'

I'm not sure what encoding the Chinese version is
using, but the French is compatible with latin-1.  It
would appear that the encoding used is locale-dependent.
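
To illustrate the locale dependence described above, here is a minimal sketch in Python 3 syntax (the session above is Python 2). It decodes the two byte strings using the code page named at the end of each locale string. Treating `cp936` and `cp1252` as the right codecs is an assumption of this sketch; the session itself does not confirm it.

```python
# Byte strings copied from the session above.
chinese_raw = b'\xca\xae\xd2\xbb\xd4\xc2'
french_raw = b'd\xe9cembre'

# Assumption: the numeric suffix of the locale name is the code page
# in which strftime produced its output.
chinese_text = chinese_raw.decode('cp936')
french_text = french_raw.decode('cp1252')

# Print code points so the result does not depend on the console encoding.
print([hex(ord(ch)) for ch in chinese_text])
print([hex(ord(ch)) for ch in french_text])
```

Under these assumptions the six Chinese bytes decode to three characters (the word for the eleventh month) and the French bytes to "décembre".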

Ideally, the win32 version of time.strftime would call
the wide-character version of strftime (called
wcsftime) and return a unicode object.

I haven't looked at what this does under any other
operating system.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-11-07 19:56

Message:
Logged In: YES 
user_id=21627

This tells me that we need a function to return the current
locale's code page; this should return "cp936" in your case.
The fact that Python does not have a codec for cp936 is an
independent issue.
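
One possible shape for such a function, as a rough sketch only: the helper below is hypothetical (it is not an existing Python API) and assumes the Windows convention, visible in the sessions in this thread, that the locale string ends in `.<codepage>`. It is written in Python 3 syntax.

```python
def codepage_from_locale_name(locale_name):
    """Return e.g. 'cp936' for "Chinese_People's Republic of China.936".

    Hypothetical helper: relies on the locale string ending in
    '.<digits>' and returns None when it does not.
    """
    _, sep, suffix = locale_name.rpartition('.')
    if sep and suffix.isdigit():
        return 'cp' + suffix
    return None


print(codepage_from_locale_name("Chinese_People's Republic of China.936"))
print(codepage_from_locale_name('French_France.1252'))
print(codepage_from_locale_name('C'))
```

Parsing the string returned by setlocale is fragile; asking the operating system directly would be more robust, but this keeps the sketch platform-independent.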

----------------------------------------------------------------------

Comment By: Tim Evans (tim_evans)
Date: 2003-11-06 23:21

Message:
Logged In: YES 
user_id=561705

I have looked at the source code for the MS C library (it
comes with VC++6) and I believe that something
equivalent to the following is used:

#include <windows.h>

/* Ask Windows for the ANSI code page of the current thread locale. */
char codepage[16];
GetLocaleInfo(
    GetThreadLocale(),
    LOCALE_IDEFAULTANSICODEPAGE,
    codepage, 16);

This returns "1252" for the "C" locale, and for the Chinese
locale that I was experimenting with it returns "936".
Python does not have an encoding "cp936", but from C the
conversion with an explicit codepage produces the same
results as mbstowcs.
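
The same query can be sketched from Python with ctypes (Python 3 syntax). This is an untested illustration of the C fragment above, not what the C library or Python actually does; the function name is hypothetical, the value 0x1004 for LOCALE_IDEFAULTANSICODEPAGE is taken from the Windows headers, and on non-Windows platforms the function simply returns None.

```python
import ctypes
import sys

LOCALE_IDEFAULTANSICODEPAGE = 0x1004  # LCTYPE constant from winnls.h


def thread_ansi_codepage():
    """Return a codec-style name such as 'cp936', or None if unavailable."""
    if sys.platform != 'win32':
        return None  # the Win32 API is only available on Windows
    kernel32 = ctypes.windll.kernel32
    buf = ctypes.create_string_buffer(16)
    lcid = kernel32.GetThreadLocale()
    # GetLocaleInfoA returns the number of characters written, 0 on failure.
    written = kernel32.GetLocaleInfoA(
        lcid, LOCALE_IDEFAULTANSICODEPAGE, buf, len(buf))
    if written == 0:
        return None
    return 'cp' + buf.value.decode('ascii')


print(thread_ansi_codepage())
```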

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-11-06 22:33

Message:
Logged In: YES 
user_id=21627

Is there any way to find out the encoding that mbstowcs uses?

----------------------------------------------------------------------

Comment By: Tim Evans (tim_evans)
Date: 2003-11-06 22:00

Message:
Logged In: YES 
user_id=561705

The Windows C lib docs say that calling mbstowcs on the
output of strftime (or calling wcsftime instead of strftime)
will return the correct wide-character (UTF-16?) string.
This produces something that looks like it could be correct.
Decoding with the 'mbcs' encoding in Python is not
equivalent to calling mbstowcs because mbstowcs is
locale-dependent.

Perhaps it would be a good idea to have time.strftime return
a unicode string.  As this wouldn't be backward compatible,
it could be done via a new function time.ustrftime, or via
an optional unicode=True argument to the existing function.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2003-11-06 09:53

Message:
Logged In: YES 
user_id=38388

Tim, there's nothing much we can do about this since the
strftime() API is a direct interface to the underlying C lib
API. Python simply passes through the arguments to this
function and returns whatever the C lib has to offer.

Please refer to the C lib documentation for your platform
for details about the encoding being used for the strings.

BTW, a simple table with the month names in your application
should nicely solve your problem; additionally it gives you
full control over the encoding and wording being used.
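
A minimal sketch of such a table, in Python 3 syntax: the names `MONTH_NAMES` and `month_name` and the language keys are hypothetical, and only the two languages discussed in this thread are filled in.

```python
# Hypothetical application-level table; the application owns the wording
# and stores it as text, so no locale-dependent decoding is involved.
MONTH_NAMES = {
    'fr': ['janvier', 'février', 'mars', 'avril', 'mai', 'juin',
           'juillet', 'août', 'septembre', 'octobre', 'novembre',
           'décembre'],
    'zh': ['一月', '二月', '三月', '四月', '五月', '六月',
           '七月', '八月', '九月', '十月', '十一月', '十二月'],
}


def month_name(month, language):
    """Return the name of month 1..12 in the given language."""
    if not 1 <= month <= 12:
        raise ValueError('month must be in 1..12')
    return MONTH_NAMES[language][month - 1]


print(month_name(12, 'fr'))
print(month_name(11, 'zh'))
```

The trade-off is that the application, not the C library, has to maintain the names for every language it supports.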

----------------------------------------------------------------------

Comment By: Tim Evans (tim_evans)
Date: 2003-11-05 23:45

Message:
Logged In: YES 
user_id=561705

I'm reopening the bug, because that doesn't seem to work:

>>> import time, locale
>>> locale.setlocale(locale.LC_ALL, '')
"Chinese_People's Republic of China.936"
>>> x = time.strftime('%B')
>>> x
'\xca\xae\xd2\xbb\xd4\xc2'
>>> x.decode('mbcs')
'\xca\xae\xd2\xbb\xd4\xc2'
>>> locale.getpreferredencoding()
'cp1252'
>>> x.decode('cp1252')
'\xca\xae\xd2\xbb\xd4\xc2'

The preferred encoding is returned as cp1252, which can't be
correct.  And neither cp1252 nor mbcs appears to decode the
string into anything containing the high-numbered characters
I would expect for Chinese (neither of them changes the
string at all).

The following problems (may) exist:
1.  locale.getpreferredencoding() doesn't work.
2.  The string returned by time.strftime() is not mbcs encoded.
3.  The documentation for time.strftime() doesn't say how
the string is encoded.
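
The difference between the two candidate encodings can be checked directly on the bytes from the session above. This is a sketch in Python 3 syntax; that `cp936` is the encoding actually used is an assumption based on the locale name.

```python
raw = b'\xca\xae\xd2\xbb\xd4\xc2'

# Decode the same bytes with the reported preferred encoding (cp1252)
# and with the code page from the locale name (cp936, an assumption).
decoded = {encoding: raw.decode(encoding) for encoding in ('cp1252', 'cp936')}

for encoding, text in decoded.items():
    # Show code points rather than glyphs so the console does not matter.
    print(encoding, len(text), [hex(ord(ch)) for ch in text])
```

With cp1252 each of the six bytes becomes one Latin-range character with the same numeric value, so nothing Chinese appears; with cp936 the bytes pair up into three characters at or above U+4E00.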

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-11-05 21:28

Message:
Logged In: YES 
user_id=21627

It always contains a byte string in the locale's encoding;
for compatibility, this cannot be changed.

On Windows, you can access the encoding as "mbcs". In
general, you need to use locale.getpreferredencoding() to
find out what encoding this string is in.
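
The general approach can be sketched as follows, in Python 3 syntax. The helper name is hypothetical, and whether locale.getpreferredencoding() reports the encoding that strftime actually used on a given platform is an assumption this sketch does not verify.

```python
import locale


def decode_locale_bytes(raw, encoding=None):
    """Decode bytes produced by a locale-aware C function.

    Hypothetical helper: when no encoding is given, fall back to what
    locale.getpreferredencoding() reports for the current locale.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    # 'replace' keeps the call from raising if the guess is wrong.
    return raw.decode(encoding, errors='replace')


print(decode_locale_bytes(b'd\xe9cembre', 'cp1252'))
```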

Closing as not-a-bug.

----------------------------------------------------------------------
