os.system and unicode arguments fail on Win32
Martin v. Löwis
martin at v.loewis.de
Wed Jan 22 03:15:31 EST 2003
Tim Daneliuk <tundra at tundraware.com> writes:
> (Clearly) I am not too familiar with this, so I ran the commands
> as you suggest and got ('en_US', 'cp1252') just as you've explained.
> So... where does 'mcbs' come from? That is, why is the translation
> from unicode to bytestring not:
>
> y = encode(unicode-var, "cp1252")
>
> or conversely
>
> u = unicode(byte-var, "cp1252")
"mbcs" is a codec which internally calls MultiByteToWideChar (when
decoding) and WideCharToMultiByte (when encoding) with CP_ACP. I.e. it
converts to and from the "ANSI code page", which is a code page alias
that depends on the Windows installation. It defaults to a
Microsoft-provided factory default, which depends on the language:

code page   region
1250        Central Europe
1251        Cyrillic
1252        Western Europe
1253        Greek
1254        Turkish
1255        Hebrew
1256        Arabic
1257        Baltic
1258        Vietnamese
874         Thai
932         Japan
936         Simplified Chinese
949         Korea
950         Traditional Chinese

On Windows XP, this default can be changed by the Administrator. So
encoding Unicode strings with cp1252 is only correct on a
Western-Europe installation of Windows; elsewhere it would give
confusing results, because the bytes would not match the system's
ANSI code page.
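
To illustrate the point about hard-coding cp1252, here is a minimal
sketch. It assumes a current Python 3 (where str is Unicode and the
cp1251/cp1252 codecs ship with the standard library), and the sample
text is just an arbitrary Cyrillic word chosen for the example:

```python
# Text that a Cyrillic (cp1251) Windows installation would handle fine.
text = "Привет"

# Encoding with the matching code page works.
data = text.encode("cp1251")

# Encoding with a hard-coded cp1252 fails: the characters do not exist there.
try:
    text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# Decoding cp1251 bytes as cp1252 "succeeds" but yields the wrong characters.
garbled = data.decode("cp1252")
```

So the hard-coded name either raises an error or silently produces
the wrong characters, depending on the direction of the conversion.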
In addition, the "cp1252" codec is a Python-provided one. Python
currently does not provide CJK codecs, so you can't use, say, "cp932"
as an encoding name on Windows. However, if cp932 happens to be the
ANSI code page, then you can use "mbcs" to access that encoding.
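
A rough sketch of how one might pick the codec instead of hard-coding
a code page name. The helper names ansi_codec_name and to_ansi_bytes
are made up for this example, and falling back to the locale's
preferred encoding on non-Windows systems is my assumption, not
something "mbcs" does itself (the "mbcs" codec only exists on
Windows). Written for a current Python 3:

```python
import codecs
import locale
import sys


def ansi_codec_name():
    """Return the codec to use for the system's default 8-bit encoding.

    On Windows this is "mbcs", which follows whatever the ANSI code page
    happens to be. Elsewhere (an assumption for this sketch) we fall back
    to the locale's preferred encoding.
    """
    if sys.platform == "win32":
        return "mbcs"
    return locale.getpreferredencoding(False)


def to_ansi_bytes(text):
    """Encode text for the system; unencodable characters are replaced."""
    return text.encode(ansi_codec_name(), errors="replace")


# The chosen name is always something Python can look up as a codec.
codec_info = codecs.lookup(ansi_codec_name())
command = to_ansi_bytes("dir")
```

The resulting byte string follows the installation's code page rather
than assuming Western Europe.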
Regards,
Martin