os.system and unicode arguments fail on Win32

Martin v. Löwis martin at v.loewis.de
Wed Jan 22 03:15:31 EST 2003


Tim Daneliuk <tundra at tundraware.com> writes:

> (Clearly) I am not too familiar with this, so I ran the commands
> as you suggest and got ('en_US', 'cp1252') just as you've explained.
> So... where does 'mcbs' come from?  That is, why is the translation
> from unicode to bytestring not:
> 
>                 y = encode(unicode-var, "cp1252")
> 
> or conversely
> 
>                 u = unicode(byte-var, "cp1252")

"mbcs" is a codec that internally calls MultiByteToWideChar (and, for
encoding, WideCharToMultiByte) with CP_ACP. That is, it converts to and
from the "ANSI code page", which is an alias for whichever code page the
Windows installation is configured to use. It defaults to a
Microsoft-provided factory setting that depends on the language:

code page region                 
1250      Central Europe         
1251      Cyrillic               
1252      Western Europe         
1253      Greek                  
1254      Turkish                
1255      Hebrew                 
1256      Arabic                 
1257      Baltic                 
1258      Vietnamese             

874       Thai                   
932       Japan                  
936       Simplified Chinese     
949       Korea                  
950       Traditional Chinese    
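
To see which of these code pages CP_ACP resolves to on a given machine,
one can ask Windows directly. The following is a minimal sketch, assuming
a current Python 3 with ctypes; the helper name ansi_code_page is made up
for illustration, and GetACP is the Win32 call that reports the active
ANSI code page. On non-Windows systems the helper simply returns None.

```python
import sys


def ansi_code_page():
    """Return the active ANSI code page number on Windows, else None.

    Hypothetical helper for illustration; not part of Python itself.
    """
    if sys.platform != "win32":
        return None
    import ctypes  # imported lazily; windll only exists on Windows
    # GetACP() takes no arguments and returns the ANSI code page identifier.
    return ctypes.windll.kernel32.GetACP()


page = ansi_code_page()
if page is None:
    print("Not running on Windows; there is no ANSI code page to report.")
else:
    print("ANSI code page: %d" % page)
```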

On Windows XP, the Administrator can change this default. So encoding
Unicode strings with cp1252 is only correct on a Western European
installation of Windows; elsewhere it would give confusing results.
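
As a rough illustration of those confusing results, the sketch below
interprets one and the same byte under three of the code pages listed
above, and then tries to encode a Central European letter with cp1252.
It assumes a current Python 3, where these codecs ship with the standard
library; the variable names are chosen only for this example.

```python
# One byte, three different meanings depending on the code page.
raw = b"\xe9"
decoded = {}
for name in ("cp1252", "cp1251", "cp1253"):
    decoded[name] = raw.decode(name)
    print(name, ascii(decoded[name]))

# U+0142 (LATIN SMALL LETTER L WITH STROKE) exists in cp1250
# but has no slot in cp1252, so hardcoding cp1252 fails for it.
text = "\u0142"
central_european = text.encode("cp1250")
try:
    text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False
print(central_european, cp1252_ok)
```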

In addition, the "cp1252" codec is a Python-provided one. Python
currently does not provide CJK codecs, so you can't use, say, "cp932" as
an encoding name on Windows. However, if cp932 happens to be the ANSI
code page, then you can use "mbcs" to access that encoding.
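
In code, this amounts to naming "mbcs" rather than a specific code page.
Here is a minimal sketch, assuming a current Python 3; the helper name
to_ansi_bytes is hypothetical, and the fallback to the locale's preferred
encoding on non-Windows systems is an assumption made only so that the
example runs everywhere (the "mbcs" codec exists only on Windows).

```python
import locale
import sys


def to_ansi_bytes(text, errors="replace"):
    """Encode text with whatever the ANSI code page happens to be.

    Hypothetical helper: uses "mbcs" on Windows, and the locale's
    preferred encoding elsewhere as a stand-in.
    """
    if sys.platform == "win32":
        encoding = "mbcs"
    else:
        encoding = locale.getpreferredencoding(False)
    return text.encode(encoding, errors)


print(to_ansi_bytes("dir C:\\temp"))
```

Characters that the active code page cannot represent are replaced
here instead of raising an error; pass errors="strict" to get the
exception instead.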

Regards,
Martin



