os.system and unicode arguments fail on Win32

Martin v. Löwis martin at v.loewis.de
Tue Jan 21 20:25:08 EST 2003


Tim Daneliuk <tundra at tundraware.com> writes:

> > This will change in Python 2.3. In many cases, Python 2.2 will also
> > accept Unicode strings in file system API on Windows. For Python 2.3,
> > and NT+, all Unicode strings are usable as file names.
> > This still does not include os.system, or environment variables.
> 
> What is the restriction here that prevents 2.3 from doing things
> the same way with these portions of the OS.

I have problems parsing this sentence. Is this a question?

> Incidentally, it seems strange to me that Win32 is inherently
> a unicode environment but os.system (which presumably maps
> to some Win32 API) has trouble with unicode strings...

If you are asking why Python 2.3 won't support Unicode strings in
os.system: primarily because nobody has contributed code to do so. Do
you volunteer?

Looking more closely, you will find that os.system is a wrapper around
the C library function system(), which does not support
Unicode¹. Internally (i.e. in the Microsoft C library), system() calls
the ANSI variants of the Win32 API, which then internally (i.e. in the
operating system code) call the Unicode variants on NT+.

Getting rid of these layers of indirection might amount to a complete
reimplementation of the part of the C library that does os.system. Add
to that the difficulties of using the Unicode Win32 API on W9x.
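
As a rough illustration of what a caller can do in the meantime (a
sketch, not something Python itself does): encode the Unicode command
to a byte string yourself, and fail early if it cannot be represented.
The helper name encode_command and its explicit encoding argument are
made up for this example.

```python
import locale


def encode_command(command, encoding=None):
    """Encode a Unicode command line for a byte-oriented system() call.

    Hypothetical helper: 'encoding' defaults to whatever
    locale.getpreferredencoding() reports on this machine.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding()
    # Strict encoding raises UnicodeEncodeError instead of silently
    # replacing characters the target code page cannot represent.
    return command.encode(encoding)
```

On Python 2.2/2.3 the resulting byte string is what you would hand to
os.system. Characters outside the ANSI code page cannot be passed this
way at all, which is exactly the limitation being discussed.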

> Do you happen to have a URL (or better still, a programmatic method)
> whereby I might determine the native encodings for various systems?

In Python 2.3, locale.getpreferredencoding() should always return the
encoding that users are likely to use for text data. It uses:
- getdefaultlocale()[1] on Windows (and also on the Mac, although
  that isn't implemented for OS X in Python 2.2),
- locale.nl_langinfo(locale.CODESET) for POSIX systems, provided
  they have both nl_langinfo and CODESET,
- getdefaultlocale()[1] on POSIX systems if nl_langinfo doesn't work.

getdefaultlocale, in turn, uses:
- GetACP() on Windows (printing it as cp%d),
- GetScriptVariable(script, smScriptLang) on Mac OS 9,
- CFStringGetSystemEncoding() (potentially followed
  by CFStringConvertEncodingToIANACharSetName()) on Mac OS X²,
- environment variables on all other systems.
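
To see what these calls report on a given machine, something like the
following can be used. It is only a sketch: the function name
report_encodings is made up, and it queries just
locale.getpreferredencoding() and, where available,
locale.nl_langinfo(locale.CODESET); the getdefaultlocale() fallback is
left out.

```python
import locale


def report_encodings():
    """Collect the encodings the locale module reports (illustrative)."""
    info = {"preferred": locale.getpreferredencoding()}
    # nl_langinfo and CODESET exist only on some POSIX systems, so
    # test for both before calling.
    if hasattr(locale, "nl_langinfo") and hasattr(locale, "CODESET"):
        info["nl_langinfo"] = locale.nl_langinfo(locale.CODESET)
    return info


if __name__ == "__main__":
    for source, encoding in sorted(report_encodings().items()):
        print("%s: %s" % (source, encoding))
```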

Notice that, on Windows, there are *two* native encodings: the ANSI
code page (which the ANSI Win32 API expects, and which is used in the
windowing system), and the OEM code page (which the FAT file system
uses on disk, and which the command.com/cmd.exe terminal windows use
unless chcp is invoked to change it).
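
Both code pages can be queried directly. The sketch below assumes the
ctypes module is available (it is not part of the Python versions
discussed here) and calls the Win32 functions GetACP() and GetOEMCP();
the function name windows_code_pages is made up, and on non-Windows
systems it simply returns None.

```python
import sys


def windows_code_pages():
    """Return (ansi, oem) code page names on Windows, else None."""
    if sys.platform != "win32":
        return None
    import ctypes  # assumption: ctypes is installed

    kernel32 = ctypes.windll.kernel32
    # Format the numeric code pages the same way getdefaultlocale
    # does, i.e. as cp%d.
    return ("cp%d" % kernel32.GetACP(), "cp%d" % kernel32.GetOEMCP())


if __name__ == "__main__":
    print(windows_code_pages())
```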

Also notice that, on OS X, the encoding used for file names is
always UTF-8, regardless of what CFStringGetSystemEncoding() returns.
This is true at least for the BSD POSIX layer of API calls; higher
layer API calls may use different encodings.
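
For file names specifically, a sketch along these lines avoids
hard-coding the encoding. It assumes sys.getfilesystemencoding() is
available in your Python; the helper name filename_bytes is made up.

```python
import sys


def filename_bytes(name):
    """Encode a Unicode file name the way the file system API expects."""
    # Ask Python which encoding it uses for file names instead of
    # assuming it matches the locale's preferred encoding.
    return name.encode(sys.getfilesystemencoding())


if __name__ == "__main__":
    print(sys.getfilesystemencoding())
    print(repr(filename_bytes(u"readme.txt")))
```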

Regards,
Martin

¹ There might be a Microsoft extension _wsystem; I haven't checked.
² for Python 2.3 only



