[Tutor] myown.getfilesystemencoding()

eryksun eryksun at gmail.com
Fri Aug 30 18:39:25 CEST 2013


On Fri, Aug 30, 2013 at 11:04 AM, Albert-Jan Roskam <fomcl at yahoo.com> wrote:
> In Windows, sys.getfilesystemencoding() returns 'mbcs' (multibyte character
> set), which doesn't say very much imho.

Why aren't you using Unicode for the filename? The native encoding for
NTFS is UTF-16, and CPython 2.x uses _wfopen() if you pass it a
Unicode filename:

http://hg.python.org/cpython/file/70274d53c1dd/Objects/fileobject.c#l357
http://msdn.microsoft.com/en-us/library/yeby3zcb(v=vs.90)
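
As a rough sketch of what I mean by passing a Unicode filename: the
helper below (roundtrip_unicode_name is a made-up name, not anything
from the stdlib) writes a file whose name contains a non-ASCII
character and reads it back. It uses u'' literals so it should behave
the same on 2.7 and 3.3+, and it assumes the temp directory path itself
is plain ASCII when run on 2.x.

```python
import os
import shutil
import tempfile


def roundtrip_unicode_name(name, payload=b"data"):
    """Write payload to a file with a Unicode name, then read it back.

    Hypothetical helper for illustration only.
    """
    tmpdir = tempfile.mkdtemp()
    try:
        # Joining with a Unicode name keeps the whole path Unicode, so on
        # Windows the wide-character file API is used instead of the
        # ANSI codepage.
        path = os.path.join(tmpdir, name)
        with open(path, "wb") as f:
            f.write(payload)
        with open(path, "rb") as f:
            return f.read()
    finally:
        # Clean up the temporary directory and the file inside it.
        shutil.rmtree(tmpdir)


if __name__ == "__main__":
    print(repr(roundtrip_unicode_name(u"caf\xe9.txt")))
```

The point is that the filename never has to be encoded to a byte string
by your own code, so the 'mbcs' question doesn't come up.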

Anyway, the "mbcs" codec uses mbcs_encode() and mbcs_decode() from the
codecs module. In CPython 2.x, these call PyUnicode_EncodeMBCS() and
PyUnicode_DecodeMBCS(), which in turn call the Windows API functions
WideCharToMultiByte() and MultiByteToWideChar() for the CP_ACP (ANSI)
codepage. This is a system-defined encoding, such as Windows-1252.
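
To see the codec in action, here's a minimal sketch. ansi_roundtrip is
a hypothetical helper, and the "cp1252" fallback is only my assumption
so that the example also runs on systems where the "mbcs" codec doesn't
exist (it's Windows-only).

```python
import sys


def ansi_roundtrip(text, fallback="cp1252"):
    """Encode text with the ANSI codepage and decode it again.

    Uses the 'mbcs' codec on Windows. Elsewhere it falls back to the
    given codec name, which is an assumption for illustration.
    """
    codec = "mbcs" if sys.platform == "win32" else fallback
    raw = text.encode(codec)          # Unicode -> bytes in the ANSI codepage
    return raw, raw.decode(codec)     # bytes -> Unicode again


if __name__ == "__main__":
    raw, back = ansi_roundtrip(u"abc")
    print(repr(raw), repr(back))
```

ASCII text survives the round trip on any ANSI codepage. What happens
to characters outside the codepage depends on the system setting and
the Python version, so don't rely on it.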

> So I wrote the function below, which returns the codepage as reported by
> the windows chcp command.

chcp.com is a console application. It's calling GetConsoleCP(), which
simply returns the current code page of the attached console (running
the command creates a new console if there isn't one to inherit from
the parent). This isn't the function you want. There's already a
Python function that returns the default ANSI codepage:

    >>> import locale
    >>> locale.getpreferredencoding()
    'cp1252'

You can also use ctypes to call the Windows API directly, and then
convert the integer to a string:

    >>> from ctypes import windll
    >>> str(windll.kernel32.GetACP())
    '1252'
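
If you want a codec name rather than a bare number, something along
these lines might do. codec_name and ansi_codec are hypothetical names,
and the "cp" + number convention is an assumption that holds for common
codepages like 1252 or 850 but not necessarily for every codepage
Windows knows about.

```python
import sys


def codec_name(codepage):
    """Turn a codepage number into a Python-style codec name."""
    return "cp%d" % codepage


def ansi_codec():
    """Return the codec name for the ANSI codepage, or None off Windows."""
    if sys.platform != "win32":
        return None
    import ctypes  # windll only exists on Windows
    return codec_name(ctypes.windll.kernel32.GetACP())


if __name__ == "__main__":
    print(codec_name(1252))
    print(ansi_codec())
```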

> the function returns 850 (codepage 850) when I run it via the command prompt,
> but 1252 (cp1252) when I run it in my IDE (Spyder).

Maybe Spyder communicates with python.exe as a subprocess in a hidden
console, with the console's codepage set to 1252. You can use ctypes
to check windll.kernel32.GetConsoleCP(). If a console is attached,
this will return a nonzero value.
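
A small sketch of that check, wrapped in a hypothetical helper
(console_codepages is my own name). It also calls GetConsoleOutputCP(),
which I haven't mentioned above; it's the output-side counterpart of
GetConsoleCP(). Off Windows, or with no console attached, it returns
None.

```python
import sys


def console_codepages():
    """Return (input_cp, output_cp) for the attached console, or None."""
    if sys.platform != "win32":
        return None
    import ctypes  # windll only exists on Windows
    kernel32 = ctypes.windll.kernel32
    input_cp = kernel32.GetConsoleCP()
    if input_cp == 0:
        # Zero means the call failed, e.g. no console is attached.
        return None
    return input_cp, kernel32.GetConsoleOutputCP()


if __name__ == "__main__":
    print(console_codepages())
```

Running this both from the command prompt and from inside Spyder
should show whether the two environments really differ in their
console codepage.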

