[Tutor] myown.getfilesystemencoding()
eryksun
eryksun at gmail.com
Fri Aug 30 18:39:25 CEST 2013
On Fri, Aug 30, 2013 at 11:04 AM, Albert-Jan Roskam <fomcl at yahoo.com> wrote:
> In Windows, sys.getfilesystemencoding() returns 'mbcs' (multibyte character
> set), which doesn't say very much imho.
Why aren't you using Unicode for the filename? The native encoding for
NTFS is UTF-16, and CPython 2.x uses _wfopen() if you pass it a
Unicode filename:
http://hg.python.org/cpython/file/70274d53c1dd/Objects/fileobject.c#l357
http://msdn.microsoft.com/en-us/library/yeby3zcb(v=vs.90)
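A minimal sketch of the Unicode-filename approach, assuming the temp directory's filesystem accepts non-ASCII names. The helper name roundtrip_unicode_filename and the sample filename are made up for illustration. It uses io.open so the same code runs on 2.x and 3.x:

```python
import io
import os
import shutil
import tempfile


def roundtrip_unicode_filename(name=u"caf\u00e9.txt", text=u"hello"):
    """Create a file with a non-ASCII name, read it back, then clean up.

    Hypothetical helper: the name is passed as a Unicode string, so no
    manual encoding to the filesystem encoding is needed.
    """
    tmpdir = tempfile.mkdtemp()
    try:
        path = os.path.join(tmpdir, name)
        # Write and re-read the file using the Unicode path.
        with io.open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with io.open(path, "r", encoding="utf-8") as f:
            content = f.read()
        # Directory listing, to confirm exactly one file was created.
        listed = os.listdir(tmpdir)
        return content, listed
    finally:
        shutil.rmtree(tmpdir)
```

The point is that the filename never has to be encoded by hand, so the value of sys.getfilesystemencoding() doesn't come into it.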
Anyway, the "mbcs" codec uses mbcs_encode() and mbcs_decode() from the
codecs module. In CPython 2.x, these call PyUnicode_EncodeMBCS() and
PyUnicode_DecodeMBCS(), which in turn call the Windows API functions
WideCharToMultiByte() and MultiByteToWideChar() for the CP_ACP (ANSI)
codepage. This is a system-defined encoding, such as Windows-1252.
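As a rough illustration of what the "mbcs" codec does, here is a small round trip. The helper name ansi_roundtrip and the sample text are assumptions. The codec only exists on Windows, so the function returns None elsewhere:

```python
import sys


def ansi_roundtrip(text=u"na\u00efve"):
    """Encode text with the 'mbcs' codec and decode it back.

    Hypothetical helper. Returns (encoded_bytes, decoded_text) on
    Windows, or None on platforms that have no 'mbcs' codec.
    """
    if sys.platform != "win32":
        return None
    # 'replace' means characters missing from the ANSI code page are
    # substituted instead of raising an error.
    encoded = text.encode("mbcs", "replace")
    return encoded, encoded.decode("mbcs")
```

Whether the decoded text equals the input depends on the system's ANSI code page. Characters it can't represent won't survive the trip.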
> So I wrote the function below, which returns the codepage as reported by
> the windows chcp command.
chcp.com is a console application. It's calling GetConsoleCP(), which
simply returns the current code page of the attached console (running
the command creates a new console if there isn't one to inherit from
the parent). This isn't the function you want. There's already a
Python function that returns the default ANSI codepage:
    >>> import locale
    >>> locale.getpreferredencoding()
    'cp1252'
You can also use ctypes to call the Windows API directly, and then
convert the integer to a string:
    >>> from ctypes import windll
    >>> str(windll.kernel32.GetACP())
    '1252'
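The two approaches can be wrapped in one function. This is only a sketch: the name ansi_codec_name is made up, and the "cp" + number naming is an assumption that holds for the common code pages (such as 1252 or 850) but not necessarily for every one Windows defines:

```python
import codecs
import locale
import sys


def ansi_codec_name():
    """Return a Python codec name for the default ANSI code page.

    Hypothetical helper. On Windows it asks GetACP() directly; on
    other platforms it falls back to locale.getpreferredencoding().
    """
    if sys.platform == "win32":
        from ctypes import windll
        name = "cp%d" % windll.kernel32.GetACP()
    else:
        name = locale.getpreferredencoding()
    # Raises LookupError if this Python has no codec by that name.
    codecs.lookup(name)
    return name
```

The result can be passed straight to str.encode() or bytes.decode().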
> the function returns 850 (codepage 850) when I run it via the command prompt,
> but 1252 (cp1252) when I run it in my IDE (Spyder).
Maybe Spyder communicates with python.exe as a subprocess in a hidden
console, with the console's codepage set to 1252. You can use ctypes
to check windll.kernel32.GetConsoleCP(). If a console is attached,
this will return a nonzero value.
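A sketch of that check, guarded so it can also be imported on other platforms. The helper name console_codepage is made up. It treats a zero return from GetConsoleCP() as "no console attached":

```python
import sys


def console_codepage():
    """Return the attached console's input code page as an int.

    Hypothetical helper. Returns None when not running on Windows or
    when GetConsoleCP() returns 0 (no console attached).
    """
    if sys.platform != "win32":
        return None
    from ctypes import windll
    codepage = windll.kernel32.GetConsoleCP()
    return codepage if codepage != 0 else None
```

Running this from both the command prompt and Spyder should show whether the IDE's python.exe has a console at all, and which code page it uses.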