[Python-Dev] [Python-checkins] cpython: _PyImport_LoadDynamicModule() encodes the module name explicitly to ASCII

Tue May 10 02:57:14 CEST 2011

On Tuesday, 10 May 2011 at 09:52 +1000, Neil Hodgson wrote:
>    Some C and C++ implementations currently allow non-ASCII
> identifiers and the forthcoming C1X and C++0x language standards
> include non-ASCII identifiers. The allowed characters are specified in
> Annexes of the respective standards.
> 
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf - Annex D
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3225.pdf - Annex E

I read these documents, but they don't specify which encoding is used
for identifiers in compiled libraries and programs. Does that mean that
Windows and Linux may use different encodings? At least the surrogate
range (U+D800-U+DFFF) is excluded, which is good news (the UTF-8 decoder
of Python 3 rejects surrogate characters).

I discovered the -fextended-identifiers option of gcc: with this option,
you can use \uHHHH and \UHHHHHHHH in identifiers, but not \xHH. On
Linux, such identifiers are encoded to UTF-8.

Example:
--------------
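#define _ISOC99_SOURCE
#include <locale.h>
#include <wchar.h>   /* declares wprintf() */

/* Compile with gcc -fextended-identifiers. */
void f\u00E9(void) { wprintf(L"U+00E9 = \xE9\n"); }

void g\U000000E8(void) { wprintf(L"U+00E8 = \xE8\n"); }

int main(void)
{
    /* Use the user's locale so wprintf() can output non-ASCII characters. */
    setlocale(LC_ALL, "");
    f\u00E9(); g\U000000E8();
    return 0;
}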
--------------

It's not very practical: I would prefer to write Unicode characters
directly (as I can in Python 3!). I'm not sure that Chinese programmers
would prefer to call \u4f60\u597d() instead of hello().

OK, I now agree that it is possible to use non-ASCII characters in C
identifiers. But what about the encoding of symbols in a dynamic
library: is it always UTF-8?
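
To make the question concrete, here is a minimal sketch of the byte
string a loader would have to look up if the symbol name is UTF-8, as
gcc seems to produce on Linux. The helper names utf8_encode() and
symbol_name() are hypothetical, written only for this illustration; they
are not part of CPython or of any library. The encoder rejects the
surrogate range, like the UTF-8 codec of Python 3.

```c
#include <stddef.h>

/* Encode one code point as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 for a surrogate
 * (U+D800-U+DFFF) or a value above U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not encodable */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: build the NUL-terminated UTF-8 symbol name for
 * an identifier given as n code points. Returns the length in bytes,
 * or 0 if a code point is invalid or the buffer is too small. */
size_t symbol_name(const unsigned long *cps, size_t n,
                   char *buf, size_t size)
{
    size_t len = 0, i, j, k;
    unsigned char tmp[4];

    if (size == 0)
        return 0;
    for (i = 0; i < n; i++) {
        k = utf8_encode(cps[i], tmp);
        if (k == 0 || len + k >= size)  /* keep room for the final NUL */
            return 0;
        for (j = 0; j < k; j++)
            buf[len++] = (char)tmp[j];
    }
    buf[len] = '\0';
    return len;
}
```

For the identifier f\u00E9 from the example above, the code points
{ 'f', 0xE9 } give the three bytes 66 C3 A9. That is the name one would
have to pass to dlsym(), assuming the library was built by a compiler
that stores symbols as UTF-8. A compiler using another encoding would
need a different byte string, which is exactly the open question.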

Victor