[issue3080] Full unicode import system
STINNER Victor
report at bugs.python.org
Wed Jan 19 02:22:13 CET 2011
STINNER Victor <victor.stinner at haypocalc.com> added the comment:
Here is a work-in-progress patch: issue3080-3.patch. The patch is HUGE and written for Python 3.3.
$ diffstat issue3080-3.patch
 Doc/c-api/module.rst   |   24
 Include/import.h       |   73 +
 Include/moduleobject.h |    2
 Include/pycapsule.h    |    4
 Modules/zipimport.c    |  272 +++---
 Objects/moduleobject.c |   52 -
 PC/import_nt.c         |   84 +-
 Python/dynload_aix.c   |    2
 Python/dynload_dl.c    |    2
 Python/dynload_hpux.c  |    2
 Python/dynload_next.c  |    4
 Python/dynload_os2.c   |    2
 Python/dynload_shlib.c |    2
 Python/dynload_win.c   |    2
 Python/import.c        | 1910 +++++++++++++++++++++++++++----------------------
 Python/importdl.c      |   79 +-
 Python/importdl.h      |    2
 issue3080.py           |   29
 18 files changed, 1484 insertions(+), 1063 deletions(-)
As expected, most of the work is done in import.c.
Decode the module name earlier and encode it later. Try to manipulate PyUnicodeObject objects instead of char* buffers (so the string length is directly available).
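The "decode early, encode late" idea can be illustrated at the Python level. This is only a sketch of the principle, not code from the patch: module_path() is a hypothetical helper, and os.fsdecode()/os.fsencode() stand in for the conversions that import.c performs in C.

```python
import os


def module_path(directory: bytes, name: str) -> bytes:
    """Hypothetical helper: build the source path of module `name`."""
    # Decode early: convert the raw directory bytes to str exactly once.
    # Undecodable bytes survive as lone surrogates (surrogateescape).
    dir_text = os.fsdecode(directory)
    # Work on text: joins and len() operate on code points, so no manual
    # buffer-size bookkeeping is needed.
    path_text = os.path.join(dir_text, name + ".py")
    # Encode late: go back to bytes only at the OS boundary.
    return os.fsencode(path_text)
```

Because the decode and encode steps use the same error handler, a directory name containing bytes that are invalid in the filesystem encoding should round-trip unchanged.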
Split the huge and very complex find_module() function into 3 functions (find_module, find_module_filename and find_module2) and document them. Drop OS/2 support in find_module() (it can be kept, but it was easier for me to drop it and the OS/2 maintainer wrote that Python 3 is far from being compatible with OS/2).
The patch creates some functions: PyModule_GetNameObject(), PyImport_ExecCodeModuleUnicode(), PyImport_AddModuleUnicode(), PyImport_ImportFrozenModuleUnicode(), PyModule_NewUnicode(), ...
Use "U" format to parse a module name, and "%R" to format a module name (to escape surrogate characters and add quotes, instead of "... '%.200s' ...").
PyWin_FindRegisteredModule() is now private. Remove the unused fqname argument from _PyImport_GetDynLoadFunc().
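The benefit of "%R" can be seen from Python, where "%r" applies the same repr() conversion. A minimal sketch, with a hypothetical format_missing() helper and an example message that is not taken from the patch:

```python
def format_missing(name: str) -> str:
    """Hypothetical helper: build an error message for a missing module."""
    # "%r" calls repr(), like "%R" in PyUnicode_FromFormat(): the name is
    # quoted and lone surrogates are written as \udcXX escapes.
    return "No module named %r" % name


# A name containing a byte that could not be decoded as UTF-8.
name = b"caf\xe9".decode("utf-8", "surrogateescape")
message = format_missing(name)

# The raw name cannot be encoded strictly, but the formatted message can.
try:
    name.encode("utf-8")
    raw_name_encodable = True
except UnicodeEncodeError:
    raw_name_encodable = False
```

The formatted message is plain ASCII, so it can be written to any terminal or log file without a second encoding error.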
Replace open_exclusive() by fopen(name, "wb") on Windows: is it correct?
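To make the question concrete, here is a Python sketch of the difference between the two approaches. It assumes that open_exclusive() removes any stale file and then creates the new one with O_EXCL, which should be checked against import.c; open_exclusive_sketch() is a hypothetical name.

```python
import os
import tempfile


def open_exclusive_sketch(path):
    """Hypothetical Python rendering of exclusive file creation."""
    # Assumption: the C helper unlinks a stale file before creating it.
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass
    flags = os.O_EXCL | os.O_CREAT | os.O_WRONLY | os.O_TRUNC
    flags |= getattr(os, "O_BINARY", 0)  # only defined on Windows
    # O_EXCL makes creation fail if another process created the file
    # between the unlink() and the open().
    return os.fdopen(os.open(path, flags, 0o666), "wb")


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "spam.pyc")
    with open_exclusive_sketch(target) as f:
        f.write(b"first")
    # A bare O_EXCL open refuses to reuse the existing file...
    try:
        os.close(os.open(target, os.O_EXCL | os.O_CREAT | os.O_WRONLY))
        excl_failed = False
    except FileExistsError:
        excl_failed = True
    # ...whereas "wb" silently truncates and rewrites it.
    with open(target, "wb") as f:
        f.write(b"second")
    with open(target, "rb") as f:
        final_content = f.read()
```

With "wb", two processes writing the same file at the same time can interleave their output, which is the case the exclusive open guards against.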
TODO:
- rename xxxobj => xxx to keep the original names and have a shorter patch (e.g. I renamed name to nameobj during the transition to detect bugs)
- catch encoding errors in case_ok()
- don't encode in case_ok() if case_ok() does nothing (e.g. on Linux)
- find a better name for find_module2()
The patch contains a tiny script, issue3080.py, to test the patch using an ISO-8859-1 locale.
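A test of this kind could look like the sketch below. It is not the attached issue3080.py: the function name, the module name and the VALUE attribute are made up for illustration. It creates a module with a non-ASCII name in a temporary directory and imports it.

```python
import importlib
import os
import sys
import tempfile


def import_non_ascii_module(name="caf\u00e9"):
    """Hypothetical check: import a module whose name is not ASCII."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, name + ".py")
        # The file content is pure ASCII; only the file name is not.
        with open(path, "w", encoding="utf-8") as f:
            f.write("VALUE = 42\n")
        sys.path.insert(0, tmp)
        try:
            # Make sure the path finders notice the new file.
            importlib.invalidate_caches()
            module = importlib.import_module(name)
            return module.__name__, module.VALUE
        finally:
            # Leave the interpreter state as it was found.
            sys.path.remove(tmp)
            sys.modules.pop(name, None)
```

To exercise a non-UTF-8 filesystem encoding, the function would be run under an ISO-8859-1 locale, assuming one is installed on the test machine.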
I will open a thread on the mailing list (python-dev) to decide if this patch is needed or not. If we agree that this issue should be fixed, I will split the patch into smaller parts and start a review process.
----------
keywords: +patch
Added file: http://bugs.python.org/file20448/issue3080-3.patch
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue3080>
_______________________________________