[issue22324] Use PyUnicode_AsWideCharString() instead of PyUnicode_AsUnicode()

STINNER Victor report at bugs.python.org
Wed Sep 3 09:10:36 CEST 2014


STINNER Victor added the comment:

> Will not this cause performance regression? When we hardly work with wchar_t-based API, it looks good to cache encoded value.

Yes, it will be slower. But I prefer slower code with a lower memory footprint. On UNIX, I don't think that anyone will notice the difference.

My concern is that the cache is never released. If the conversion is only needed once at startup, the memory will stay until Python exits. It's not really efficient.

On Windows, conversion to wchar_t* is common because Python uses the Windows wide character API (the "W" API, as opposed to the "A" ANSI code page API). For example, most filesystem accesses use the wchar_t* type.

On Python < 3.3, a narrow build of Python already used wchar_t* internally to store characters. Since Python 3.3 (PEP 393), Python uses a more compact representation: a wchar_t* can share the Unicode data only if sizeof(wchar_t) == KIND, where KIND is 1, 2 or 4 bytes per character. Examples: "\u20ac" on Windows (16-bit wchar_t) or "\U0010ffff" on Linux (32-bit wchar_t).

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue22324>
_______________________________________
