[New-bugs-announce] [issue13054] sys.maxunicode value after PEP-393

Wed Sep 28 17:47:35 CEST 2011

New submission from Ezio Melotti <ezio.melotti at gmail.com>:

Now that PEP 393 is in and the distinction between narrow and wide builds no longer exists, the value of sys.maxunicode should always be 0x10FFFF.

sys.maxunicode currently uses PyUnicode_GetMax (Objects/unicodeobject.c:196), which still returns 0x10FFFF if Py_UNICODE_WIDE is defined and 0xFFFF if it is not. That should now mean that Py_UNICODE_WIDE is defined on Linux, where wchar_t is 4 bytes, but not on Windows, where it is 2 bytes. (Isn't this backward incompatible? If so, it probably deserves another issue.)

IIUC the difference between narrow and wide is gone for Python users, but it's still there for C users who use the old API, so changing PyUnicode_GetMax will most likely break their code.

I therefore suggest setting sys.maxunicode to 0x10FFFF and leaving PyUnicode_GetMax as is.
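
As a rough illustration of what the suggestion would mean at the Python level, here is a minimal sketch. The names HIGHEST_CODE_POINT, max_is_constant and is_valid_code_point are made up for this example and are not part of any API; the sketch assumes an interpreter where the change has been applied (Python 3.3 or later).

```python
import sys

# Highest code point defined by Unicode; the value suggested for sys.maxunicode.
# (HIGHEST_CODE_POINT is a name invented for this sketch.)
HIGHEST_CODE_POINT = 0x10FFFF


def max_is_constant():
    """Return True if this interpreter reports the suggested constant value."""
    return sys.maxunicode == HIGHEST_CODE_POINT


def is_valid_code_point(n):
    """Return True if chr() accepts n on this interpreter."""
    try:
        chr(n)
    except (ValueError, OverflowError):
        return False
    return True


if __name__ == "__main__":
    print(hex(sys.maxunicode))
    print(max_is_constant())
    print(is_valid_code_point(HIGHEST_CODE_POINT))
    print(is_valid_code_point(HIGHEST_CODE_POINT + 1))
```

With the suggested behavior, sys.maxunicode and the largest argument accepted by chr() agree on every platform, regardless of the size of wchar_t.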

C users who switch to the new API should stop using PyUnicode_GetMax, and it should be listed along with the other deprecated functions in PEP 393.
If sys.maxunicode becomes a constant, it can no longer be used to determine whether the build is narrow or wide. That was its main use, but the distinction won't matter anymore. It might still be useful to know the value of the highest code point, so I think sys.maxunicode can stay around without being deprecated (its documentation should be fixed though).
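
For context, the build check mentioned above typically looked something like the sketch below. The helper names classify_build and astral_length are hypothetical and only serve the illustration; the labels returned are likewise just for this example.

```python
import sys


def classify_build(maxunicode):
    """Map a sys.maxunicode value to a label (hypothetical helper)."""
    if maxunicode == 0xFFFF:
        # Pre-PEP 393 narrow build: non-BMP characters are surrogate pairs.
        return "narrow"
    if maxunicode == 0x10FFFF:
        # Either a pre-PEP 393 wide build or any build after the change;
        # the value alone cannot tell the two apart.
        return "wide or PEP 393"
    raise ValueError("unexpected sys.maxunicode: %#x" % maxunicode)


def astral_length():
    """Length of a one-character non-BMP string on this interpreter."""
    # A narrow build stored this as two UTF-16 code units, so len() gave 2;
    # wide builds and PEP 393 strings give 1.
    return len("\U00010000")


if __name__ == "__main__":
    print(classify_build(sys.maxunicode))
    print(astral_length())
```

Code written this way keeps working if sys.maxunicode becomes a constant: it simply always takes the 0x10FFFF branch.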

----------
assignee: ezio.melotti
components: Interpreter Core, Unicode
messages: 144568
nosy: ezio.melotti, haypo, lemburg, loewis
priority: high
severity: normal
stage: test needed
status: open
title: sys.maxunicode value after PEP-393
type: behavior
versions: Python 3.3

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13054>
_______________________________________