[issue10542] Py_UNICODE_NEXT and other macros for surrogates

Tue Aug 16 14:11:22 CEST 2011

Marc-Andre Lemburg <mal at egenix.com> added the comment:

Tom Christiansen wrote:
> So keeping your preamble bits, I might have considered doing it
> this way if it were me doing it:
> 
>     #define _Py_UNICODE_IS_SURROGATE
>     #define _Py_UNICODE_IS_LEAD_SURROGATE
>     #define _Py_UNICODE_IS_TRAIL_SURROGATE
>     #define _Py_UNICODE_JOIN_SURROGATES
> 
> But I also come from a culture that uses more underscores than you guys tend 
> to, as shown in some of the macro names shown below from utf8.h file.  I find
> that most projects use more underscores in uppercase names than Python does. :)

The reasoning behind e.g. "ISSURROGATE" is that those names originate
from and are consistent with the already existing ISLOWER/ISUPPER/ISTITLE
macros which in return stem from the C APIs of the same names
(see unicodeobject.h for reference).

Regarding low/high vs. lead/trail: The Unicode database uses the
terms low/high and we do in Python as well, so let's stick with
those.

What I don't understand is why those macros should be declared
private to Python (with the leading underscore). They are quite
useful for extensions implementing codecs or other transformations
as well.

BTW: I think the other issues mentioned in the discussion are more
important to get right, than the names of those macros.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10542>
_______________________________________