[Python-ideas] Support Unicode code point notation

Steven D'Aprano steve at pearwood.info
Sun Jul 28 05:57:11 CEST 2013


On 28/07/13 10:30, Andrew Barnert wrote:

> Unicode could go past 10FFFF without dropping UTF-16, either by adding more surrogate pair ranges, or by adding surrogate triplets. It's really no different from extending UTF-8, which is no problem.
>
> The problem is that we have no way to predict how they will extend UTF-16, UTF-8, or code point notation if that ever happens. Assuming that the max length for a code point is six nibbles does sound like assuming nobody will ever need more than 640k characters.

The Unicode Consortium formally guarantees that code points will never be assigned outside the range U+0000 to U+10FFFF, which is exactly what current UTF-16 can express.

http://www.unicode.org/faq/utf_bom.html#utf16-6
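
For anyone who wants to see where that limit comes from, here is a rough sketch of the surrogate pair arithmetic. It is an illustration only, not something taken from the FAQ: the helper name to_surrogate_pair and the constant MAX_CODE_POINT are made up for this example, and the checks at the end assume Python 3.3 or later, where sys.maxunicode is always 0x10FFFF.

```python
import sys

# Highest code point in the Unicode codespace (hypothetical constant name).
MAX_CODE_POINT = 0x10FFFF


def to_surrogate_pair(cp):
    """Split a supplementary code point into a UTF-16 (high, low) pair.

    Hypothetical helper for illustration; not a standard library function.
    """
    if not 0x10000 <= cp <= MAX_CODE_POINT:
        raise ValueError("not a supplementary code point: %#x" % cp)
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits select the low surrogate
    return high, low


# 1024 high surrogates times 1024 low surrogates cover exactly the
# supplementary range U+10000..U+10FFFF, so the last pair is the limit.
assert 0x10000 + 1024 * 1024 - 1 == MAX_CODE_POINT
assert to_surrogate_pair(MAX_CODE_POINT) == (0xDBFF, 0xDFFF)

# Python 3.3+ uses the same upper bound for str and chr().
assert sys.maxunicode == MAX_CODE_POINT
chr(MAX_CODE_POINT)                  # accepted
try:
    chr(MAX_CODE_POINT + 1)          # one past the end of the codespace
except ValueError:
    pass                             # rejected, as expected
```

The last surrogate pair (DBFF, DFFF) maps to U+10FFFF, so going any further would need new surrogate ranges or a longer sequence, which is the extension the stability policy rules out.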


-- 
Steven

