Micro Python -- a lean and efficient implementation of Python 3

Ian Kelly ian.g.kelly at gmail.com
Wed Jun 4 01:55:08 EDT 2014


On Jun 3, 2014 11:27 PM, "Steven D'Aprano" <steve at pearwood.info> wrote:
> For technical reasons which I don't fully understand, Unicode only
> uses 21 of those 32 bits, giving a total of 1114112 available code
> points.

I think mainly it's to accommodate UTF-16. The surrogate pair scheme is
sufficient to encode up to 16 supplementary planes beyond the BMP (17
planes of 65536 code points each, hence 1114112), so if Unicode were
allowed to grow any larger than that, UTF-16 would no longer be able to
encode all code points.
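
A rough sketch of that arithmetic in Python, in case it helps. The
constant and function names are invented for illustration and do not
come from any library; only the surrogate ranges (D800-DBFF and
DC00-DFFF) are taken from the UTF-16 definition.

```python
# Illustrative arithmetic only; all names here are made up for this sketch.
HIGH_SURROGATES = 0xDBFF - 0xD800 + 1   # 1024 lead (high) surrogates
LOW_SURROGATES = 0xDFFF - 0xDC00 + 1    # 1024 trail (low) surrogates
PLANE_SIZE = 0x10000                    # 65536 code points per plane

# Every (high, low) combination names one supplementary code point.
supplementary = HIGH_SURROGATES * LOW_SURROGATES
supplementary_planes = supplementary // PLANE_SIZE
total_code_points = PLANE_SIZE + supplementary   # BMP plus supplementary planes
max_code_point = total_code_points - 1


def encode_surrogate_pair(cp):
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000   # a 20-bit value: 10 bits per surrogate
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


print(supplementary_planes, total_code_points, hex(max_code_point))
# 16 1114112 0x10ffff
print([hex(u) for u in encode_surrogate_pair(0x10FFFF)])
# ['0xdbff', '0xdfff']
```

The last code point, U+10FFFF, maps to the last high surrogate paired
with the last low surrogate, so there is no pair left over for anything
beyond plane 16.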

Another benefit of fixing the size is that it frees the other 11 bits per
character of UTF-32 for packing in ancillary data.
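
To make the idea concrete, here is a minimal sketch of one way the
spare bits could be used. The layout (code point in the low 21 bits,
extra data in the high 11) and the names pack/unpack are assumptions
for illustration, not any standard. A unit packed this way is no longer
valid UTF-32, so it would only make sense as an internal representation.

```python
# Hypothetical packing scheme for illustration; not part of any standard.
CODE_POINT_BITS = 21
CODE_POINT_MASK = (1 << CODE_POINT_BITS) - 1   # 0x1FFFFF
EXTRA_BITS = 32 - CODE_POINT_BITS              # 11 spare bits
EXTRA_MAX = (1 << EXTRA_BITS) - 1              # 0x7FF


def pack(cp, extra):
    """Store a code point in the low 21 bits and extra data in the high 11."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range")
    if not 0 <= extra <= EXTRA_MAX:
        raise ValueError("extra data does not fit in 11 bits")
    return (extra << CODE_POINT_BITS) | cp


def unpack(unit):
    """Recover (code point, extra data) from a packed 32-bit unit."""
    return unit & CODE_POINT_MASK, unit >> CODE_POINT_BITS


unit = pack(ord("A"), 1)
print(hex(unit), unpack(unit))
# 0x200041 (65, 1)
```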