Is there really a default source encoding?
"Martin v. Löwis"
martin at v.loewis.de
Fri Jan 24 21:09:03 EST 2003
Brian Quinlan wrote:
> What if, in the future, there are close to 2^32 Unicode characters.
> UTF-32 might require only 4 bytes to store a character while UTF-16
> would require 6. Or is that impossible?
That's impossible. ISO and the Unicode Consortium have restricted
Unicode to 17 planes (1,114,112 code points, which fit in 21 bits).
Formally, all the other UCS-4 code points are reserved, and ISO has
unassigned the previously assigned private-use group.
Even if those reserved code points were ever assigned, UTF-16 could
not encode them. The way surrogate pairs work, there is just no
representation for characters in the 18th plane and beyond (that is,
above U+10FFFF).
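To make the limit concrete, here is a rough sketch of the surrogate
arithmetic in Python 3. The function names are made up for
illustration; only the surrogate ranges (D800-DBFF and DC00-DFFF) and
the 0x10000 offset come from the UTF-16 definition.

```python
# Surrogate ranges as defined for UTF-16.
HIGH_START, HIGH_END = 0xD800, 0xDBFF
LOW_START, LOW_END = 0xDC00, 0xDFFF


def max_utf16_code_point():
    """Largest code point a single surrogate pair can reach."""
    high_count = HIGH_END - HIGH_START + 1  # 1024 high surrogates
    low_count = LOW_END - LOW_START + 1     # 1024 low surrogates
    # 1024 * 1024 pairs cover 16 planes above the BMP (plane 0).
    return 0x10000 + high_count * low_count - 1


def to_surrogate_pair(cp):
    """Split a supplementary code point into (high, low) surrogates."""
    if not 0x10000 <= cp <= max_utf16_code_point():
        raise ValueError("not representable as a surrogate pair")
    offset = cp - 0x10000  # 20-bit value
    return HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)


if __name__ == "__main__":
    print(hex(max_utf16_code_point()))
    print([hex(u) for u in to_surrogate_pair(0x10FFFF)])
```

The 1024 x 1024 pairs give exactly 2^20 code points, i.e. 16 planes on
top of the BMP, so a pair tops out at U+10FFFF and always takes 4
bytes. Anything past that would fall outside the 20-bit offset.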
Regards,
Martin