Is there really a default source encoding?

"Martin v. Löwis" martin at v.loewis.de
Fri Jan 24 21:09:03 EST 2003


Brian Quinlan wrote:
> What if, in the future, there are close to 2^32 Unicode characters.
> UTF-32 might require only 4 bytes to store a character while UTF-16
> would require 6. Or is that impossible?

That's impossible. ISO and the Unicode Consortium have restricted
Unicode to 17 planes of 65,536 code points each, U+0000 through
U+10FFFF (1,114,112 code points in total, which fits in 21 bits).
Formally, all the other UCS-4 code points are reserved, and ISO has
unassigned the previously assigned private-use group.
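The plane arithmetic can be checked directly. This is a small sketch in
modern Python 3, not the Python of this 2003 thread; the constant names
are only illustrative:

```python
# Assumption: modern Python 3, where a str holds full Unicode code points.
PLANE_SIZE = 2 ** 16   # code points per plane (U+xx0000..U+xxFFFF)
NUM_PLANES = 17        # planes 0 through 16

total = NUM_PLANES * PLANE_SIZE
max_code_point = total - 1

print(total)                        # 1114112
print(hex(max_code_point))          # 0x10ffff
print(max_code_point.bit_length())  # 21 (the total is only a little above 2**20)

# chr() enforces the same ceiling and rejects anything past U+10FFFF.
try:
    chr(max_code_point + 1)
except ValueError as exc:
    print("rejected:", exc)
```

Everything above U+10FFFF stays reserved.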

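Even if those reserved code points were ever assigned, UTF-16 could
not encode them. A surrogate pair carries only 20 bits of payload
(10 bits in each surrogate), added to an offset of U+10000, so the
largest value it can express is U+10FFFF. There is simply no
representation for characters in plane 17 (the eighteenth plane)
and beyond.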
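To make the surrogate-pair limit concrete, here is a minimal sketch of
the standard UTF-16 surrogate arithmetic in modern Python 3. The helper
names to_surrogate_pair and from_surrogate_pair are hypothetical,
written only for illustration. The check at the end compares the result
against Python's built-in "utf-16-be" codec:

```python
def to_surrogate_pair(cp):
    """Split a supplementary code point (U+10000..U+10FFFF) into (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("U+%X cannot be encoded as a surrogate pair" % cp)
    offset = cp - 0x10000             # 20-bit value: 0x00000..0xFFFFF
    high = 0xD800 + (offset >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return high, low


def from_surrogate_pair(high, low):
    """Reassemble a code point from a high/low surrogate pair."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The largest possible pair (DBFF, DFFF) decodes to exactly U+10FFFF;
# there are no surrogate bits left over for plane 17 and beyond.
print(hex(from_surrogate_pair(0xDBFF, 0xDFFF)))  # 0x10ffff

# Cross-check against Python's own UTF-16 codec for U+1F600.
high, low = to_surrogate_pair(0x1F600)
expected = chr(0x1F600).encode("utf-16-be")
assert expected == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```

Because each surrogate contributes exactly 10 bits, the format has no
longer form to fall back on. This is also why a hypothetical 6-byte
UTF-16 sequence cannot exist.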

Regards,
Martin