Unicode 7

Chris Angelico rosuav at gmail.com
Fri May 2 00:54:19 EDT 2014


On Fri, May 2, 2014 at 2:42 PM, Rustom Mody <rustompmody at gmail.com> wrote:
> Unicode consortium's going from old BMP to current (6.0) SMPs to who-knows-what
> in the future is similar.

Unicode 1.0: "Let's make a single universal character set that can
represent all the world's scripts. We'll define 65536 codepoints to do
that with."

Unicode 2.0: "Oh. That's not enough. Okay, let's define some more."

It's not a fundamental change, nor is it unhelpful to Unicode's cause.
It's simply an acknowledgement that 64K codepoints aren't enough. Yes,
that gave us the mess of UTF-16 being called "Unicode". If it hadn't
been for Unicode 1.0, I doubt we'd now have so many languages using
and exposing UTF-16; it'd be a simple judgment call: pick
UTF-8/UTF-16/UTF-32 based on what you expect your users to want to
use. But it doesn't change Unicode's goal, and it also doesn't
indicate that there's likely to be any more such changes in the
future. (Just look at how little of the Unicode space is allocated so
far.)
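
To make the 64K point concrete, here's a rough sketch in Python 3
(my choice for illustration) showing what happens to a codepoint
inside the original 16-bit range versus one beyond it. The helper
names encoded_sizes and surrogate_pair are made up for this example,
not anything from the standard library; the codec names are the
stdlib ones.

```python
def encoded_sizes(ch):
    """Return the byte length of ch in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {enc: len(ch.encode(enc))
            for enc in ("utf-8", "utf-16-le", "utf-32-le")}

def surrogate_pair(cp):
    """Split a codepoint above U+FFFF into its UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only codepoints beyond the BMP need surrogates")
    offset = cp - 0x10000
    # High surrogate carries the top 10 bits, low surrogate the bottom 10.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

bmp = "\u00e9"         # U+00E9, fits in the original 64K range
astral = "\U0001F600"  # U+1F600, does not

print(encoded_sizes(bmp))     # {'utf-8': 2, 'utf-16-le': 2, 'utf-32-le': 4}
print(encoded_sizes(astral))  # {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
print([hex(u) for u in surrogate_pair(0x1F600)])  # ['0xd83d', '0xde00']
print(0x10FFFF + 1)           # 1114112 codepoints in the current space
```

The non-BMP character costs UTF-16 two code units, which is exactly
the case that code treating UTF-16 as "one unit per character" gets
wrong.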

ChrisA


