[I18n-sig] How does Python Unicode treat surrogates?

Tom Emerson tree@basistech.com
Mon, 25 Jun 2001 21:41:51 -0400


Martin v. Loewis writes:
> So nothing will happen until enough Chinese users complain. I don't
> know whether you count as Chinese for these purposes :-)

Perhaps not. :-) But the Chinese aren't the only ones to worry
about. The Japanese also have characters being added outside the BMP,
and Ruby holds sway in Japan...

> P.S. The real issue IMO is display: If there are fonts supporting
> these characters, people will want to write programs that make use of
> the fonts. As long as nobody can actually display such text, nobody
> will request that indexing work reasonably.

True to a point. Fonts do exist for these characters. And I end up
referencing them even when I don't have fonts. Many Chinese
organizations are more worried about making sure all their characters
are encoded than about being able to display them
adequately. Indeed, the HKSAR and CUHK are working on a project
whereby rare characters are also encoded using the ideographic
description characters.
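
To make the idea of an ideographic description sequence concrete, here
is a minimal sketch. It assumes only the twelve original Ideographic
Description Characters (U+2FF0 through U+2FFB); the helper name is_idc
and the sample sequence are illustrative and are not taken from the
HKSAR/CUHK project.

```python
# Range of the twelve original Ideographic Description Characters.
IDC_FIRST, IDC_LAST = 0x2FF0, 0x2FFB

def is_idc(ch):
    """Return True if ch is one of the original description operators."""
    return IDC_FIRST <= ord(ch) <= IDC_LAST

# U+2FF0 (left-to-right) followed by U+6C35 and U+6BCF describes the
# shape of U+6D77 using characters that are all inside the BMP.
ids = "\u2ff0\u6c35\u6bcf"

operators = [ch for ch in ids if is_idc(ch)]
components = [ch for ch in ids if not is_idc(ch)]
```

The point of such a sequence is that a character with no code point of
its own (or no font) can still be described with BMP characters alone.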

> P.P.S. Of course, if we wait until users actually use surrogates, it
> is too late to change the indexing - that would likely break people's
> code.

All too true.
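
For readers who want to see why the indexing question matters, here is
a rough sketch of the arithmetic. It is written for a current Python 3,
where strings index by code point, and it simulates 16-bit code-unit
indexing by encoding to UTF-16. The helper names utf16_units and
code_point_at are made up for this illustration.

```python
import struct

def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))

def code_point_at(units, i):
    """Decode the code point starting at unit index i.

    Returns (code_point, units_consumed). An unpaired surrogate is
    returned as-is with a length of 1.
    """
    hi = units[i]
    if 0xD800 <= hi <= 0xDBFF and i + 1 < len(units):
        lo = units[i + 1]
        if 0xDC00 <= lo <= 0xDFFF:
            cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
            return cp, 2
    return hi, 1

# U+20000 is a CJK ideograph outside the BMP.
text = "a\U00020000b"
units = utf16_units(text)

# Code-unit view: four units, and index 2 is a low surrogate, not "b".
# Code-point view: three characters, and index 2 is "b".
```

Code that assumes one index per character sees "b" at index 2 in the
code-point view and half of a surrogate pair in the code-unit view,
which is why changing the indexing later would break existing programs.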

    -tree

-- 
Tom Emerson                                          Basis Technology Corp.
Sr. Sinostringologist                              http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"