[Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints)

Neil Hodgson neilh at scintilla.org
Mon May 1 06:42:26 EDT 2000


> >   Well, depends on how far you stretch 'more or less'. UTF-16 has room to
> >encode about 900,000 characters by using two 16 bit elements.
>
> Is this the "surrogate" thingies?

   Yes. A surrogate pair uses two 16-bit numbers to represent one
character. The first number is in the range 0xD800 to 0xDBFF and the second
in the range 0xDC00 to 0xDFFF, so you get 0x400 * 0x400 = 0x100000 = 1048576
characters, although some must not be allowed, as one reference says only
917,504 are possible.
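
   As a rough illustration of the arithmetic, here is a minimal Python
sketch. The function and constant names are made up for this example and
are not part of any Python API. It assumes the standard UTF-16 mapping, in
which 0x10000 is subtracted from the code point and the remaining 20 bits
are split into two 10-bit halves.

```python
# Illustrative sketch only: the names below are hypothetical helpers,
# not an existing library API.
HIGH_START, HIGH_END = 0xD800, 0xDBFF  # range of the first 16-bit number
LOW_START, LOW_END = 0xDC00, 0xDFFF    # range of the second 16-bit number


def to_surrogate_pair(code_point):
    """Split a code point above 0xFFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the surrogate-encoded range")
    offset = code_point - 0x10000          # 20-bit value
    high = HIGH_START + (offset >> 10)     # top 10 bits
    low = LOW_START + (offset & 0x3FF)     # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (HIGH_START <= high <= HIGH_END and LOW_START <= low <= LOW_END):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


# 0x400 possible first numbers times 0x400 possible second numbers.
pair_count = (HIGH_END - HIGH_START + 1) * (LOW_END - LOW_START + 1)
```

   Here pair_count works out to 0x100000, matching the figure above, and
to_surrogate_pair(0x10000) gives the lowest pair, (0xD800, 0xDC00).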

http://www.unicode.org/unicode/standard/principles.html
http://www.unicode.org/unicode/faq/

   Neil





