Pure python implementation of string-like class

Alan Kennedy alanmk at hotmail.com
Sat Feb 25 15:14:06 EST 2006


[Steve Holden]
>>"Wider than UTF-16" doesn't make sense.

[Ross Ridge]
> It makes perfect sense.

No, it doesn't.

UTF-16 is a "Unicode Transformation Format", meaning that it is a
mechanism for representing all Unicode code points, even the ones with
ordinals greater than 0xFFFF, using a sequence of one or two 16-bit
values per code point.

http://en.wikipedia.org/wiki/UTF-16

"""
UTF-16 represents a character above hexadecimal FFFF as a surrogate
pair of code values from the range D800-DFFF. For example, the
character at code point hexadecimal 10000 becomes the code value
sequence D800 DC00, and the character at hexadecimal 10FFFD, the upper
limit of Unicode, becomes the code value sequence DBFF DFFD. Unicode
and ISO/IEC 10646 do not assign characters to any of the code points in
the D800-DFFF range, so an individual code value from a surrogate pair
does not ever represent a character.
"""
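
The arithmetic behind that quote is short enough to sketch in Python.
This is only an illustration, assuming Python 3; the function name
to_surrogate_pair is made up for this post and is not part of any
library.

```python
def to_surrogate_pair(code_point):
    """Return the (high, low) UTF-16 code values for a code point above 0xFFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - 0x10000      # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits go in the low surrogate
    return high, low

# The two examples from the quoted passage.
print([hex(u) for u in to_surrogate_pair(0x10000)])   # ['0xd800', '0xdc00']
print([hex(u) for u in to_surrogate_pair(0x10FFFD)])  # ['0xdbff', '0xdffd']
```

The built-in codec should agree: chr(0x10000).encode("utf-16-be") gives
the bytes D8 00 DC 00, i.e. two 16-bit values for one character.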

So UTF-16 has no "width" to compare against, any more than UTF-8 does.

I wonder what character set the OP is dealing with, if it's not
representable with Unicode. Presumably it's not a modern character set?

--
alan kennedy
------------------------------------------------------
email alan:              http://xhaus.com/contact/alan



