[I18n-sig] Re: [Python-Dev] Unicode debate

Just van Rossum just@letterror.com
Tue, 2 May 2000 14:44:30 +0100


At 8:30 AM -0400 02-05-2000, Guido van Rossum wrote:
>I think /F's point was that the Unicode standard prescribes different
>behavior here: for UTF-8, a missing or lone continuation byte is an
>error; for Unicode, accents are separate characters that may be
>inserted and deleted in a string but whose display is undefined under
>certain conditions.
>
>(I just noticed that this doesn't work in Tkinter but it does work in
>wish.  Strange.)
>
>> FYI: Normalization is needed to make comparing Unicode
>> strings robust, e.g. u"é" should compare equal to u"e\u0301".
>
>Aha, then we'll see u == v even though type(u) is type(v) and len(u)
>!= len(v).  /F's world will collapse. :-)
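For illustration, here is a minimal sketch of the situation Guido describes, written for present-day Python 3 rather than the Unicode implementation under debate in this thread. It assumes normalization is an explicit, opt-in step; the helper name `canonically_equal` is hypothetical, and NFC is one choice (comparing NFD forms would give the same answer here).

```python
import unicodedata

u = "\u00e9"   # precomposed: LATIN SMALL LETTER E WITH ACUTE
v = "e\u0301"  # decomposed: "e" followed by COMBINING ACUTE ACCENT

def canonically_equal(a, b):
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Plain == compares code point sequences, so the two spellings differ.
print(u == v)                   # False
print(len(u), len(v))           # 1 2
print(type(u) is type(v))       # True

# After normalization they are equal, even though len(u) != len(v).
print(canonically_equal(u, v))  # True
```

Keeping `==` as a plain code-point comparison and making normalization a separate call means comparison never silently depends on normalization rules.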

Does the Unicode spec *really* specify that u should compare equal to v? This
behavior would be the responsibility of a layout engine, a role which is
way beyond the scope of Unicode support in Python, as it is language- and
script-dependent.

Just