Python's handling of unicode surrogates

"Martin v. Löwis" martin at v.loewis.de
Fri Apr 20 01:44:11 EDT 2007


> Thoughts, from all you readers out there?  For/against?  

See PEP 261. These things were all discussed at that time,
and an explicit decision was taken against what I think (*)
your proposal is. If you want to, you can try to reverse that
decision, but you would need to write a PEP.

Regards,
Martin

(*) I don't fully understand your proposal. You say that you
want "gaps in [the string's] index", but I'm not sure what
that means. If you have a surrogate pair at index 4, would
it mean that s[5] does not exist, or would it mean that
s[5] is the character following the surrogate pair? Is
there any impact on the length of the string? Could it be
that len(s[k]) is 2 for some values of s and k?
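
As an illustration only (this sketch is not from the thread or from
PEP 261), the questions above can be made concrete by looking at the
UTF-16 code units of a string that contains one character outside the
BMP. The helper name utf16_units and the sample string are
hypothetical. The list of code units models how a narrow build would
index such a string, with the surrogate pair occupying indices 4 and 5.
On Python 3.3 and later, len(s) itself counts code points instead.

```python
import struct


def utf16_units(s):
    """Return the UTF-16 code units of s as a list of integers.

    Hypothetical helper: it models the per-code-unit indexing of a
    narrow build by encoding the string to UTF-16 explicitly.
    """
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


# U+10400 lies outside the BMP, so UTF-16 needs a surrogate pair for it.
s = "abcd\U00010400e"
units = utf16_units(s)

print(len(units))                      # 7 code units
print(hex(units[4]), hex(units[5]))    # 0xd801 0xdc00 (high, low surrogate)
print(chr(units[6]))                   # e (the character after the pair)
```

In this model index 5 exists and holds the low surrogate, and the
character following the pair sits at index 6. The pair adds 2 to the
length, and no single index yields an element of length 2.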


