[I18n-sig] How does Python Unicode treat surrogates?

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Tue, 26 Jun 2001 01:58:17 +0200


> But unless I misunderstand what it *is* that you are suggesting, the
> O(1) indexing property can't be retained with your suggestion, and
> that's out of the question.

The O(1) indexing property can be retained for strings not containing
surrogates, while still counting surrogate pairs as one character.
Unfortunately, this will require an additional word per unicode
object, unless I'm allowed to use a byte past the terminating zero
(which will only slightly reduce the memory overhead).

If somebody can find a spare bit :-)
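To make the idea concrete, here is a minimal sketch in modern Python. It is not CPython's implementation. It assumes the string is stored as a list of UTF-16 code units (ints) and keeps a per-object flag recording whether any surrogate is present. The class name `SurrogateAwareString` and the attribute `has_surrogates` are hypothetical. The flag stands in for the extra word (or spare bit) discussed above. When the flag is clear, indexing stays O(1). When it is set, indexing falls back to an O(n) scan that counts a valid high/low pair as one character. As a further assumption, an unpaired surrogate counts as one character on its own.

```python
# Hypothetical sketch, not CPython code: a UTF-16 string that counts a
# surrogate pair as one character, keeping O(1) indexing when no
# surrogates are present. `has_surrogates` models the extra per-object
# word/bit from the discussion.

def _is_high(u):
    return 0xD800 <= u <= 0xDBFF

def _is_low(u):
    return 0xDC00 <= u <= 0xDFFF

class SurrogateAwareString:
    def __init__(self, units):
        self.units = list(units)  # UTF-16 code units as ints
        # Computed once at creation time, like a flag stored in the object.
        self.has_surrogates = any(0xD800 <= u <= 0xDFFF for u in self.units)

    def _code_points(self):
        # Yield code points, combining a valid high/low pair into one;
        # an unpaired surrogate is yielded as-is (an assumption).
        units = self.units
        i = 0
        while i < len(units):
            u = units[i]
            if _is_high(u) and i + 1 < len(units) and _is_low(units[i + 1]):
                yield 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
                i += 2
            else:
                yield u
                i += 1

    def __len__(self):
        if not self.has_surrogates:
            return len(self.units)  # O(1): one code unit per character
        return sum(1 for _ in self._code_points())  # O(n)

    def __getitem__(self, index):
        if not self.has_surrogates:
            # Fast path, O(1); list indexing handles negatives and
            # raises IndexError when out of range.
            return chr(self.units[index])
        # Slow path, O(n): walk the string, counting pairs as one character.
        if index < 0:
            index += len(self)
        for pos, cp in enumerate(self._code_points()):
            if pos == index:
                return chr(cp)
        raise IndexError("string index out of range")
```

For example, `SurrogateAwareString([0x41, 0xD83D, 0xDE00, 0x42])` has length 3, and index 1 yields U+1F600 as a single character. A string without surrogates never leaves the fast path, so existing code that indexes ordinary text keeps its O(1) behaviour.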

Regards,
Martin