[I18n-sig] How does Python Unicode treat surrogates?

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Tue, 26 Jun 2001 01:32:25 +0200


> Does that make sense?
> 
> I know I am hindered by a lack of understanding of Unicode
> hairsplitting, angels-on-a-pin-dancing details; if I'm missing
> something, it's likely that many other people don't know the details
> either, so an explanation would be much appreciated!

I don't think you are missing any detail; I guess you are fully aware
that you are throwing one of Unicode's biggest strengths out of the
window :-) namely the possibility to index characters rather than the
internal representation.
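
To make the distinction concrete, here is a minimal sketch, assuming a
Python 3 interpreter in which strings are indexed by code point. The
helper utf16_units is a hypothetical name introduced only for this
illustration; it imitates what indexing would see if the internal
UTF-16 representation were exposed directly.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")  # little-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


s = "a\U00010400b"  # U+10400 lies outside the Basic Multilingual Plane

# Indexing by character: three code points, and s[1] is the whole character.
chars = [hex(ord(c)) for c in s]

# Indexing the UTF-16 representation: four code units, and units[1] is
# only the high surrogate, which is not a character on its own.
units = [hex(u) for u in utf16_units(s)]

print(len(s), chars)
print(len(units), units)
```

With character indexing, s[1] is U+10400 itself; with code-unit
indexing, the same position holds only the first half of a surrogate
pair.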

As for Unicode hairsplitting: I think combining characters *are*
different in that respect; they are code points in their own right,
even though they might have a zero-width representation, whereas a
surrogate is only half of the encoding of one code point. Also,
normalization forms can help with combining characters; they don't
help with surrogates.
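
The point about normalization can be sketched as follows, assuming
Python 3 and the standard unicodedata module. The sample strings are
chosen only for illustration: NFC composes a base letter and a
combining accent into one code point, but it has no effect on how many
UTF-16 code units a character outside the BMP needs.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points.
decomposed = "e\u0301"
# NFC composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# A character outside the BMP: one code point, two UTF-16 code units.
astral = "\U00010400"
astral_nfc = unicodedata.normalize("NFC", astral)

# Count UTF-16 code units before and after normalization.
units_before = len(astral.encode("utf-16-le")) // 2
units_after = len(astral_nfc.encode("utf-16-le")) // 2

print(len(decomposed), len(composed))
print(units_before, units_after)
```

The combining sequence shrinks from two code points to one, while the
surrogate pair is still needed after normalization.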

Regards,
Martin