[XML-SIG] DC DOM tests

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Tue, 20 Feb 2001 19:42:16 +0100


> I wonder if that's what Martijn means.  I've read that most Java
> implementations have trouble with characters outside the BMP.  I
> wonder if Python handles these properly.

Not sure what "properly" would be:

>>> s=unichr(0xD000)+unichr(0xD800)
>>> s
u'\ud000\ud800'
>>> len(s)
2

Do I even use them in the right order here? Python can store them, and
reproduce what was stored. Apart from that, it does not special-case
surrogates at all.
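
For reference, here is a minimal sketch of the UTF-16 surrogate arithmetic
that an implementation would need in order to special-case these values. It
assumes the standard ranges (high surrogates 0xD800-0xDBFF, low surrogates
0xDC00-0xDFFF, high first), under which 0xD000 in the session above is an
ordinary BMP character and 0xD800 is an unpaired high surrogate. The function
names to_surrogates and from_surrogates are hypothetical, not part of any
Python API, and the code is written for a current Python rather than the
unichr-era interpreter shown above.

```python
# Hypothetical helpers illustrating UTF-16 surrogate pairs; not a Python API.

def to_surrogates(cp):
    """Split a code point in U+10000..U+10FFFF into (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = cp - 0x10000               # 20 significant bits remain
    high = 0xD800 + (offset >> 10)      # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits go into the low surrogate
    return high, low


def from_surrogates(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("first value is not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("second value is not a low surrogate")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    high, low = to_surrogates(0x1D11E)  # a code point outside the BMP
    print(hex(high), hex(low))
    print(hex(from_surrogates(high, low)))
```

With these helpers, a pair given in the wrong order (low first) is rejected by
from_surrogates instead of being silently stored.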

Regards,
Martin

P.S. I really think Python should have used a 32-bit wide character
representation instead.