[XML-SIG] Re: Issues with Unicode type

Uche Ogbuji uche.ogbuji@fourthought.com
Mon, 23 Sep 2002 14:42:51 -0600


> > On Mon, 2002-09-23 at 21:27, Uche Ogbuji wrote:
> Having said all this, Martin is right about XML and the BMP.  I'd forgotten.

See, I knew I'd make a fool of myself before this thread got very long.

I wasn't even properly reading what I was quoting from the XML spec:

> Character Range
> [2]  Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | 
> [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, 
> FFFE, and FFFF. */
> 
> """
> 
> So 𐠀 is not WF XML.  I'm not sure why JJC uses it.

So I was wrong: 𐠀 (U+10800) falls in the [#x10000-#x10FFFF] range of the 
Char production, so it is indeed WF. The problem remains that XML 
processing code will have to augment Python built-ins such as len with 
intelligence about surrogates  :-(
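
To make the surrogate problem concrete, here is a minimal sketch of a surrogate-aware length count. It is written for a current Python 3, where len on a str already counts code points, so the UTF-16 code units are produced explicitly to mimic what a narrow (UCS-2) build stores. The helper names utf16_units and codepoint_len are hypothetical, invented for this illustration, and are not part of any library.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints.

    Hypothetical helper: it mimics the storage of a narrow (UCS-2) build,
    where a character outside the BMP occupies two code units.
    """
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def codepoint_len(units):
    """Count characters in a sequence of UTF-16 code units.

    A high surrogate (D800-DBFF) followed by a low surrogate (DC00-DFFF)
    is counted as one character; every other unit counts as one on its own.
    """
    count = 0
    i = 0
    while i < len(units):
        is_pair = (
            0xD800 <= units[i] <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        i += 2 if is_pair else 1
        count += 1
    return count


units = utf16_units("a\U00010800b")
naive_length = len(units)            # what a plain len sees on code units
aware_length = codepoint_len(units)  # what XML's Char production counts
```

Here naive_length counts the surrogate pair as two items while aware_length counts it as one, which is the kind of gap slicing and indexing code would also have to allow for.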


-- 
Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
Apache 2.0 API - http://www-106.ibm.com/developerworks/linux/library/l-apache/
Python&XML column: Tour of Python/XML - http://www.xml.com/pub/a/2002/09/18/py.html
Python/Web Services column: xmlrpclib - http://www-106.ibm.com/developerworks/webservices/library/ws-pyth10.html