[XML-SIG] Issues with unicode type

Eric van der Vlist vdv@dyomedea.com
23 Sep 2002 17:42:42 +0200


Hi,

I have started to work on an implementation of a W3C XML Schema type
library for Relax NG and I am hitting my first problems with Unicode.

One of the test cases from the test suite provided by James Clark is:

<?xml version="1.0" encoding="utf-8"?>
      <doc>&#67584;</doc>

and the length of the text node of the doc element is supposed to be 1,
whereas my (naive) implementation of the length facet returns 2.
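A length facet that follows the schema's notion of "character" has to count code points rather than storage units. The following is a minimal sketch, assuming modern Python 3 (not the Python of this post); the function name `xsd_length` is hypothetical. On so-called "narrow" Python builds, a supplementary character such as U+10800 is stored as a UTF-16 surrogate pair, so the sketch merges a high surrogate followed by a low surrogate into one character.

```python
import xml.etree.ElementTree as ET


def xsd_length(text: str) -> int:
    """Count characters (code points), treating a UTF-16 surrogate pair as one.

    Hypothetical helper for an XML Schema length facet.
    """
    count = 0
    i = 0
    n = len(text)
    while i < n:
        ch = ord(text[i])
        if (0xD800 <= ch <= 0xDBFF and i + 1 < n
                and 0xDC00 <= ord(text[i + 1]) <= 0xDFFF):
            i += 2  # high + low surrogate together encode one supplementary character
        else:
            i += 1  # any other code unit is a character on its own
        count += 1
    return count


# The test case above: &#67584; is the single character U+10800.
doc_text = ET.fromstring("<doc>&#67584;</doc>").text
print(xsd_length(doc_text))         # 1
print(xsd_length("\ud802\udc00"))   # 1: the same character as a surrogate pair
```

In Python 3.3 and later, `len()` already counts code points, so the surrogate branch only matters for text that arrives as explicit UTF-16 code units.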

What makes me think that it could be a generic issue with Python is the
following (kindly contributed by Uche):

<uche> >>> hex(67584)
<uche> '0x10800'
<uche> >>> c = u"\u10800"
<uche> >>> c
<uche> u'\u10800'
<uche> >>> len(c)
<uche> 2

I am not a Unicode expert (in fact I'd rather say I am a Unicode
newbie), but shouldn't len(c) return 1?
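For reference, a short sketch (in Python 3 syntax, an assumption relative to the session above) of what that literal actually contains. In Python string literals, `\u` consumes exactly four hex digits and `\U` exactly eight, so `"\u10800"` is U+1080 followed by the digit "0", while U+10800 has to be written with `\U`.

```python
c = "\u10800"        # \u reads four hex digits: U+1080, then a literal "0"
print(c == "\u1080" + "0")   # True
print(len(c))                # 2

d = "\U00010800"     # \U reads eight hex digits: the single character U+10800
print(len(d))                # 1
print(hex(ord(d)))           # 0x10800
print(d == chr(67584))       # True
```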

Thanks

Eric

-- 
Rendez-vous à Paris.
                          http://www.technoforum.fr/integ2002/index.html
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
(W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema
------------------------------------------------------------------------