[XML-SIG] Issues with Unicode (wrap-up and moving along)

Eric van der Vlist vdv@dyomedea.com
24 Sep 2002 10:30:48 +0200


First, thanks for the very helpful answers!

As a wrap-up, I think that we can say that:

1) Unicode is supported as code units rather than code points in Python.
2) This is visible in len() applied to unicode objects, but also in
other modules such as re.
3) Even though the impact seems more theoretical than real-world, this
makes it difficult to be compliant with XML 1.0 in the support of
associated specifications (W3C XML Schema datatypes is an example but
XPath is probably also impacted).
4) The solution most consistent with the decisions taken by Python is to
let users choose between an interpreter compiled with 16-bit or 32-bit
Unicode. In the first case (which is the default) the result will not be
totally compliant and will not pass all the test suites. In the second
one, the result can be totally compliant.
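
To make points 1) and 2) concrete, here is a minimal sketch of the gap
between code units and code points. It assumes a current Python 3
interpreter, where len() on a string always counts code points, and it
uses the UTF-16 encoding to recover the count a 16-bit build would
report. The helper name utf16_code_units is mine, not a library function.

```python
def utf16_code_units(text):
    """Return the number of 16-bit code units needed to store text in UTF-16."""
    # Each UTF-16 code unit is 2 bytes; 'utf-16-le' avoids emitting a BOM.
    return len(text.encode("utf-16-le")) // 2


# U+10800 lies outside the Basic Multilingual Plane, so UTF-16 stores it
# as a surrogate pair (two code units) although it is one code point.
sample = "\U00010800"
code_points = len(sample)               # assumption: Python 3 str semantics
code_units = utf16_code_units(sample)
print(code_points, code_units)
```

A specification that counts characters, as XML 1.0 and the W3C XML
Schema length facets do, expects the first number; a 16-bit build would
hand back the second.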

Note that as this is quite easy to detect, implementations could
optionally raise exceptions when inaccurate results might occur.
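
As a rough sketch of what such detection might look like: the build type
can be read from sys.maxunicode, and a string can be scanned for
surrogate code units before its length is trusted. The names
NARROW_BUILD, has_surrogates and checked_length are hypothetical, and
raising ValueError is just one possible policy.

```python
import sys

# True on a build whose unicode strings are sequences of 16-bit code units.
NARROW_BUILD = sys.maxunicode < 0x10FFFF


def has_surrogates(text):
    """Return True if text contains any code unit in the surrogate range."""
    for ch in text:
        if 0xD800 <= ord(ch) <= 0xDFFF:
            return True
    return False


def checked_length(text, narrow=NARROW_BUILD):
    """Return len(text), refusing to answer when it may be a code-unit count."""
    if narrow and has_surrogates(text):
        raise ValueError("length may be inaccurate on a 16-bit Unicode build")
    return len(text)
```

The narrow parameter exists only so that the check can be exercised on a
32-bit build; real code would probably just consult sys.maxunicode.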

To make 32-bit Unicode Python binaries easier to use, we could also
suggest that the major distributions provide alternative packages
compiled with this option.

Now, I have also tried to use such an interpreter. The good news is that
the unicode class works as expected:

vdv@ibook:~$ python
Python 2.2.1 (#1, Sep 24 2002, 09:37:13)
[GCC 2.95.4 20011002 (Debian prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print sys.maxunicode
1114111
>>> u = u'\U00010800'
>>> print len(u)
1

The bad news is that the migration doesn't seem to be so easy, at least
for 4Suite, which blows up when I try to run my test suite:

  File "/usr/lib/python2.2/site-packages/Ft/Xml/cDomlette.py", line 14, in ?
    import cDomlettec
ImportError: /usr/lib/python2.2/site-packages/Ft/Xml/cDomlettec.so: undefined symbol: PyUnicodeUCS2_AsEncodedString

Should I file a bug :-) ?

Thanks

Eric
-- 
Rendez-vous à Paris.
                          http://www.technoforum.fr/integ2002/index.html
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
(W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema
------------------------------------------------------------------------