[XML-SIG] Re: Issues with Unicode type

Martin v. Loewis martin@v.loewis.de
26 Sep 2002 19:45:48 +0200


Eric van der Vlist <vdv@dyomedea.com> writes:

> Yes, but when it comes to implementing the W3C XML Schema "pattern" facet,
> which is basically regular expressions embedded in schemas, this seems
> to require rewriting a full regular expression engine. What I meant by
> "not natively conform" is that it *seems* not feasible with the builtin
> re module in its current state.

Then you may consider the following patch, which I just checked into
Python 2.2.2 and 2.3. It should fix your test case.

Regards,
Martin

Index: sre_compile.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/sre_compile.py,v
retrieving revision 1.43
retrieving revision 1.44
diff -u -r1.43 -r1.44
--- sre_compile.py	27 Jun 2002 20:08:25 -0000	1.43
+++ sre_compile.py	26 Sep 2002 16:39:20 -0000	1.44
@@ -188,6 +188,9 @@
                 # XXX: could append to charmap tail
                 return charset # cannot compress
     except IndexError:
+        if sys.maxunicode != 65535:
+            # XXX: big charsets don't work in UCS-4 builds
+            return charset
         # character set contains unicode characters
         return _optimize_unicode(charset, fixup)
     # compress character map
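
The guard added in the patch above depends on sys.maxunicode, which is
65535 on UCS-2 ("narrow") builds and 1114111 on UCS-4 ("wide") builds.
The following is a minimal sketch, not part of the patch, of how one
might check which kind of build is running and exercise a character
class containing code points above U+00FF. The helper name
is_narrow_build and the sample pattern are assumptions made for
illustration; the original test case is not shown in this message.

```python
import re
import sys


def is_narrow_build():
    """Return True on a UCS-2 build, where sys.maxunicode == 65535.

    Hypothetical helper: it mirrors the condition tested by the patch,
    but is not part of sre_compile.py.
    """
    return sys.maxunicode == 0xFFFF


# Hypothetical stand-in for a schema "pattern" facet: a character class
# with ranges well above U+00FF (Greek capitals and CJK ideographs).
pattern = re.compile(u"[\u0391-\u03a9\u4e00-\u9fff]+")

# The class should match the three leading non-ASCII characters and
# stop at the first ASCII letter.
match = pattern.match(u"\u0391\u03a9\u4e2dabc")
matched = match.group(0)

print("narrow build:", is_narrow_build())
print("matched %d characters" % len(matched))
```

With the patch applied, a wide build skips the big-charset optimization
and keeps the unoptimized set, so a pattern of this kind should compile
and match the same way on both kinds of build.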