[XML-SIG] 4XPath: parsing Unicode string

Tamito KAJIYAMA kajiyama@grad.sccs.chukyo-u.ac.jp
Sun, 26 Nov 2000 06:05:21 +0900


Hi,

I am using 4Suite 0.9.2 together with Python 2.0 and PyXML 0.6.2.

I cannot pass a Unicode string containing Japanese characters to
the 4XPath parser.  The following reproduces the problem:

>>> from xml.xpath import XPathParser
>>> p = XPathParser.XPathParser()
>>> path = p.parseExpression(u'substring-after("2000/10/30", "/")')

The expression above parses without any problem, but the next,
very similar one does not:

>>> path = p.parseExpression(u'substring-after("2000\u5E7410\u670830\u65E5", "\u6708")')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/opt/lib/python2.0/site-packages/_xmlplus/xpath/XPathParser.py", line 36, in parseExpression
    XPathParserBase.XPathParserBase.parse(self, st)
  File "/opt/lib/python2.0/site-packages/_xmlplus/xpath/XPathParserBase.py", line 62, in parse
    XPath.cvar.g_prodNum)
xml.xpath.XPathParserBase.SyntaxException: 
********** Syntax Exception **********
While parsing substring-after("2000YY10MM30DD", "MM")
Exception at or near "10"
  Line: 0, Production Number: 9

(YY, MM and DD stand for the Japanese characters \u5E74, \u6708 and
\u65E5, respectively.  The error message shows them in the native
encoding, so I have replaced the actual characters with these
placeholders for quoting.)
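
For reference, here is a minimal pure-Python sketch of the result I
would expect, assuming the XPath 1.0 definition of substring-after
(the part of the first string that follows the first occurrence of
the second, or an empty string when there is no match).  The helper
name substring_after is hypothetical and is not part of 4XPath; it
only illustrates the expected behaviour.

```python
def substring_after(s, sep):
    """Hypothetical helper mimicking XPath 1.0 substring-after().

    Not part of 4XPath; it only illustrates the expected result.
    """
    index = s.find(sep)
    if index < 0:
        # No match: XPath 1.0 specifies an empty string.
        return u''
    # Everything after the first occurrence of sep.
    return s[index + len(sep):]


# ASCII case, equivalent to the expression that parses fine.
ascii_result = substring_after(u'2000/10/30', u'/')

# Japanese case, equivalent to the expression that fails to parse.
japanese_result = substring_after(u'2000\u5E7410\u670830\u65E5', u'\u6708')
```

Under that assumption, ascii_result is u'10/30' and japanese_result
is u'30\u65E5', so both expressions should be equally valid.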

In practice, the second XPath expression is used in an XSL
stylesheet, and the same error is raised there.

What's wrong?  I wonder if I am missing something trivial.

Thanks,

-- 
KAJIYAMA, Tamito <kajiyama@grad.sccs.chukyo-u.ac.jp>