[XML-SIG] 4XPath and Unicode

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Sun, 10 Dec 2000 19:41:32 +0100


> 1) Move all XPath parsing to another technology, perhaps Spark
> (http://www.csr.uvic.ca/~aycock/python/).  Pro: it's in Python and
> should be easy to maintain.

I hope I can find some time to write an XPath parser in YAPPS. Is
there a readily readable grammar for XPath? I find the bisongen
input of 4Suite extremely hard to read.

I think it would be time well spent to evaluate different parser
toolkits on this application. I have the feeling that XPath is
simple enough to put together a parser in any of these toolkits; we
could then compare the speed of the results and the readability of
the generator input.
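To give a feel for how small the core of XPath is, here is a minimal hand-written recursive-descent parser for a subset of XPath 1.0, in Python 3. It is neither YAPPS nor Spark input and says nothing about either toolkit; the `Parser` class, the token pattern and the tuple-based AST are hypothetical. The subset covers axes, `@`, `.`, `..`, `//`, name tests and predicates with `=`/`!=` comparisons; functions, arithmetic, unions and namespace prefixes are deliberately left out.

```python
import re

# Hypothetical token pattern for a small XPath 1.0 subset (not the full
# lexical grammar). \s* skips whitespace before each token.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<NUMBER>\d+(?:\.\d*)?|\.\d+)"
    r"|(?P<LITERAL>\"[^\"]*\"|'[^']*')"
    r"|(?P<OP>//|::|\.\.|!=|[/\[\]@=*.])"
    r"|(?P<NAME>[^\W\d][\w.-]*))"
)
# The thirteen axes defined by XPath 1.0.
AXES = {"ancestor", "ancestor-or-self", "attribute", "child", "descendant",
        "descendant-or-self", "following", "following-sibling", "namespace",
        "parent", "preceding", "preceding-sibling", "self"}


class Parser:
    """Recursive-descent parser: one method per grammar rule, AST as tuples."""

    def __init__(self, text):
        self.tokens, self.i, pos, text = [], 0, 0, text.rstrip()
        while pos < len(text):
            m = TOKEN_RE.match(text, pos)
            if m is None:
                raise SyntaxError(f"cannot tokenize {text[pos:]!r}")
            self.tokens.append((m.lastgroup, m.group(m.lastgroup)))
            pos = m.end()

    def peek(self):
        return self.tokens[self.i][1] if self.i < len(self.tokens) else None

    def advance(self):
        if self.i >= len(self.tokens):
            raise SyntaxError("unexpected end of expression")
        self.i += 1
        return self.tokens[self.i - 1]

    def parse(self):
        result = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return result

    # Expr ::= Primary (('=' | '!=') Primary)?
    def expr(self):
        left = self.primary()
        if self.peek() in ("=", "!="):
            return (self.advance()[1], left, self.primary())
        return left

    # Primary ::= Number | Literal | LocationPath
    def primary(self):
        kind = self.tokens[self.i][0] if self.i < len(self.tokens) else None
        if kind == "NUMBER":
            return ("number", float(self.advance()[1]))
        if kind == "LITERAL":
            return ("literal", self.advance()[1][1:-1])
        return self.location_path()

    # LocationPath ::= ('/' | '//')? Step (('/' | '//') Step)*
    def location_path(self):
        absolute, steps = self.peek() in ("/", "//"), []
        if absolute:
            self.separator(steps)
        steps.append(self.step())
        while self.peek() in ("/", "//"):
            self.separator(steps)
            steps.append(self.step())
        return ("path", absolute, steps)

    def separator(self, steps):
        # '//' abbreviates '/descendant-or-self::node()/'.
        if self.advance()[1] == "//":
            steps.append(("descendant-or-self", "node()", []))

    # Step ::= '.' | '..' | (AxisName '::' | '@')? NodeTest Predicate*
    def step(self):
        if self.peek() in (".", ".."):
            return ("self" if self.advance()[1] == "." else "parent", "node()", [])
        axis = "child"
        if self.peek() == "@":
            axis = "attribute"
            self.advance()
        elif self.i + 1 < len(self.tokens) and self.tokens[self.i + 1][1] == "::":
            axis = self.advance()[1]
            if axis not in AXES:
                raise SyntaxError(f"unknown axis {axis!r}")
            self.advance()  # skip '::'
        kind, test = self.advance()
        if kind != "NAME" and test != "*":
            raise SyntaxError(f"bad node test {test!r}")
        predicates = []
        while self.peek() == "[":
            self.advance()
            predicates.append(self.expr())
            if self.advance()[1] != "]":
                raise SyntaxError("expected ']'")
        return (axis, test, predicates)


print(Parser('/book/chapter[@id="intro"]//para').parse())
```

Each method mirrors one production, which is roughly the structure a recursive-descent generator such as YAPPS would emit from its grammar file; setting the grammar comments above side by side with the bisongen input is one cheap way to compare readability.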

> Con: we might lose performance, and most Python scanner/parser
> packages seem to be only sporadically maintained.  For instance,
> Spark's last update (0.6.1) was in April.  We'd like to avoid being
> stuck maintaining a parser package in addition to everything else.

As for performance: it probably depends mostly on lexing speed; with
sre, I hope that we can perform comparably to flex.
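For a rough idea of what an sre-based lexer looks like, here is a minimal sketch in Python 3, whose `re` module is the sre engine. It assumes a hypothetical token table for a subset of XPath 1.0; `TOKEN_SPEC`, `MASTER_RE` and `tokenize` are illustrative names, and the `NAME` rule only approximates XPath's NCName production. Because the pattern is a `str`, `\w` matches non-ASCII letters, so Unicode names and string literals need no special handling.

```python
import re

# Hypothetical token table for a subset of XPath 1.0 (illustrative only).
# Alternatives are tried in order, so multi-character operators come first.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d*)?|\.\d+"),
    ("LITERAL", r"\"[^\"]*\"|'[^']*'"),
    ("OPERATOR", r"//|::|\.\.|!=|<=|>=|[/\[\]()@,|+=<>*.$-]"),
    # Approximates an NCName: a letter or '_' followed by letters, digits, '.', '-'.
    ("NAME", r"[^\W\d][\w.-]*"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]

# One compiled alternation of named groups, scanned left to right by sre.
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(expr):
    """Yield (kind, text) pairs for an XPath expression given as str."""
    for m in MASTER_RE.finditer(expr):
        kind = m.lastgroup  # name of the alternative that matched
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected {m.group()!r} at offset {m.start()}")
        yield kind, m.group()


for token in tokenize('//chapter[@título="Einführung"]/para'):
    print(token)
```

This prints one (kind, text) pair per token, with `título` and `"Einführung"` passing through intact. Whether a scanner like this actually gets close to flex would still have to be measured; the sketch only shows the single-pass structure.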

If 4Suite (and perhaps PyXML) made an educated choice of parser
generator toolkit, that might set enough of a precedent to establish
a standard, and get the author of the toolkit interested in
improving it.

Furthermore, these things normally don't need much maintenance:
bison is still in wide use, even though it is not maintained anymore.

> 2) Use an existing Python package for lexing, for instance mxTextTools. 
> Pro: should be easier to convert and maintain.  Con: performance?
> encoding support?

I'd discourage yet another C module. It is *very* unlikely that such
a module would get reasonable Unicode support.

> 3) Write our own scanner in Python using SRE.  We'd probably have one
> Python code to tokenize and then write a shell in C to feed the tokens
> to Bison.  This would ensure best performance.  Pro: performance, we get
> to add all the encoding support we want directly.  Con: maintainability.

Also, this is exactly what all these parser toolkits do; I don't
think there is a need for yet another one.

Regards,
Martin