[XML-SIG] XPath in Python 2

Paul Prescod paul@prescod.net
Sun, 09 Jul 2000 11:45:08 -0500


The next Python release is delayed and we don't know for how long. It would not
hurt to look at the feasibility of adding XPath support while we wait. A
decent XPath implementation should work on any DOM implementation and
probably would not require changes to that DOM implementation. In other
words, an XPath implementation should not interfere with our readiness to
build Python 1.6 (or 2.0) at the drop of a hat. If we finish it and test
it before the beta period starts, then we could put it in.

Why XPath? XPath is the W3C-provided mechanism for navigating XML
documents in a declarative way. That means that rather than specifying
an exact path to a node, you describe the relationship between the node
you are on and the node you want to get to. This makes the creation of
complex applications much easier and allows for more efficiency "under
the hood" of the XPath implementation.

The two basic features of an XPath implementation are selecting and
matching. Selecting takes a node or nodelist and an XPath and returns a
nodelist of related nodes. Matching takes a node and an XPath and
returns a boolean saying whether the node "matches" the XPath (i.e.
whether there is a node in the document from which selecting with the
XPath would return that node).
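
To make the select/match distinction concrete, here is a rough sketch
on top of xml.dom.minidom. The names select, match, child_elements and
all_nodes are made up for illustration (they are not the PyXPath or
4XPath API), and select understands only relative child steps such as
"a/b/c". Matching is written as the brute-force definition: try every
node in the document as the starting point.

```python
from xml.dom import minidom, Node


def child_elements(node, name):
    """Element children of `node` called `name` ('*' matches any name)."""
    return [c for c in node.childNodes
            if c.nodeType == Node.ELEMENT_NODE and name in ("*", c.tagName)]


def select(context, path):
    """Hypothetical selector: node or nodelist + path -> list of nodes.

    Only handles relative child steps like 'a/b/c'.
    """
    nodes = list(context) if isinstance(context, list) else [context]
    for step in path.split("/"):
        nodes = [c for n in nodes for c in child_elements(n, step)]
    return nodes


def all_nodes(node):
    """Yield `node` and every element below it."""
    yield node
    for c in node.childNodes:
        if c.nodeType == Node.ELEMENT_NODE:
            yield from all_nodes(c)


def match(node, path):
    """True if selecting `path` from some node in the document yields `node`."""
    doc = node.ownerDocument
    return any(any(found is node for found in select(start, path))
               for start in all_nodes(doc))


doc = minidom.parseString("<a><b><c/></b><d><c/></d></a>")
c_under_b = select(doc, "a/b/c")[0]
c_under_d = select(doc, "a/d/c")[0]
print(match(c_under_b, "b/c"), match(c_under_d, "b/c"))
```

A real implementation would presumably match by walking up from the
node instead of selecting from every possible starting point, which is
where the extra work (and the room for optimization) comes in.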

We already have two XPath implementations: PyXPath and 4XPath. PyXPath's
code is very complicated and I have had trouble in the past optimizing
it and using it for matching (as opposed to selecting). I don't know if
anyone is interested enough to go in and add those features.

4XPath is cleaner from a user's point of view, but it requires a fair bit
of C/lex code for parsing the XPaths. I don't know if we would have to
go back to the BDFL to get permission for that code to go into Python.

We also have the option of creating a new XPath implementation. The
primary virtue of doing so would be the opportunity to implement a tiny
subset of XPath in a much smaller amount of code. The two existing
implementations probably have more code than the rest of the Python 1.6
XML package. And in 4XPath's case, a lot of that is C code.

---

My feeling is that implementing 10% of XPath in 10% of the code would
get us 80% of the benefit. Those that need the rest can download 4XPath.
I also think that 4XPath should be part of the PyXML distribution.

The 10% that is most interesting:

	* a/b/c
	* a//b
	* ../

Actually, that's probably not even 10% and it can be "parsed" mostly
with a "string.split" on "/". Things like positional predicates can be
implemented with Python sequence syntax. Attribute access can use DOM
syntax. All in all, this looks like an afternoon's work, if we agree
that it should go into Python.
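
As a sanity check on the "afternoon's work" estimate, here is one way
the subset could look. It is a sketch only: it assumes
xml.dom.minidom as the DOM, handles relative paths only, and the names
select, _children, _descendants_or_self and _unique are invented for
this example. The path is split on "/", the empty piece produced by
"a//b" is treated as "this node and all its descendants", and a
positional predicate like "c[2]" becomes a plain sequence slice.

```python
from xml.dom import minidom, Node


def _unique(nodes):
    """Drop duplicate nodes (by identity), keeping first-seen order."""
    seen, out = set(), []
    for n in nodes:
        if id(n) not in seen:
            seen.add(id(n))
            out.append(n)
    return out


def _descendants_or_self(node):
    """Yield `node` and every element below it."""
    yield node
    for c in node.childNodes:
        if c.nodeType == Node.ELEMENT_NODE:
            yield from _descendants_or_self(c)


def _children(node, step):
    """Apply one child step such as 'b', '*' or 'b[2]' to a single node."""
    name, _, rest = step.partition("[")
    found = [c for c in node.childNodes
             if c.nodeType == Node.ELEMENT_NODE and name in ("*", c.tagName)]
    if rest:                        # positional predicate, 1-based as in XPath
        pos = int(rest.rstrip("]"))
        found = found[pos - 1:pos]  # sequence slice; empty if out of range
    return found


def select(context, path):
    """Evaluate a relative path made of name, '*', '..', '.' and '//' steps."""
    nodes = list(context) if isinstance(context, list) else [context]
    for step in path.rstrip("/").split("/"):
        if step == "":              # the empty piece between the slashes of 'a//b'
            nodes = _unique(d for n in nodes for d in _descendants_or_self(n))
        elif step == "..":
            nodes = _unique(n.parentNode for n in nodes
                            if n.parentNode is not None)
        elif step == ".":
            continue
        else:
            nodes = _unique(c for n in nodes for c in _children(n, step))
    return nodes


doc = minidom.parseString("<a><b><c/><c/></b><d><b><c/></b></d></a>")
print(len(select(doc, "a/b/c")), len(select(doc, "a//b")))
```

Attributes stay in DOM syntax here, e.g.
select(doc, "a/b")[0].getAttribute("id"). Absolute paths, other axes,
non-positional predicates and document-order guarantees are all left
out, which is the point of the subset.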

-- 
 Paul Prescod - Not encumbered by corporate consensus
"Computer Associates is expected to come in with better than expected 
earnings." Bob O'Brien, quoted in
	- http://www.fool.com/news/2000/foth000316.htm