[XML-SIG] Roadmap document - finally!
Ken MacLeod
ken@bitsko.slc.ut.us
20 Feb 2001 08:06:29 -0600
Lars Marius Garshol <larsga@garshol.priv.no> writes:
> | A low-level Infoset API would be interesting
>
> Personally I would prefer to see a nice tree-based XML API. My
> personal opinion is that the DOM stinks and needs replacement. Sean
> McGrath's xTree looks far better, in my opinion.
Orchard[1] exposes *just* the infoset in the simplest possible way[2]
(that is, an element's attributes are a mapping, contents are
sequences, and other properties are simple values).
Orchard's nodes differ from DOM nodes in that they have no navigation
methods or attributes (firstChild, nextSibling) or DOM-special
manipulation (insertBefore, replaceChild) -- depending solely on
Python's standard mapping and sequence interface. Orchard also uses a
(URI, LocalName) tuple for supporting XML Namespaces, instead of
additional *NS methods. Like Python's DOM binding, Orchard uses
normal attribute accessors instead of (or in addition to) get/set
methods.
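To make the contrast with the DOM concrete, here is a minimal sketch in pure Python. The class names `Element` and `Characters` are hypothetical stand-ins, not Orchard's actual classes, and attribute values are kept as plain strings for brevity rather than as attribute nodes.

```python
# Hypothetical stand-ins for illustration; not Orchard's actual classes.
class Characters:
    def __init__(self, data):
        self.data = data


class Element:
    def __init__(self, name, attributes=None, contents=None):
        self.name = name                     # a simple value
        self.attributes = attributes or {}   # a plain mapping
        self.contents = contents or []       # a plain sequence


para = Element("para", {"id": "p1"}, [Characters("Hello, ")])

# DOM's appendChild / insertBefore / replaceChild become list operations.
para.contents.append(Characters("world"))
para.contents.insert(0, Element("anchor"))
para.contents[-1] = Characters("XML-SIG")

# DOM's getAttribute / setAttribute become mapping operations.
para.attributes["class"] = "greeting"

# With namespaces, a (URI, LocalName) tuple is the key; no *NS methods needed.
XLINK = "http://www.w3.org/1999/xlink"
para.attributes[(XLINK, "href")] = "#top"
```

Nothing here is specific to XML: anything that works on lists and dictionaries (slicing, `len()`, `in`, iteration) works on the tree.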
Essentially the whole API (the XML node attributes for common XML
nodes), in language-neutral form, less a few convenience methods like
getElementsByTagName(), load(), and save(), is attached below.
From a quick re-review, Pyxie's xTree also has navigation methods (Up,
Down, HasUp). I would be very interested to find out if people have a
preference for navigation methods vs. using the mappings and sequences
directly. Again, Orchard nodes use direct access, no navigation
methods.
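For anyone weighing the two styles, the sketch below shows one way sibling navigation can be recovered from direct access alone. The `walk` helper is an illustration, not part of Orchard, and `SimpleNamespace` objects stand in for real nodes.

```python
from types import SimpleNamespace as Node  # hypothetical stand-in for a node


def walk(node, parent=None, index=None):
    """Yield (parent, index, node) for node and all of its descendants."""
    yield parent, index, node
    # Characters nodes have no `contents`, so default to an empty list.
    for i, child in enumerate(getattr(node, "contents", [])):
        yield from walk(child, node, i)


tree = Node(name="list", contents=[
    Node(name="item", contents=[Node(data="one")]),
    Node(name="item", contents=[Node(data="two")]),
])

followers = []
for parent, index, node in walk(tree):
    if parent is not None and index + 1 < len(parent.contents):
        sibling = parent.contents[index + 1]   # what nextSibling would return
        followers.append((node.name, sibling.name))
```

The trade-off is that the caller has to carry the parent and index along, since the nodes themselves hold no back-references.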
Like Pyxie's xDispatch (and discussed here earlier[3,4]), Orchard uses
node-based events/dispatch (SAX). Event handlers, pull modules, or
dispatch functions all use the same node types as trees do.
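A rough sketch of what sharing node types between trees and events can look like follows. The event names (`start`, `end`, `characters`) and the `events` generator are assumptions made for illustration; here the events come from walking an existing tree so the example stays self-contained, where a real parser would produce them from input.

```python
from types import SimpleNamespace as Node  # hypothetical stand-in for a node


def events(node):
    """Walk a tree, yielding (event, node) pairs in document order."""
    if hasattr(node, "data"):
        yield "characters", node
        return
    yield "start", node
    for child in node.contents:
        yield from events(child)
    yield "end", node


def text_of(stream):
    """An event consumer: it reads `data` exactly as tree code would."""
    return "".join(node.data for event, node in stream if event == "characters")


doc = Node(name="p", attributes={}, contents=[
    Node(data="node-based "),
    Node(name="em", attributes={}, contents=[Node(data="events")]),
])

names = [node.name for event, node in events(doc) if event == "start"]
```

Because the handler sees the same `name`, `attributes`, and `data` properties either way, code written against a tree can be reused on a stream with little change.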
"But Wait!! That's not all!" :-)
As a last note, the C optimization is well underway. Orchard/Mostly-C
is about 3-10x faster than pure Python/Perl while still retaining
attribute accessors (with overrides), garbage collection, and no
problems with cycles. Current status is that we have a pure Python
prototype of the Orchard APIs, and the Python binding is scheduled for
early post-1.0 (as always, volunteers can change that!). We have
ported Matt Sergeant's XPath step evaluator to C as an example of C
optimization for higher language modules[5].
-- Ken
[1] <http://casbah.org/~kmacleod/orchard/>
[2] <http://casbah.org/~kmacleod/orchard/quick.html#XMLNodes>
[3] <http://mail.python.org/pipermail/xml-sig/2000-February/001905.html>
[4] <http://mail.python.org/pipermail/xml-sig/2000-February/001907.html>
[5] <http://casbah.org/~kmacleod/orchard/xpath.moc.txt>
Orchard's common XML nodes:
document  element         attribute       characters
--------  --------------  --------------  ----------
contents  name            name            data
root      attributes      value
          contents        namespace-uri*
          namespace-uri*  local-name*
          local-name*     prefix*
          prefix*
* Available when namespace processing is enabled (the default).
The `contents' property of a document or element node is a list of
the nodes within that document or element. The `name' of an
element or attribute node is the name of the element/attribute,
including the prefix, if any.
The `root' of a document is the root element of the document.
An element's `attributes' is a container indexed by the
attribute's `name' property. The `value' of an attribute is the
normalized, string value of the attribute.
The `data' of a characters node is XML text.
*** XML Namespaces
If an XML document uses XML Namespaces, the following additional
properties are available on element and attribute nodes.
`namespace-uri' is the XML Namespace URI string. `local-name' is
the local-name portion of the element or attribute name (the name
without the prefix). `prefix' is the prefix portion of the name
(the name without the local-name).
The `attributes' container is also indexed by the
namespace-uri/local-name pair of each attribute. When accessing
documents using XML Namespaces, you should only use the
namespace-uri/local-name indexes for attributes.
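One way such a doubly indexed container could be built is sketched below. The `Attribute` and `Attributes` classes and the underscore spellings (`namespace_uri`, `local_name`) are assumptions for illustration, not Orchard's implementation.

```python
class Attribute:
    """Hypothetical attribute node carrying the properties listed above."""

    def __init__(self, name, value, namespace_uri=None):
        self.name = name
        self.value = value
        self.namespace_uri = namespace_uri
        # Split "prefix:local" into its parts; no colon means an empty prefix.
        self.prefix, _, self.local_name = name.rpartition(":")


class Attributes:
    """Container indexed by `name` and by (namespace-uri, local-name)."""

    def __init__(self):
        self._by_name = {}
        self._by_pair = {}

    def add(self, attr):
        self._by_name[attr.name] = attr
        self._by_pair[(attr.namespace_uri, attr.local_name)] = attr

    def __getitem__(self, key):
        # Tuples select the namespace index; strings select the name index.
        index = self._by_pair if isinstance(key, tuple) else self._by_name
        return index[key]


XLINK = "http://www.w3.org/1999/xlink"
attrs = Attributes()
attrs.add(Attribute("xlink:href", "#top", XLINK))
attrs.add(Attribute("id", "p1"))

# Both keys reach the same node.
same = attrs["xlink:href"] is attrs[(XLINK, "href")]
```

The pair lookup is the safer one with namespaced documents because a prefix is only a local abbreviation: another document may bind a different prefix to the same URI, and then the `name` key no longer matches.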
XML Namespace processing is used by default if the document uses
XML Namespaces.