[XML-SIG] Roadmap document - finally!

Ken MacLeod ken@bitsko.slc.ut.us
20 Feb 2001 08:06:29 -0600


Lars Marius Garshol <larsga@garshol.priv.no> writes:

> | A low-level Infoset API would be interesting
> 
> Personally I would prefer to see a nice tree-based XML API. My
> personal opinion is that the DOM stinks and needs replacement.  Sean
> McGrath's xTree looks far better, in my opinion.

Orchard[1] exposes *just* the infoset in the simplest possible way[2]
(that is, an element's attributes are a mapping, contents are
sequences, and other properties are simple values).

Orchard's nodes differ from DOM nodes in that they have no navigation
methods or attributes (firstChild, nextSibling) or DOM-special
manipulation (insertBefore, replaceChild) -- depending solely on
Python's standard mapping and sequence interface.  Orchard also uses a
(URI, LocalName) tuple for supporting XML Namespaces, instead of
additional *NS methods.  Like Python's DOM binding, Orchard uses
normal attribute accessors instead of (or in addition to) get/set
methods.
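
To make the contrast with the DOM concrete, here is a minimal sketch in
plain Python.  The Element and Characters classes are hypothetical
stand-ins written for this example, not Orchard's actual classes, and
for brevity the attribute mapping holds string values rather than
attribute nodes.

```python
# Hypothetical stand-ins for Orchard-style nodes (not Orchard's real classes).
class Characters:
    def __init__(self, data):
        self.data = data                          # XML text


class Element:
    def __init__(self, name, attributes=None, contents=None):
        self.name = name
        self.attributes = dict(attributes or {})  # a plain mapping
        self.contents = list(contents or [])      # a plain sequence


para = Element("p", {"class": "intro"}, [Characters("Hello")])

# What the DOM does with insertBefore/replaceChild is ordinary list editing.
para.contents.insert(0, Element("b", contents=[Characters("Note: ")]))
para.contents[1] = Characters("Hello, world")

# What the DOM does with firstChild/getAttribute is indexing and lookup.
first_child = para.contents[0]
css_class = para.attributes["class"]
```

Nothing here is specific to XML: anything that works on a Python list
or dictionary (slicing, len(), "in", iteration) works on the tree.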

Essentially the whole API (the XML node attributes for common XML
nodes), in language-neutral form, less a few convenience methods like
getElementsByTagName(), load(), and save(), is attached below.

From a quick re-review, Pyxie's xTree also has navigation methods (Up,
Down, HasUp).  I would be very interested to find out if people have a
preference for navigation methods vs. using the mappings and sequences
directly.  Again, Orchard nodes use direct access, no navigation
methods.
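
As an illustration of the direct-access style, the sketch below walks a
tree using only the `contents' sequence, with no Up/Down or
firstChild/nextSibling calls.  The node classes and the text_of()
helper are assumptions made up for this example.

```python
# Hypothetical node classes; only the `contents` and `data` properties matter.
class Characters:
    def __init__(self, data):
        self.data = data


class Element:
    def __init__(self, name, contents=()):
        self.name = name
        self.contents = list(contents)


def text_of(node):
    """Concatenate all character data under `node`, depth first."""
    if isinstance(node, Characters):
        return node.data
    # Recursion over the plain sequence replaces sibling/child navigation.
    return "".join(text_of(child) for child in node.contents)


doc = Element("p", [
    Characters("A "),
    Element("em", [Characters("nested")]),
    Characters(" tree"),
])
result = text_of(doc)
```

One trade-off worth noting: without navigation methods a node has no
built-in way to reach its parent, so code that needs ancestors has to
carry them along during the walk.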

Like Pyxie's xDispatch (and discussed here earlier[3,4]), Orchard uses
node-based events/dispatch (SAX).  Event handlers, pull modules, or
dispatch functions all use the same node types as trees do.
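
A rough sketch of what node-based dispatch could look like follows.
The Handler class, the on_<nodetype> naming convention, and the walk()
driver are all invented for illustration and are not Orchard's or
Pyxie's API; the point is only that the handler receives the same node
objects a tree would hold.

```python
# Hypothetical node classes shared by trees and event streams.
class Characters:
    def __init__(self, data):
        self.data = data


class Element:
    def __init__(self, name, contents=()):
        self.name = name
        self.contents = list(contents)


class Handler:
    def dispatch(self, node):
        # Route each node to on_element, on_characters, ... if defined.
        method = getattr(self, "on_" + type(node).__name__.lower(), None)
        if method is not None:
            method(node)


class Collector(Handler):
    def __init__(self):
        self.names = []
        self.text = []

    def on_element(self, node):
        self.names.append(node.name)

    def on_characters(self, node):
        self.text.append(node.data)


def walk(node, handler):
    """Drive a handler from an existing tree, one node per event."""
    handler.dispatch(node)
    for child in getattr(node, "contents", []):
        walk(child, handler)


collector = Collector()
walk(Element("p", [Characters("A "), Element("em", [Characters("nested")])]),
     collector)
```

Because events and trees share node types, the same Collector could be
fed by a parser emitting nodes one at a time instead of by walk().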

"But Wait!!  That's not all!"  :-)

As a last note, the C optimization is well underway.  Orchard/Mostly-C
is about 3-10x faster than pure Python/Perl while still retaining
attribute accessors (with overrides), garbage collection, and no
problems with cycles.  Current status is that we have a pure Python
prototype of the Orchard APIs, and the Python binding is scheduled for
early post-1.0 (as always, volunteers can change that!).  We have
ported Matt Sergeant's XPath step evaluator to C as an example of C
optimization for higher language modules[5].

  -- Ken

[1] <http://casbah.org/~kmacleod/orchard/>
[2] <http://casbah.org/~kmacleod/orchard/quick.html#XMLNodes>
[3] <http://mail.python.org/pipermail/xml-sig/2000-February/001905.html>
[4] <http://mail.python.org/pipermail/xml-sig/2000-February/001907.html>
[5] <http://casbah.org/~kmacleod/orchard/xpath.moc.txt>

Orchard's common XML nodes:

      document  element         attribute       characters
      --------  --------------  --------------  ----------
      contents  name            name            data
      root      attributes      value
                contents        namespace-uri*
                namespace-uri*  local-name*
                local-name*     prefix*
                prefix*

      * Available when namespace processing is enabled (the default).

    The `contents' property of a document or element node is a list of
    the nodes within that document or element.  The `name' of an
    element or attribute node is the name of the element/attribute,
    including prefix, if any.

    The `root' of a document is the root element of the document.

    An element's `attributes' is a container indexed by the
    attribute's `name' property.  The `value' of an attribute is the
    normalized, string value of the attribute.

    The `data' of a characters node is XML text.

    *** XML Namespaces

    If an XML document uses XML Namespaces, the following additional
    properties are available on element and attribute nodes.

    `namespace-uri' is the XML Namespace URI string.  `local-name' is
    the local-name portion of the element name (the element name without
    the prefix).  `prefix' is the prefix portion of the element name
    (the element name without the local-name).

    The `attributes' container is also indexed by the
    namespace-uri/local-name pair of each attribute. When accessing
    documents using XML Namespaces, you should only use the
    namespace-uri/local-name indexes for attributes.

    XML Namespace processing is used by default if the document uses
    XML Namespaces.
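
The double indexing of `attributes' described above can be sketched as
a dictionary that stores each attribute under two keys.  The Attribute
and Attributes classes, the add() method, and the underscore spellings
(namespace_uri, local_name) are assumptions for this example; the real
Orchard binding may differ.

```python
# Hypothetical attribute node with the properties listed in the table above.
class Attribute:
    def __init__(self, name, value, namespace_uri=None):
        self.name = name                    # qualified name, prefix included
        self.value = value
        self.namespace_uri = namespace_uri
        prefix, sep, local = name.rpartition(":")
        self.prefix = prefix if sep else None
        self.local_name = local


class Attributes(dict):
    """Mapping reachable by name and by (namespace-uri, local-name)."""

    def add(self, attr):
        self[attr.name] = attr
        self[(attr.namespace_uri, attr.local_name)] = attr


XLINK = "http://www.w3.org/1999/xlink"

attrs = Attributes()
attrs.add(Attribute("xlink:href", "quick.html", XLINK))

by_name = attrs["xlink:href"].value     # depends on the prefix chosen
by_pair = attrs[(XLINK, "href")].value  # independent of the prefix
```

The tuple lookup keeps working if the document binds a different
prefix to the same namespace, which is why it is the safer index for
namespace-aware code.  A side effect of this simple approach is that
len() counts every attribute twice.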