[XML-SIG] Python Wrapper for Xerces/Xalan

W. Eliot Kimber eliot@isogen.com
Thu, 12 Apr 2001 21:02:12 -0500


Nicolas Chauvat wrote:
> 
> > Also, while I'm thinking about it, we have modified the pyXML and the
> > 4Suite XSLT and XPath stuff to allow us to process arbitrary groves and
> > arbitrary Python objects with XSL. We would like to contribute our
> > changes back but haven't had the bandwidth yet to properly package
> > things up. But we will as soon as we can.
> 
> Are you talking about the same groves that jade, SGML and DSSSL use?

The very same. We are building a generic grove-based hyperdocument
management system. Given a grove implementation, we can hyperlink
to it using a generic hyperdocument API (one that looks a lot like HyTime
but does not depend on HyTime syntax; it can be bound to any
reasonable way of representing hyperdocuments, from HTML to Microsoft
Project).
 
> What do you mean by "XSL and XPath to process arbitrary python objects"?

We discovered that, with just a couple of lines of code, we could
get the 4Suite XSLT processor to happily apply XSLT templates and XPath
expressions to arbitrary Python objects if we treat the object class as
the "tag name" and all member variables and parameter-less methods as
attributes. For example, our Hyperdocument object class has a
"getBosMembers()" method that returns the list of groves from which the
hyperdocument was constructed (the "bounded object set" of input
documents, converted to groves). 
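Here is a minimal Python sketch of what that mapping amounts to. It is not the actual 4Suite modification. The `ObjectNode` adapter, `takes_no_args`, and the `Hyperdocument` stand-in are hypothetical names, and the sketch assumes that "parameter-less" means a method whose parameters (other than `self`) all have defaults.

```python
import inspect


def takes_no_args(func):
    """True if func can be called with no arguments (our assumed meaning
    of 'parameter-less')."""
    try:
        sig = inspect.signature(func)
    except (TypeError, ValueError):
        return False  # some builtins have no introspectable signature
    return all(
        p.default is not inspect.Parameter.empty
        or p.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
        for p in sig.parameters.values()
    )


class ObjectNode:
    """Hypothetical adapter presenting any Python object as an element-like
    node: the class name plays the tag name, and public data members and
    parameter-less methods play attributes."""

    def __init__(self, obj):
        self.obj = obj

    @property
    def node_name(self):
        return type(self.obj).__name__

    def attribute_names(self):
        names = []
        for name in dir(self.obj):  # dir() returns names already sorted
            if name.startswith("_"):
                continue  # skip private and special members
            value = getattr(self.obj, name)
            if not callable(value) or takes_no_args(value):
                names.append(name)
        return names

    def get_attribute(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        value = getattr(self.obj, name)
        if not callable(value):
            return value  # plain data member
        if takes_no_args(value):
            return value()  # parameter-less method: call it
        raise AttributeError(f"{name} requires arguments")


class Hyperdocument:
    """Hypothetical stand-in, not the real Hyperdocument class."""

    def __init__(self, bos):
        self.title = "Example"
        self._bos = list(bos)

    def getBosMembers(self):
        return self._bos

    def findAnchors(self, node):  # needs an argument, so not exposed
        return []


node = ObjectNode(Hyperdocument(["doc.xml", "memo.doc"]))
print(node.node_name)                       # Hyperdocument
print(node.attribute_names())               # ['getBosMembers', 'title']
print(node.get_attribute("getBosMembers"))  # ['doc.xml', 'memo.doc']
```

Methods that need arguments (like `findAnchors` here) are simply not visible as attributes, which keeps every attribute lookup a side-effect-free "get" from the stylesheet's point of view.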

Given our hack, I can do this in an XSLT style sheet (note: I am not by
any stretch an XSL expert, so please excuse any XSL errors or stupidity
in the following example):

<!-- The "ext" namespace is our extension package for interrogating the
     hyperdocument, which is passed in to the XSLT processor as part of
     its startup invocation. -->
<xsl:template match="*">
  <xsl:if test="ext:isAnchorMember()">
    <!-- Given that the current node is a member of one or more hyperlink
         anchors, get the Hyperdocument object from it and do something
         with it. -->
    <xsl:variable name="hydoc" select="ext:getHyDoc()"/>
    <xsl:for-each select="$hydoc"><!-- Probably a better way to do this -->
      <p>Documents in this hyperdocument:</p>
      <ol>
      <xsl:for-each select="@getBosMembers">
        <li>Document of type: <xsl:value-of select="@ClassName"/></li>
      </xsl:for-each>
      </ol>
    </xsl:for-each>
  </xsl:if>
</xsl:template>

Where the "ext:getHyDoc" extension function returns one of our
Hyperdocument objects. The expression "@getBosMembers" is internally
translated to a call to the getBosMembers() method of the Python object,
which returns a node list of grove nodes. As defined in the grove
standard, every grove node has a "ClassName" property that is the string
value of the class name, as defined in the grove's property set.

So, for example, if my hyperdoc consisted of one XML doc, one Word doc,
and one Excel doc, my output would look like this:

<p>Documents in this hyperdocument:</p>
<ol>
<li>Document of type: SgmlDocument</li>
<li>Document of type: WordDocument</li>
<li>Document of type: ExcelDocument</li>
</ol>
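To make the `@getBosMembers` / `@ClassName` walk concrete, here is a small Python sketch that computes what the two nested for-each steps compute and reproduces the output above. It bypasses XSLT entirely. `GroveNode`, `select_attribute`, and `render` are hypothetical names, and the grove nodes are bare stand-ins that carry only a `ClassName` property.

```python
from html import escape


class GroveNode:
    """Hypothetical grove node; per the grove standard every node has a
    ClassName property naming its class in the grove's property set."""

    def __init__(self, class_name):
        self.ClassName = class_name


class Hyperdocument:
    """Hypothetical stand-in for the hyperdocument object."""

    def __init__(self, groves):
        self._groves = list(groves)

    def getBosMembers(self):
        # The bounded object set: one document grove per input document.
        return self._groves


def select_attribute(obj, name):
    """Evaluate an '@name' step against a Python object: call the member if
    it is a method (assumed parameter-less), otherwise read its value.
    The result is normalized to a list so it can drive a for-each."""
    value = getattr(obj, name)
    if callable(value):
        value = value()
    return value if isinstance(value, list) else [value]


def render(hydoc):
    lines = ["<p>Documents in this hyperdocument:</p>", "<ol>"]
    for member in select_attribute(hydoc, "getBosMembers"):
        class_name = select_attribute(member, "ClassName")[0]
        lines.append(f"<li>Document of type: {escape(class_name)}</li>")
    lines.append("</ol>")
    return "\n".join(lines)


hydoc = Hyperdocument(
    GroveNode(name) for name in ("SgmlDocument", "WordDocument", "ExcelDocument")
)
print(render(hydoc))
```

The only grove-specific knowledge the loop needs is the `ClassName` property, which is why the same template works unchanged whether a member grove came from SGML/XML, Word, or Excel.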

Where "WordDocument" and "ExcelDocument" are names defined in the Word and
Excel property sets that we have privately defined--I don't really
expect Microsoft to publish formal grove property sets for their
proprietary formats any time soon.

This hack required relatively little modification to the 4Suite code
base, although we haven't fully filled out the XPath expressions for
operating on groves.

I have submitted a paper on this work for the Extreme Markup conference
in August.

Cheers,

Eliot

-- 
. . . . . . . . . . . . . . . . . . . . . . . .

W. Eliot Kimber | Lead Brain

1016 La Posada Dr. | Suite 240 | Austin TX  78752
    T 512.656.4139 |  F 512.419.1860 | eliot@isogen.com

w w w . d a t a c h a n n e l . c o m