[XML-SIG] Python wrappers for libxml and libxslt

Dave Kuhlman dkuhlman@cutter.rexx.com
Fri, 13 Jul 2001 13:29:15 -0700


I've implemented wrappers for the parser in libxml2 and simple
wrappers for the top level functionality in libxslt.  You can learn
more about libxml and libxslt at:

    http://xmlsoft.org

And you can find my Python wrappers at:

   *** Caution -- This is alpha-ware.  Use at your own risk. ***

    SAX interface:
        http://www.rexx.com/~dkuhlman/libxml_saxlib.html
        http://www.rexx.com/~dkuhlman/libxml_saxlib-1.0a.tar.gz

    DOM interface:
        http://www.rexx.com/~dkuhlman/libxml_domlib.html
        http://www.rexx.com/~dkuhlman/libxml_domlib-1.0a.tar.gz

    XSL-T:
        http://www.rexx.com/~dkuhlman/libxsltmod.html
        http://www.rexx.com/~dkuhlman/libxsltmod-1.0a.tar.gz

Thanks so much to all those whose work made this possible.  Thanks to
the people who did libxml and libxslt (these modules are 99.9%
their work and 0.1% mine).  Thanks for Distutils, which made it so
easy to package these modules.  And, thanks to the core Python team
for a great and extensible language.

My wrappers are at a pretty low level (i.e. close to the libxml C
code).  That made it a bit easier for me.  It may also help with
speed and memory use in some cases.

But it also turns out to be very easy for a Python user of the
wrappers.  With libxml_saxlib, just create an instance of a class that
has methods like startDocument, endDocument, startElement,
endElement, characters, etc, then call parse_file(instance,
fileName) or parse_string(instance, string).  With libxml_domlib,
call parse_file or parse_string to parse the document, then call
getRootElement, getFirstChild, getNextSibling, etc to walk the
tree.  With libxslt, just call a function or two.
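
To make the two usage patterns above concrete, here is a minimal
sketch.  It does not call libxml_saxlib or libxml_domlib themselves.
As an assumption, it uses the standard library's xml.sax and
xml.dom.minidom as stand-ins, because the handler method names match
the ones listed above and the tree walk has the same first-child /
next-sibling shape.  The names EventLogger, walk, and SAMPLE are made
up for illustration, and the exact argument lists in the wrappers may
differ.

```python
import xml.sax
import xml.dom.minidom

# Hypothetical sample document, used only for this illustration.
SAMPLE = "<doc><item id='1'>hello</item><item id='2'>world</item></doc>"


class EventLogger(xml.sax.ContentHandler):
    """SAX-style handler: records each event the parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append(("startDocument",))

    def endDocument(self):
        self.events.append(("endDocument",))

    def startElement(self, name, attrs):
        self.events.append(("startElement", name, dict(attrs)))

    def endElement(self, name):
        self.events.append(("endElement", name))

    def characters(self, content):
        # Text may arrive in several chunks; skip whitespace-only ones.
        if content.strip():
            self.events.append(("characters", content))


def walk(node, depth=0, out=None):
    """DOM-style walk: visit a node, then its children via
    first-child / next-sibling links, collecting (depth, tag) pairs."""
    if out is None:
        out = []
    if node.nodeType == node.ELEMENT_NODE:
        out.append((depth, node.tagName))
    child = node.firstChild            # stand-in for getFirstChild
    while child is not None:
        walk(child, depth + 1, out)
        child = child.nextSibling      # stand-in for getNextSibling
    return out


# SAX pattern: hand the parser an instance of the handler class.
# (Stand-in for the parse_string(instance, string) call described above.)
handler = EventLogger()
xml.sax.parseString(SAMPLE.encode("utf-8"), handler)

# DOM pattern: parse, take the root element, then walk the tree.
root = xml.dom.minidom.parseString(SAMPLE).documentElement  # stand-in for getRootElement
tree = walk(root)
```

After this runs, handler.events holds the SAX events in document
order and tree holds one (depth, tag) pair per element.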

An additional educational part of this work: in providing access
to the DOM tree, I needed to implement several Python extension
datatypes (as part of the Python extension module libxml_domlib). 
I had never done that before, believing that the Python C
structures involved were too difficult for me to deal with.  With
some help, it turned out to be not as difficult as I thought.  Here
are two suggestions if you need to implement a Python extension
type yourself:

  - Start by copying Objects/xxobject.c in the Python source code
    distribution.  The structure and organization in this file will
    put you far ahead of where you would be if you start from
    scratch and it will save many errors, too.

  - Or, use my extension datatype generator.  You can find it at:

        http://www.rexx.com/~dkuhlman/dtGenerator.py

    For restricted purposes, this will save a lot of copy, paste,
    and rename work.

You may be asking, Why did you implement XML capabilities for
Python, when we already have PyXML/4Suite?  PyXML is super.  And
there is no way that these wrappers for libxml/libxslt can be
considered anywhere near as good as PyXML.  (It would be presumptuous
for me to suggest that they are comparable.)  However, let me give a
few reasons for doing and offering this:

  - Because it's there.  libxml2 and libxslt are available. 

  - Because implementing the Python extension modules and extension
    datatypes was good training for me.

  - Because I believe that having a bit more breadth of coverage of
    something as important as XML is good for Python, even if it is
    not used very much.

  - Because it's easy to use.  Using libxsltmod from Python is
    (almost) as easy as one function call.  It won't give enough
    control for some situations.  But where that control is not
    needed, calling from Python is very easy.

  - Because there may be special situations where this
    implementation is useful.  For example, installing it on a new
    machine may be as easy as copying a few shared libraries.  For
    some purposes, that may be a benefit.

  - Because I'm grateful for all that the Python community has
    given me and I'd like to try to give a little back.

If you have suggestions or find problems, please let me know.

  - Dave

-- 
Dave Kuhlman
dkuhlman@rexx.com