[XML-SIG] Python wrappers for libxml and libxslt
Dave Kuhlman
dkuhlman@cutter.rexx.com
Fri, 13 Jul 2001 13:29:15 -0700
I've implemented wrappers for the parser in libxml2 and simple
wrappers for the top-level functionality in libxslt. You can learn
more about libxml and libxslt at:
http://xmlsoft.org
And you can find my Python wrappers at:
*** Caution -- This is alpha-ware. Use at your own risk. ***
SAX interface:
http://www.rexx.com/~dkuhlman/libxml_saxlib.html
http://www.rexx.com/~dkuhlman/libxml_saxlib-1.0a.tar.gz
DOM interface:
http://www.rexx.com/~dkuhlman/libxml_domlib.html
http://www.rexx.com/~dkuhlman/libxml_domlib-1.0a.tar.gz
XSL-T:
http://www.rexx.com/~dkuhlman/libxsltmod.html
http://www.rexx.com/~dkuhlman/libxsltmod-1.0a.tar.gz
Thanks so much to all those whose work made this possible. Thanks to
the people who did libxml and libxslt (these modules are 99.9%
their work and 0.1% mine). Thanks for Distutils, which made it so
easy to package these modules. And, thanks to the core Python team
for a great and extensible language.
My wrappers are at a pretty low level (i.e., close to the libxml C
code). That made them easier for me to write, and it may also help
with speed and memory use in some applications.
It also turns out that the wrappers are very easy to use from
Python. With libxml_saxlib, just create an instance of a class that
has methods like startDocument, endDocument, startElement,
endElement, characters, etc., then call parse_file(instance,
fileName) or parse_string(instance, string). With libxml_domlib,
call parse_file or parse_string to parse the document, then call
getRootElement, getFirstChild, getNextSibling, etc. to walk the
tree. With libxsltmod, just call a function or two.
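The two usage patterns described above (a handler object with SAX-style callbacks, and a first-child/next-sibling walk over a DOM tree) can be sketched roughly as follows. This is a hedged illustration, not the wrappers' actual API: the exact callback argument lists of libxml_saxlib and the return values of libxml_domlib's accessors are assumptions. So the sketch runs against Python's standard xml.sax and xml.dom.minidom modules, which use the same callback names and the same walking style. The class ElementCounter and the function walk are hypothetical names chosen for this example. With the wrappers themselves, you would instead pass the handler instance to parse_file(instance, fileName) or parse_string(instance, string), and use getRootElement, getFirstChild and getNextSibling in place of the minidom attributes.

```python
import xml.sax
from xml.dom import minidom


class ElementCounter(xml.sax.ContentHandler):
    """Hypothetical SAX-style handler: counts elements and collects text.

    Argument lists follow xml.sax.ContentHandler; libxml_saxlib's
    callback signatures are assumed, not confirmed.
    """

    def __init__(self):
        super().__init__()
        self.counts = {}
        self.text = []

    def startDocument(self):
        # Reset state at the start of each parse.
        self.counts = {}
        self.text = []

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        pass  # nothing to do on close tags in this example

    def endDocument(self):
        # Join the collected character data into one string.
        self.text = "".join(self.text)


def walk(node, depth=0, out=None):
    """Depth-first walk using first-child / next-sibling links.

    minidom exposes these links as attributes (firstChild, nextSibling);
    with libxml_domlib the post describes getFirstChild and
    getNextSibling calls instead (treated here as an assumption).
    Returns one indented line per element.
    """
    if out is None:
        out = []
    if node.nodeType == node.ELEMENT_NODE:
        out.append("  " * depth + node.tagName)
    child = node.firstChild
    while child is not None:
        walk(child, depth + 1, out)
        child = child.nextSibling
    return out


doc_text = "<doc><item>a</item><item>b</item><note/></doc>"

handler = ElementCounter()
xml.sax.parseString(doc_text.encode("utf-8"), handler)
print(handler.counts)  # {'doc': 1, 'item': 2, 'note': 1}

root = minidom.parseString(doc_text).documentElement
print("\n".join(walk(root)))
```

The handler carries all of its state on the instance. This is what makes the "create an instance, then call parse" style convenient: after parsing, the results are simply attributes of the object you passed in.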
This work was also educational for me. To provide access to the
DOM tree, I needed to implement several Python extension datatypes
(as part of the Python extension module libxml_domlib). I had
never done that before, believing that the Python C structures
involved were too difficult for me to deal with. With some help,
it turned out to be less difficult than I thought. Here are two
suggestions if you need to implement a Python extension type
yourself:
- Start by copying Objects/xxobject.c in the Python source code
distribution. The structure and organization in this file will
put you far ahead of where you would be if you started from
scratch, and it will save you many errors, too.
- Or, use my extension datatype generator. You can find it at:
http://www.rexx.com/~dkuhlman/dtGenerator.py
For restricted purposes, this will save a lot of copy, paste,
and rename work.
You may be asking, "Why did you implement XML capabilities for
Python, when we already have PyXML/4Suite?" PyXML is super. And
there is no way that these wrappers for libxml/libxslt can be
considered anywhere near as good as PyXML. (It's presumptuous for
me to suggest that they are comparable.) However, let me give
several reasons for doing and offering this:
- Because it's there. libxml2 and libxslt are available.
- Because implementing the Python extension modules and extension
datatypes was good training for me.
- Because I believe that having a bit more breadth of coverage of
something as important as XML is good for Python, even if it is
not used very much.
- Because it's easy to use. Using libxsltmod from Python is
(almost) as easy as one function call. It won't give enough
control for some situations. But where that control is not
needed, calling from Python is very easy.
- Because there may be special situations where this
implementation is useful. For example, installing it on a new
machine may be as easy as copying a few shared libraries. For
some purposes, that may be a benefit.
- Because I'm grateful for all that the Python community has
given me and I'd like to try to give a little back.
If you have suggestions or find problems, please let me know.
- Dave
--
Dave Kuhlman
dkuhlman@rexx.com