ANN: More XML support for Python etc

Dave Kuhlman dkuhlman at rexx.com
Thu Aug 2 12:12:26 EDT 2001


I've implemented wrappers for the parser in libxml2 and simple
wrappers for the top level functionality in libxslt.  You can learn
more about libxml and libxslt at:

    http://xmlsoft.org

And you can find my Python wrappers at:

   *** Caution -- This is alpha-ware.  Use at your own risk. ***

    SAX interface:
        http://www.rexx.com/~dkuhlman/libxml_saxlib.html
        http://www.rexx.com/~dkuhlman/libxml_saxlib-1.0a.tar.gz

    DOM interface:
        http://www.rexx.com/~dkuhlman/libxml_domlib.html
        http://www.rexx.com/~dkuhlman/libxml_domlib-1.0a.tar.gz

    XSL-T:
        http://www.rexx.com/~dkuhlman/libxsltmod.html
        http://www.rexx.com/~dkuhlman/libxsltmod-1.0a.tar.gz

You can also find a link to the above from http://xmlsoft.org. 
Look under "Contributions".

Thanks so much to all those whose work made this possible.  Thanks
to the people who did libxml and libxslt (these modules are 99.9%
their work and 0.1% mine).  Thanks for Distutils, which made it so
easy to package these modules.  And thanks to the core Python team
for a great and extensible language.

My wrappers are at a pretty low level (i.e. close to the libxml C
code).  That made it a bit easier for me.  It might also help with
speed and memory use in some cases.

But, it also turns out to make things easy for a Python user of the
wrappers.  With libxml_saxlib, just create an instance of a class that
has methods like startDocument, endDocument, startElement,
endElement, characters, etc, then call parse_file(instance,
fileName) or parse_string(instance, string).  With libxml_domlib,
call parse_file or parse_string to parse the document, then call
getRootElement, getFirstChild, getNextSibling, etc to walk the
tree.  With libxslt, just call a function or two.
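
Here is a minimal sketch of those two usage patterns.  Note that it
does NOT use libxml_saxlib or libxml_domlib: as a stand-in, it uses
xml.sax and xml.dom.minidom from the Python standard library, which
follow the same shape.  The callback names (startDocument,
startElement, characters, etc.) are the ones listed above;
xml.sax.parseString plays the role of parse_string, and minidom's
documentElement, firstChild and nextSibling play the roles of
getRootElement, getFirstChild and getNextSibling.  The names DOC,
EventLogger, sax_events and child_element_names are made up for this
example, and the exact signatures in the wrappers may differ.

```python
import xml.sax
from xml.dom import minidom

# Sample document, invented for this example.
DOC = "<order><item>tea</item><item>milk</item></order>"


class EventLogger(xml.sax.ContentHandler):
    """SAX style: the parser calls these methods as it reads the input."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append(("startDocument",))

    def endDocument(self):
        self.events.append(("endDocument",))

    def startElement(self, name, attrs):
        self.events.append(("startElement", name))

    def endElement(self, name):
        self.events.append(("endElement", name))

    def characters(self, content):
        # Text may arrive in several chunks; skip whitespace-only ones.
        if content.strip():
            self.events.append(("characters", content))


def sax_events(text):
    """Parse a string and return the callbacks in the order they fired."""
    handler = EventLogger()
    # Stand-in for parse_string(instance, string) in libxml_saxlib.
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.events


def child_element_names(text):
    """DOM style: parse first, then walk the tree via sibling links."""
    doc = minidom.parseString(text)
    root = doc.documentElement        # compare getRootElement
    names = []
    node = root.firstChild            # compare getFirstChild
    while node is not None:
        if node.nodeType == node.ELEMENT_NODE:
            names.append(node.tagName)
        node = node.nextSibling       # compare getNextSibling
    return root.tagName, names


if __name__ == "__main__":
    for event in sax_events(DOC):
        print(event)
    print(child_element_names(DOC))
```

The difference in style is the point: with SAX you hand over an
object and the parser drives it, while with DOM you get a tree back
and drive the traversal yourself.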

This work was also educational for me.  In providing access to the
DOM tree, I needed to implement several Python extension datatypes
(as part of the Python extension module libxml_domlib).
I had never done that before, believing that the Python C
structures involved were too difficult for me to deal with.  With
some help, it turned out to be not as difficult as I thought.  Here
are two suggestions if you need to implement a Python extension
type yourself:

  - Start by copying Objects/xxobject.c in the Python source code
    distribution.  The structure and organization in this file will
    put you far ahead of where you would be if you started from
    scratch, and it will prevent many errors, too.

  - Or, use my extension datatype generator.  You can find it
    at:

        http://www.rexx.com/~dkuhlman/dtGenerator.py

    For restricted purposes, this will save a lot of copy, paste,
    and rename work.

You may be asking, Why did you implement wrappers for XML
capabilities for Python, when we already have PyXML/4Suite?  PyXML
is super.  And there is no way that these wrappers for
libxml/libxslt can be considered anywhere near as mature and rich
as PyXML.  (It's presumptuous for me to suggest that they are
comparable.) However, let me give a couple of reasons for doing and
offering this:

  - Because it's there.  libxml2 and libxslt are available. 

  - Because implementing the Python extension modules and extension
    datatypes was good training for me.

  - Because I believe that having a bit more breadth of coverage of
    something as important as XML is good for Python, even if it is
    not used very much.  XML capabilities available from multiple
    sources give Python more credibility.

  - Because it's easy to use.  Using libxsltmod from Python is
    (almost) as easy as one function call.  It won't give enough
    control for some situations.  But where that control is not
    needed, calling from Python is very easy.

  - Because there may be special situations where this
    implementation would be useful.  For example, installing it on a
    new machine may be as easy as copying a few shared libraries.
    For some purposes, that may be a benefit.

  - Because I'm grateful for all that the Python community has
    given me and I'd like to try to give a little back.

If you have suggestions or find problems, please let me know.

  - Dave

-- 
Dave Kuhlman
dkuhlman at rexx.com


