Desc of packages for XML processing

ankit ankitrastogi82 at gmail.com
Fri Dec 23 06:11:01 EST 2005


There are various packages available for XML processing in Python,
so which one should you choose, and when? I have summarized the features,
advantages and disadvantages of some of these packages in the following text.
Have a look at it; I hope it helps you out of the dilemma of choice.

Here we go:

OPTIONS
=========
- libxml2
- lxml
- PyXML
- 4Suite



DESCRIPTION
=============


-------
libxml2
-------
A quote by Mark Pilgrim: "Programming with libxml2 is like the
thrilling embrace of an exotic stranger. It seems to have the potential
to fulfill your wildest dreams, but there's a nagging voice somewhere
in your head warning you that you're about to get screwed in the worst
way."

	Features:
	=========
	  - Namespaces in XML
	  - XPath, XPointer, XInclude, XML Base
	  - XML Schemas Part 2: Datatypes
	  - Relax NG
	  - SAX: a SAX2-like interface and a minimal SAX1 implementation
	    compatible with early expat versions
	  - NO DOM: it provides some support for DOM, BUT it does not
	    implement the API itself; gdome2 does that on top of libxml2.
	  - It is written in plain C, making as few assumptions as possible
	    and sticking closely to ANSI C/POSIX for easy embedding.
	  - Platform: Linux/Unix/Windows


	Advantages
	==========
	  - Standards-compliant XML support.
	  - Full-featured.
	  - Actively maintained by XML experts.
	  - fast. fast! FAST!
	  - Stable.

	Disadvantages
	=============
	This library already ships with Python bindings, but these Python
	bindings have some problems:
	  - Very low level and C-ish (not Pythonic).
	  - Underdocumented and huge; you get lost in them.
	  - UTF-8 in the API, instead of Python unicode strings.
	  - Can cause segfaults from Python.
	  - Manual memory management: the library calls are more or less an
	    exact mapping of the C API, so you have to think about memory
	    management yourself.
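
To make the memory-management point concrete, here is a minimal sketch of an XPath query through the libxml2 Python bindings. It assumes the bindings are installed (the import is guarded, so the function simply returns None when they are not); the sample document and the name item_texts are made up for illustration.

```python
try:
    import libxml2  # Python bindings shipped with libxml2 (assumed installed)
except ImportError:
    libxml2 = None

SAMPLE = "<root><item>a</item><item>b</item></root>"  # illustrative data


def item_texts(xml_text):
    """Return the text of every <item>, or None if libxml2 is unavailable."""
    if libxml2 is None:
        return None
    doc = libxml2.parseDoc(xml_text)   # wraps a C-level document
    ctxt = doc.xpathNewContext()       # wraps a C-level XPath context
    try:
        return [node.getContent() for node in ctxt.xpathEval("//item")]
    finally:
        # Neither object is released automatically; both are freed by hand.
        ctxt.xpathFreeContext()
        doc.freeDoc()


print(item_texts(SAMPLE))
```

Forgetting either of the two free calls leaks memory, and touching a node after freeDoc() is the kind of mistake that can crash the interpreter.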

	For those who want to go for the DOM API:

	Packages for DOM
	================
	  - gdome2: C-based; provides DOM support on top of libxml2.
	    (http://gdome2.cs.unibo.it/)
	  - libxml2dom: another available option.
	    (http://cheeseshop.python.org/pypi/libxml2dom/0.3.3)
	  - libxml_domlib: a Python extension module that enables you to use
	    the DOM interface to libxml2.
	    (http://www.rexx.com/~dkuhlman/libxml_domlib.html)
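
For readers unfamiliar with what "the DOM API" looks like from Python, here is a short sketch using xml.dom.minidom from the standard library. It is only a stand-in: the libxml2-based packages listed above aim to offer DOM-style calls on top of libxml2, but how closely each one matches is not verified here. The sample data and function name are illustrative.

```python
from xml.dom import minidom

SAMPLE = (
    "<library>"
    "<book lang='en'>Dive Into Python</book>"
    "<book lang='fr'>Apprendre</book>"
    "</library>"
)


def books_by_language(xml_text):
    """Map each book's lang attribute to its title using DOM calls."""
    doc = minidom.parseString(xml_text)
    result = {}
    for book in doc.getElementsByTagName("book"):
        lang = book.getAttribute("lang")   # attribute access, DOM style
        title = book.firstChild.data       # the text node under <book>
        result[lang] = title
    doc.unlink()                           # break reference cycles in the tree
    return result


print(books_by_language(SAMPLE))
```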


	Resources
	==========
	  - http://xmlsoft.org/index.html
 	  - http://codespeak.net/lxml/intro.html


----
lxml
----
	lxml follows the ElementTree API as much as possible, building it on
	top of the native libxml2 tree.

	Features
	========
	 - lxml provides all of the libxml2 features listed above, but through
	   the ElementTree API.

	Advantages
	==========
	 - Pythonic API.
	 - Documented.
	 - Uses Python unicode strings in the API.
	 - Safe (no segfaults).
	 - No manual memory management.
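
As a rough illustration of the ElementTree style of API, here is a small sketch. It tries lxml first and otherwise falls back to xml.etree.ElementTree, the standard-library implementation of the same interface in current Python versions; only calls common to both are used. The sample data and the function name summarize are assumptions for the example.

```python
try:
    from lxml import etree                  # used when lxml is installed
except ImportError:
    import xml.etree.ElementTree as etree   # stdlib ElementTree as a fallback

SAMPLE = "<inventory><part id='p1'>bolt</part><part id='p2'>nut</part></inventory>"


def summarize(xml_text):
    """Read parts into a dict, then append one more part to the tree."""
    root = etree.fromstring(xml_text)
    # Elements behave like lists of children with dict-like attributes.
    parts = {part.get("id"): part.text for part in root.findall("part")}
    extra = etree.SubElement(root, "part", id="p3")
    extra.text = "washer"
    return parts, len(root)   # len(root) counts the direct children


print(summarize(SAMPLE))
```

Nothing is freed by hand here; the tree is reclaimed like any other Python object.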


	Disadvantages
	==============
	- No DOM support, just as in libxml2.
	- It is still in its early releases (the latest is lxml 0.7).


	Resources
	=========
	- http://codespeak.net/lxml/


------
PyXML
------
	Features
	=========
	 - xmlproc: a validating XML parser.
	 - Expat: a fast non-validating parser.
	 - sgmlop: a C helper module that can speed up xmllib.py and
	   sgmllib.py by a factor of 5.
	 - PySAX: SAX 1 and SAX 2 libraries with drivers for most of the
	   parsers.
	 - 4DOM: a fully compliant DOM Level 2 implementation.
	 - pulldom: a DOM implementation that supports lazy instantiation of
	   nodes.
	 - marshal: a module with several options for serializing Python
	   objects to XML.
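
To show the difference between SAX and pulldom from the list above, here is a brief sketch. It uses the xml.sax and xml.dom.pulldom modules as found in the standard library; whether a given PyXML release behaves identically is an assumption. The sample data and helper names are illustrative.

```python
import xml.sax
from xml.dom import pulldom

SAMPLE = "<catalog><book id='1'>Dive</book><book id='2'>Cook</book></catalog>"


class BookCounter(xml.sax.ContentHandler):
    """SAX: the parser pushes events into handler callbacks."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "book":
            self.count += 1


def count_books_sax(text):
    handler = BookCounter()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.count


def book_titles_pulldom(text):
    """pulldom: pull events, and build DOM nodes only where needed."""
    titles = []
    events = pulldom.parseString(text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "book":
            events.expandNode(node)   # instantiate just this subtree
            titles.append("".join(
                child.data for child in node.childNodes
                if child.nodeType == child.TEXT_NODE
            ))
    return titles


print(count_books_sax(SAMPLE), book_titles_pulldom(SAMPLE))
```

SAX never builds a tree at all, while pulldom lets you decide per element whether a DOM subtree is worth creating.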


	Advantages
	==========
	 - A lot of documentation is available, and almost all resources and
	   examples are based on it.

	Disadvantages
	=============
	- No Schema support

	Packages for Schema (for those who want schema support too)
	===========================================================
	XSV: currently in progress; provides XML Schema Part 1: Structures.
	It depends on another package, PyLTXML.
	(http://www.ltg.ed.ac.uk/~ht/xsv-status.html)




-------
4Suite
-------
	  Features:
	  =========
	  - XML, XSLT, XPath, DOM, XInclude, XPointer, XLink, XUpdate, RELAX NG,
	    XML Catalogs
	  - Platform: Posix, Windows

	  Advantages
	  ==========
	  - It provides RELAX NG, a simple schema language for XML based on
	    [RELAX] and [TREX]. A RELAX NG schema specifies a pattern for the
	    structure and content of an XML document.
	    [1] http://www.oasis-open.org/committees/relax-ng/spec-20011203.html#IDAGDYR
	    [2] http://xmlbuddy.com/2.0/features.html
	    [3] http://www.xml.com/pub/a/2001/12/12/schemacompare.html?page=2

	  * But RELAX NG is not a W3C standard. It is provided by OASIS.
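
To illustrate what "a pattern for the structure and content of an XML document" means, here is a small RELAX NG schema and a validation sketch. 4Suite's own validation calls are not shown in this post, so the sketch swaps in lxml's RelaxNG class instead and assumes lxml is installed (the function returns None otherwise). The address-book schema and the sample documents are made up for the example.

```python
try:
    from lxml import etree   # assumption: lxml is installed; not 4Suite's API
except ImportError:
    etree = None

# Pattern: an <addressBook> holding zero or more <card> elements,
# each containing a <name> followed by an <email>.
SCHEMA = """
<element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="card">
      <element name="name"><text/></element>
      <element name="email"><text/></element>
    </element>
  </zeroOrMore>
</element>
""".strip()

GOOD = "<addressBook><card><name>A</name><email>a@example.org</email></card></addressBook>"
BAD = "<addressBook><card><email>a@example.org</email></card></addressBook>"


def check(xml_text):
    """Return True/False for valid/invalid, or None if lxml is unavailable."""
    if etree is None:
        return None
    relaxng = etree.RelaxNG(etree.fromstring(SCHEMA))
    return relaxng.validate(etree.fromstring(xml_text))


print(check(GOOD), check(BAD))
```

The second document is rejected because its card lacks the name element that the pattern requires.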


	  Site:
	  =====
	  [4] http://cheeseshop.python.org/pypi/4Suite-XML/1.0b3



