[mgushee@havenrock.com: Re: [XML-SIG] Re: [Twisted-Python] Can anyone recommend a sensible XML parser for Python?]

Matt Gushee <mgushee@havenrock.com>
Fri, 6 Sep 2002 11:41:33 -0600


Oops, accidentally sent this to Eron instead of the list. Here it is
again.

----- Forwarded message from Matt Gushee <mgushee@havenrock.com> -----

Date: Fri, 6 Sep 2002 11:12:45 -0600
From: Matt Gushee <mgushee@havenrock.com>
To: Eron Lloyd <elloyd@lancaster.lib.pa.us>
Subject: Re: [XML-SIG] Re: [Twisted-Python] Can anyone recommend a sensible XML parser for Python?
Reply-To: Matt Gushee <mgushee@havenrock.com>

Since this thread was apparently imported from another list, I'm missing
some of the context, but here goes ...

On Fri, Sep 06, 2002 at 12:36:12PM -0400, Eron Lloyd wrote:
> Hmm, I know that minidom has had some problems recently, but it has also
> seen some good improvements. It sounds like you need more robust DOM
> support--have you tried 4DOM? It's not as fast,

That's an understatement.

> but it does adhere to
> the spec the best. Maybe (when you have time) if you let us know what
> you expect to accomplish we can help out--the people in XML-SIG are some
> of the sharpest in the community. Perhaps TREX or RELAX-NG would be more
> suitable.

I don't follow that at all. First of all, he says he doesn't want
validation. But even if the greater flexibility of RELAX NG made
validation useful to him, RELAX NG hasn't been implemented in Python. As
for TREX, it has been merged into RELAX NG, so it is de facto, if not
formally, deprecated. So you want him to implement RELAX NG in Python,
*and* rewrite the XHTML schema in RELAX NG? I don't think so.

Unfortunately I don't have much good news to contribute. 4DOM might work
better than minidom, but you should be aware that it is essentially a
legacy product.
Fourthought, Inc., which created it, is no longer developing it, because
its performance was horrible and there was simply not a huge demand for
a full DOM implementation. In fact, I worked for Fourthought for a year
and never once touched 4DOM. cDomlette, also from Fourthought, is the
fastest Python DOM parser (because it's a C extension), provides the
most commonly needed features, and will continue to be maintained for
the foreseeable future. It's not quite ready for production use yet,
but depending on your timeline you might want to give it a try (it's
part of 4Suite, available at http://4Suite.org).

-- 
Matt Gushee
Englewood, Colorado, USA
mgushee@havenrock.com
http://www.havenrock.com/

----- End forwarded message -----
