[XML-SIG] The fastest XML parser around

Daniel Veillard veillard@redhat.com
Mon, 1 Apr 2002 03:34:12 -0500


On Mon, Apr 01, 2002 at 04:28:38AM +0100, Andy Robinson wrote:
> ReportLab (www.reportlab.com) are proud to announce the release of pyRXP,
> the fastest XML parsing toolkit for python, and possibly for any other
> language anywhere:
> 
>                      http://www.reportlab.com/xml/pyrxp.html
> 
> pyRXP is a wrapper around the excellent RXP parser developed by Richard
> Tobin at the University of Edinburgh.  Our goal is very simple:  get an
> entire XML document into memory, and validated, as quickly and efficiently
> as possible.   You can parse and validate Hamlet in a tenth of a second on a
> standard PC.

  Last time I checked (i.e. when I was still a W3C staff member and we asked
Henry Thompson and Richard Tobin to make RXP open source), the licence for
RXP was GPL.
  As far as I can tell it still is:
     http://www.cogsci.ed.ac.uk/~richard/rxp.html

  So it is hardly usable in any commercial or otherwise non-GPL framework.

> (non-validating) minidom parser in the standard Python distribution.  It
> also comfortably beats the Microsoft and Java (Xerces) parsers in our tests.

  Those are not the only parsers out there. Libxml2 also ships with a Python
interface as part of its recent releases, and apparently it also beats both
the Microsoft and Xerces (Java and C) parsers for raw parsing speed. So your
claim is a bit "light". Libxml2 is also released under the MIT Licence,
making it suitable for any use.

  For the record, libxml2 figures on a 1.2GHz Duron were:
     - 16 MBytes/s when generating empty SAX callbacks.
     - 6 MBytes/s when building a full DOM tree.
     - the conformance run against the W3C/NIST regression test suite of
       1800+ documents, done as a Python script calling the libxml2
       bindings, takes approximately 3.5 seconds.
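
  The two throughput figures above separate callback-only parsing from
tree building. Here is a minimal sketch of how such numbers could be
measured. It assumes the parsers from the Python standard library (xml.sax
and xml.dom.minidom) as stand-ins, not the libxml2 bindings, and a synthetic
document. The helper names make_sample, sax_throughput and dom_throughput
are hypothetical, and the results will not match the libxml2 figures.

```python
import time
import xml.sax
from xml.dom import minidom


def make_sample(n_items=20000):
    """Build a synthetic XML document as bytes (made-up test data)."""
    rows = "".join(
        '<item id="%d">value %d</item>' % (i, i) for i in range(n_items)
    )
    return ("<?xml version='1.0'?><root>" + rows + "</root>").encode("utf-8")


def sax_throughput(data):
    """Parse with callbacks that do nothing; return throughput in MB/s."""
    # The base ContentHandler implements every callback as a no-op.
    handler = xml.sax.ContentHandler()
    start = time.perf_counter()
    xml.sax.parseString(data, handler)
    elapsed = time.perf_counter() - start
    return len(data) / elapsed / 1e6


def dom_throughput(data):
    """Parse into a full DOM tree; return throughput in MB/s."""
    start = time.perf_counter()
    doc = minidom.parseString(data)
    elapsed = time.perf_counter() - start
    doc.unlink()  # release the tree once it has been timed
    return len(data) / elapsed / 1e6


if __name__ == "__main__":
    sample = make_sample()
    print("SAX, empty callbacks: %.1f MB/s" % sax_throughput(sample))
    print("DOM, full tree:       %.1f MB/s" % dom_throughput(sample))
```

  A single run on a small document is noisy, so a real comparison should
repeat the measurement on representative files and report the best or the
median time.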

  As for validating a stream, RXP may have an edge at the moment,
because libxml validation currently requires building the DOM tree first.

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/