[XML-SIG] Re: [reportlab-users] The fastest XML parser around

Kevin Jacobs jacobs@darwin.epbi.cwru.edu
Tue, 2 Apr 2002 07:35:15 -0500 (EST)


On Tue, 2 Apr 2002, Alexandre wrote:
> On Tue, Apr 02, 2002 at 06:51:48AM -0500, Kevin Jacobs wrote:
> > Congrats on the new XML parser!
> >
> > On Mon, 1 Apr 2002, Andy Robinson wrote:
> > > pyRXP constructs a tree of tuples in memory with a single API call; the tree
> > > is easy to navigate in standard Python code and can be wrapped up with
> > > DOM-like 'lazy accessor' nodules.
> >
> > Why bother with lazy accessors?  If you are willing to consider using some
> > of the new Python 2.2 features, you can get all the speed and efficiency of
> > tuples with a true DOM interface.
>
> I doubt you'll get as low a memory footprint.

The overhead is typically per-class, not per-instance (per-node, in this
case), so I don't see why not.  The major gain comes from not allocating a
dictionary for every instance, which can be done quite handily with the
__slots__ declaration on Python 2.2's new-style classes.
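
To illustrate, here is a minimal sketch of what I have in mind.  The Node
class and its attribute names are hypothetical (they are not part of pyRXP
or of any existing DOM package), and the syntax is present-day Python; under
2.2 the class would have to inherit from object explicitly to be new-style.

```python
class Node:
    """Hypothetical DOM-like element node; names are illustrative only."""

    # Declaring __slots__ suppresses the per-instance __dict__, so each
    # node stores just these three references.
    __slots__ = ("tagName", "attributes", "childNodes")

    def __init__(self, tagName, attributes=None, childNodes=None):
        self.tagName = tagName
        self.attributes = attributes
        self.childNodes = childNodes if childNodes is not None else []

    def getElementsByTagName(self, name):
        """Walk the subtree on demand; no precomputed tag index is kept."""
        found = []
        for child in self.childNodes:
            if isinstance(child, Node):
                if child.tagName == name:
                    found.append(child)
                found.extend(child.getElementsByTagName(name))
        return found


# Small tree: <doc><p id="a">hello</p><p>world</p></doc>
doc = Node("doc", None, [Node("p", {"id": "a"}, ["hello"]),
                         Node("p", None, ["world"])])

has_dict = hasattr(doc, "__dict__")          # slots-only instances lack one
p_count = len(doc.getElementsByTagName("p"))
```

The slot descriptors live on the class, so that bookkeeping is paid once per
class rather than once per node, while callers still see ordinary attribute
access and methods.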

> > Also, is it fair to make comparisons with
> > other parsers, since it doesn't look like pyRXP computes tag sets to answer
> > queries like getElementsByTagName or getElementById efficiently?
>
> It's probably not fair, but then, so what? If pyRXP doesn't do what you
> need, just forget it, and use whatever parser you feel like using. I
> personally have never used getElementsByXXX, so this is not an
> important feature for me. OTOH, being able to load and manipulate
> a 10MB XML document in memory is something I'd like to do, and it is
> awkward with the current DOM implementations I'm aware of.

I believe it is important to be clear about limitations and caveats when
making comparisons, especially when comparing apples and oranges.  Of course
the applicability of pyRXP is undiminished by such qualifications, and if it
gets the job done for you, all the better.  Personally, I'm not into
advocacy and will be looking into using pyRXP for things that it does well.

-Kevin

-- 
----------->  Kevin Jacobs  <-----------|------->  (216) 986-0710  <--------
Informatics Consultant                  | Department of Epidemiology
Primary mail:   jacobs@theopalgroup.com |   & Biostatistics
Alternate mail: jacobs@darwin.cwru.edu  | Case Western Reserve University
----------------------------------------------------------------------------