[XML-SIG] Re: Re: cElementTree 0.8 (january 11, 2005)

Fredrik Lundh fredrik at pythonware.com
Thu Jan 13 21:35:18 CET 2005


Paul Boddie wrote:

> >     libxml2                     16000k  0.098s
> >     cElementTree 0.8             5700k  0.058s
>
> cElementTree looks really impressive, but having run various tests comparing
> libxml2 and cElementTree with some of the larger test documents in the
> libxml2 distribution, libxml2 still seems faster.

After chatting a little with people who've benchmarked cElementTree and
other toolkits on a variety of platforms, I think the general conclusion is
that both libraries can parse most stuff in about the same number of
milliseconds.  The main differences seem to come from 1) compilers, and
2) what Python version you're using (2.4 can be a lot faster).

My benchmarks all use "official" binary distributions, and I have no idea what
compilers the other developers have used.  Nor has an ordinary user, of course.
If people want better results for their favourite toolkit, they should release better
binaries ;-)

> I've used GNU time to report things like the elapsed, system and user times
> as well as measuring the elapsed time in Python, but I couldn't get the memory
> usage.

My test harness is basically:

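    import sys, time
    from cElementTree import parse  # or the parser of the toolkit under test

    filename = sys.argv[1]
    raw_input("check process size")  # note the size before parsing
    t0 = time.clock() # use time.time() on unix
    tree = parse(filename)
    t1 = time.clock() # see above
    print t1 - t0
    raw_input("check process size")  # note the size after parsing
    del tree  # clean up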

where the process size is checked in the usual way, and the "memory used by
the dom" is the difference between the two values.
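
For readers on a current Python 3, the same idea can be sketched roughly as
follows.  This is only an approximation under a few assumptions:
xml.etree.ElementTree stands in for the toolkit under test, tracemalloc
stands in for checking the process size by hand (it only sees memory
allocated through Python's allocators, so the figure is not the same as the
process size), and the sample document and the measure_parse name are made
up for the example.

```python
import io
import time
import tracemalloc
import xml.etree.ElementTree as ET  # stand-in for the toolkit under test

# Hypothetical sample document, built in memory so the sketch is self-contained.
DATA = b"<root>" + b"<item id='1'>text</item>" * 10000 + b"</root>"


def measure_parse(data):
    """Return (elapsed seconds, bytes retained by the parsed tree)."""
    # Time one parse without tracing, since tracing slows things down.
    t0 = time.perf_counter()
    tree = ET.parse(io.BytesIO(data))
    elapsed = time.perf_counter() - t0
    del tree  # clean up before the memory run

    # Parse again with tracing on and compare allocations before and after.
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    tree = ET.parse(io.BytesIO(data))
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, after - before


elapsed, retained = measure_parse(DATA)
print("parse time: %.3fs, memory retained: %dk" % (elapsed, retained // 1024))
```

Timing and memory are measured in separate parses here because tracing
allocations adds overhead that would otherwise end up in the timing.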

To check for anomalies, I also run the above in a loop (minus the raw_input
calls), and watch how performance and memory use vary over time.  Some
toolkits are extremely unstable, timewise (GC issues?).  And I run the tests
several times over a day, to make sure the system load doesn't impact too
much.
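
A minimal sketch of the loop variant, again for a current Python 3 and with
xml.etree.ElementTree as a stand-in for the toolkit under test.  The
time_parses helper, the run count and the sample document are assumptions
made for the example; a large spread between the minimum and maximum is the
kind of instability mentioned above.

```python
import io
import statistics
import time
import xml.etree.ElementTree as ET  # stand-in for the toolkit under test

# Hypothetical in-memory sample document.
DATA = b"<root>" + b"<item id='1'>text</item>" * 10000 + b"</root>"


def time_parses(data, runs=20):
    """Parse `data` `runs` times and return the list of elapsed times."""
    timings = []
    for _ in range(runs):
        t0 = time.perf_counter()
        tree = ET.parse(io.BytesIO(data))
        timings.append(time.perf_counter() - t0)
        del tree  # drop the tree before the next run
    return timings


timings = time_parses(DATA)
print("min %.4fs  median %.4fs  max %.4fs" % (
    min(timings), statistics.median(timings), max(timings)))
```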

Benchmarking stuff is always hard, and when you're dealing with things that
take 0.0-0.2 seconds *and* consume lots of memory, it's even harder.  When
comparing such benchmarks from different machines, you'd better use a rather
large fudge factor...

> Still, cElementTree looks like a very promising addition to the range of
> Python XML tools, especially given the uncomplicated installation process
> (compared to some of the other top performers, notably libxml2 and
> cDomlette).

As someone just pointed out in private mail, libxml2 may be on par on the parsing
side, but since cElementTree creates *Python* objects, it has a major advantage
over libxml2 once you start digging into the tree from Python.  cElementTree
doesn't have to create any proxy objects; everything you can reach is already
a Python object.
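
To illustrate the point about Python objects, here is a small sketch using
xml.etree.ElementTree, the standard-library descendant of ElementTree, on a
current Python 3.  The sample document is made up for the example.  Tags and
text come back as ordinary strings and attributes as an ordinary dict, so
walking the tree is plain Python iteration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
root = ET.fromstring(
    "<doc><item id='a'>first</item><item id='b'>second</item></doc>")

for item in root.iter("item"):
    # tag and text are ordinary strings, attrib is an ordinary dict
    print(item.tag, item.attrib["id"], item.text)

# Elements behave like sequences of their children.
ids = [item.get("id") for item in root]
texts = [item.text for item in root.findall("item")]
```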

But sure, it's hard to beat libxml2 if you want both speed *and* support for
every XML standard you've ever heard of (and then some)...

</F> 




