How to search HUGE XML with DOM?

Diez B. Roggisch deets at nospam.web.de
Fri Mar 31 06:51:01 EST 2006


> The xml.dom.minidom module is too slow when parsing such a big XML file
> into a DOM object, while pulldom would spend quite a long time going
> through the whole database file. How can I improve the search speed?
> Are there existing solutions or algorithms? Thank you for your
> suggestions...

I've told you before, and I'll tell you again: an RDBMS is the way to go.
There might be XML parsers that work faster (I suppose cElementTree can
gain you some speed), but ultimately the problems are inherent in the
DOM representation: no type information, no indices, no nothing. Just a
huge pile of nodes in memory.
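
To illustrate the "faster parser" option: the ElementTree API (which
cElementTree implements, and which ships in the standard library as
xml.etree.ElementTree) offers iterparse(), which streams the file instead
of building the whole tree. The following is only a minimal sketch. The
<db>/<record>/<name> layout, the "id" attribute, and the find_names()
helper are assumptions made up for the example, not the original
poster's data. Note that the search is still linear; it just avoids
holding every node in memory.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real file layout is not known.
SAMPLE = b"""<db>
  <record id="1"><name>alpha</name></record>
  <record id="2"><name>beta</name></record>
  <record id="3"><name>gamma</name></record>
</db>"""


def find_names(source, record_id):
    """Yield the <name> text of every <record> whose id matches.

    Assumes <record> elements are direct children of the root element.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root
    for event, elem in context:
        if event == "end" and elem.tag == "record":
            if elem.get("id") == record_id:
                yield elem.findtext("name")
            # Drop the records seen so far so memory use stays flat.
            root.clear()


matches = list(find_names(io.BytesIO(SAMPLE), "2"))
print(matches)
```

For a real file, pass the file name or an open binary file object
instead of the BytesIO wrapper.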

So all searches are linear in the number of nodes. Of course you might be
able to create indices yourself, even devise a clever scheme to make using
them as declarative as possible. But that would in the end mean nothing but
re-creating RDBMS technology - why do that, if it's already there?
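
As a rough sketch of what "it's already there" can look like: read the
XML once, load it into a database table, and let the database maintain
the index. SQLite (via the sqlite3 module) is used here only because it
needs no server; no particular RDBMS is recommended above. The table
layout, the <record>/<name> element names, and the helper functions are
all assumptions for illustration.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real file layout is not known.
SAMPLE = b"""<db>
  <record id="1"><name>alpha</name></record>
  <record id="2"><name>beta</name></record>
  <record id="3"><name>gamma</name></record>
</db>"""


def iter_rows(source):
    """Stream (id, name) tuples out of the XML without building a DOM."""
    for _, elem in ET.iterparse(source):  # default: "end" events only
        if elem.tag == "record":
            yield int(elem.get("id")), elem.findtext("name")
            elem.clear()  # free the record's children once it is read


def load_into_sqlite(source, conn):
    """One-time import: create the table, the index, and insert all rows."""
    conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE INDEX record_name ON record (name)")
    conn.executemany("INSERT INTO record (id, name) VALUES (?, ?)",
                     iter_rows(source))
    conn.commit()


def find_ids_by_name(conn, name):
    """Indexed lookup; no scan over all records is needed."""
    cur = conn.execute("SELECT id FROM record WHERE name = ?", (name,))
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
load_into_sqlite(io.BytesIO(SAMPLE), conn)
print(find_ids_by_name(conn, "beta"))
```

The import is still a linear pass, but it happens once; every later
search goes through the index instead of walking the nodes again.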

Maybe there are frameworks out there that support you in this, but the very
nature of XML certainly makes that a more tedious task than just defining a
simple SQL schema. If I had to search for XML tools that go beyond
DOM, I'd start with Uche Ogbuji's 4Suite and work my way down from
there - maybe Amara is what you need?

Having said that: I'm not an SQL bigot. Just use the right tool for the
job.

Regards,

Diez


