How to search HUGE XML with DOM?

Ivan Vinogradov vinogri at mcmaster.ca
Fri Mar 31 12:00:25 EST 2006


On 31-Mar-06, at 11:17 AM, bayerj wrote:

> Mind that XML documents are not more flexible than an RDBMS.
>
> You can represent any XML document in an RDBMS. You cannot represent any
> RDBMS in an XML document. RDBMSs are (strictly speaking) relations and XML
> documents are trees. Relations are superior to trees, at least
> mathematically speaking.
>
> Once you have set up your system in a practicable way (e.g. not needing
> to create a new table via SQL queries for a new type of node, which
> would be a pain) SQL is far superior to XML.
>
> Anyway, cElementTree seems to be the best way to go for you now. Its
> performance is unmatched by any other Python XML library, as far as I
> know.

If I may hijack this thread for a bit, I'd like to dig deeper into  
this issue :)

Currently my simulation program produces an XML log file with events
represented as nodes. Those files often grow to multiple GB in size. I
like this setup because the format is open and easily parseable with a
variety of tools, so I have a bunch of scripts that analyze different
aspects of the simulation.
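
For illustration, here is a minimal sketch of what one such analysis script
might look like when the log is too large to load as a full tree. It streams
the file with the standard library's `xml.etree.ElementTree.iterparse` (the
pure-Python sibling of cElementTree). The element name `event` and the
attribute `type` are assumptions; the actual log format is not shown in this
post.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_event_types(source, tag="event", attr="type"):
    """Count events by type while streaming the log.

    `tag` and `attr` are hypothetical names; adjust them to the real format.
    """
    counts = Counter()
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            counts[elem.get(attr)] += 1
            # Drop already-processed children so memory use stays bounded.
            root.clear()
    return counts


# Small in-memory stand-in for a multi-GB log file.
SAMPLE = b"<log><event type='collision'/><event type='spawn'/><event type='collision'/></log>"
print(count_event_types(io.BytesIO(SAMPLE)))
```

In real use `source` would be the path to the log file rather than an
in-memory buffer. Every query of this kind still has to read the whole file
once, which is the main cost of staying with plain XML.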

I don't know much about databases, except that they exist, are somewhat
complex, and often use proprietary formats for efficiency. So any pointers
on whether an RDBMS-based setup would be better would be greatly
appreciated.

Even basic aspects, such as whether to write to the RDBMS during the
simulation or to convert the complete XML log file afterwards, are not
entirely clear to me. I gather that an RDBMS would be much better suited
for analysis, but what about portability? Is a database file a separate
entity that can be passed around?
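
To make the "convert the complete XML log" option concrete, here is a rough
sketch that uses SQLite, which is only one possible choice of RDBMS and is
assumed here because it keeps the whole database in a single ordinary file.
The table layout and the `event`, `time` and `type` names are hypothetical,
since the real log format is not shown in this post.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def xml_log_to_sqlite(source, db_path, tag="event"):
    """Stream <event> elements from an XML log into an SQLite table.

    The schema and the attribute names are hypothetical placeholders.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS events (time REAL, type TEXT)")
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            raw_time = elem.get("time")
            conn.execute(
                "INSERT INTO events (time, type) VALUES (?, ?)",
                (float(raw_time) if raw_time is not None else None,
                 elem.get("type")),
            )
            root.clear()  # free elements that have already been stored
    conn.commit()
    return conn


# Small in-memory stand-in for the real log; ":memory:" avoids touching disk.
SAMPLE = (b"<log><event time='0.5' type='collision'/>"
          b"<event time='1.0' type='spawn'/>"
          b"<event time='2.5' type='collision'/></log>")
conn = xml_log_to_sqlite(io.BytesIO(SAMPLE), ":memory:")
rows = conn.execute(
    "SELECT type, COUNT(*) FROM events GROUP BY type ORDER BY type"
).fetchall()
print(rows)
```

With a real path in place of `":memory:"`, the result is one file that can be
copied or passed around like the XML log. The same `INSERT` statements could
also be issued directly by the simulation, which would skip the XML step
altogether.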

Apologies if this seems like a selfish question; perhaps consider it
full disclosure. Different set-ups/examples would be appreciated as
well.

--
Cheers, Ivan



