How to search HUGE XML with DOM?

Magnus Lycka lycka at carmen.se
Fri Apr 7 11:24:25 EDT 2006


Ivan Vinogradov wrote:
> I have not much clue about databases, except that they exist, somewhat 
> complex, and often use proprietary formats for efficiency.

Prorietary storage format, but a standardized API...

> So any points on whether RDBM-based setup
> would be better would be greatly appreciated.

The typical use case for RDBMS is that you have a number
of record types (classes/relations/tables) with a regular
structure, and all data fits into these structures. When
you want to search for something, you know exactly in what
field of what table to look (but not which item of course).
You also typically have multiple users who need to be able
to update the same database simultaneously without getting
in each others way.

> Even trivial aspects, such as whether to produce RDBM during the 
> simulation, or convert the complete XML log file into one, are not 
> entirely clear to me. 

Most databases as suited at writing data in fairly small chunks,
although it's typically much faster to write 100 items in a
transaction, than to write 100 transactions with one item each.

> I gather that RDBM would be much better suited for 
> analysis, but what about portability ? Is database file a separate 
> entity that may be passed around?

Who says that a database needs to reside in a file? Most
databases reside on disk, but it might well be in raw
partitions.

In general, you should see the database as a persistent
representation of data in a system. It's not a transport
mechanism.



More information about the Python-list mailing list