[Tutor] Database access vs XML

Erik Price erikprice at mac.com
Mon Sep 29 14:46:24 EDT 2003


 
On Monday, September 29, 2003, at 08:16AM, Guillermo Fernandez <guillermo.fernandez at epfl.ch> wrote:

>I'm working with a large file of network logs (a file of 25Mb or 150000 
>lines). What I do is parse the file, identify the fields that interest 
>me most, put them on lists and then play with the lists to print out the 
>information I want.
>
>This takes 5 minutes and eats a lot of RAM (150000 list elements is 
>quite a lot :-) so I thought about creating a database with all the fields 
>I want. Thinking about the database I'm studying the possibility of 
>using a classical MySQL database or using an XML document as a database 
>using XPath. The XML possibility would allow me to run my program in any 
>computer with python installed and eliminates the need of database 
>installation and management.
>
>My questions are:
>1- Would I gain in speed and RAM with the use of a database?

Yes, but a relational database server such as PostgreSQL or MySQL brings its own overhead: the server process itself occupies additional RAM.  There are also usually some installation issues to work through, unless you already have a database server installed and only need to create a new database on it.

>2- If so, would XML hold up to searching over 150000 entries?

XML parsing isn't super hard, but it's not easy either unless you're working with a library that abstracts the details away behind a programmer-friendly API (for instance, the Jakarta Digester libraries, but that's Java and not Python).  For one thing, if you build an in-memory tree of the logs using DOM or another in-memory model, you will probably incur similar or perhaps even greater overhead than your lists do now.  However, if you implement a SAX listener (a ContentHandler, in Python's xml.sax) and simply parse through the file, responding to callbacks as you go, then you will use very little RAM.  But this would really only be useful if your logs were already in XML, or if you were converting them from their original format to XML.

In all, the XML-based solution doesn't require a database, but it can end up being more work to implement, depending on how complex your needs are.  And a DOM-based solution will not spare you much RAM since, like your current approach, it holds everything in memory at once.
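
To make the SAX idea concrete, here is a minimal sketch using the standard library's xml.sax module.  The <log>/<entry> element names and the "src" attribute are made up for illustration; your logs would need to be converted to some XML layout of your own first.  The handler only ever sees one element at a time, so memory use stays roughly flat however long the file is.

```python
import io
import xml.sax


class LogHandler(xml.sax.ContentHandler):
    """Count <entry> elements per source address without building a tree."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per opening tag as the parser streams through the file.
        # "entry" and "src" are hypothetical names, not a real log schema.
        if name == "entry":
            src = attrs.get("src", "unknown")
            self.counts[src] = self.counts.get(src, 0) + 1


def count_sources(stream):
    """Parse an XML stream (or filename) and return {source: entry count}."""
    handler = LogHandler()
    xml.sax.parse(stream, handler)
    return handler.counts


# Tiny in-memory stand-in for a real log file.
sample = io.BytesIO(
    b"<log>"
    b'<entry src="10.0.0.1" dst="10.0.0.9"/>'
    b'<entry src="10.0.0.2" dst="10.0.0.9"/>'
    b'<entry src="10.0.0.1" dst="10.0.0.7"/>'
    b"</log>"
)
print(count_sources(sample))
```

Only the running totals are kept, which is what makes this cheaper than DOM; if you need to look at entries more than once, you have to re-parse or store what you need yourself.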

>3- How much would I lose in speed with an XML document compared to the 
>MySQL database?

There's no easy way to get numbers for this without doing your own benchmarking; it's too specific to your problem and hardware.
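
If you do want to measure it yourself, something along these lines might be a starting point.  It is only a rough sketch: time_call, build_list and stream_count are names I made up, and the two candidate functions are stand-ins for your real list-based and streaming (or database-backed) implementations.

```python
import time


def time_call(func, *args, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def build_list(n):
    # Stand-in for the "load everything into a list" approach.
    return [str(i) for i in range(n)]


def stream_count(n):
    # Stand-in for a streaming approach that keeps one item at a time.
    return sum(1 for _ in (str(i) for i in range(n)))


for candidate in (build_list, stream_count):
    print(candidate.__name__, time_call(candidate, 150000))
```

Taking the best of several runs reduces noise from whatever else the machine is doing, but the numbers will still only mean something for your own data and hardware.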

Perhaps the best approach would be to write your own parser (not necessarily XML-based) that reads through the log file the way a SAX parser would, performing callbacks as it goes, without ever reading the whole file into memory.  If each entry is on a separate line, you can use the xreadlines method of the file object to do this (in Python 2.2 and later, iterating directly over the file object with a for loop does the same thing).
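
Here is a rough sketch of that callback style in current Python, where looping over the file object itself takes the place of xreadlines.  The log layout (whitespace-separated timestamp, host and byte count) and the names parse_log and BytesPerHost are assumptions for the example, not anything from your actual logs.

```python
import io


def parse_log(lines, on_entry):
    """Call on_entry(fields) for each non-blank line; return the entry count."""
    count = 0
    for line in lines:  # only one line is held in memory at a time
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        on_entry(fields)
        count += 1
    return count


class BytesPerHost:
    """Example callback: keep a running byte total per host."""

    def __init__(self):
        self.totals = {}

    def __call__(self, fields):
        # Hypothetical layout: timestamp, host, size in bytes.
        host, size = fields[1], int(fields[2])
        self.totals[host] = self.totals.get(host, 0) + size


# In-memory stand-in; with a real file you would pass open("some.log").
sample = io.StringIO(
    "2003-09-29T08:16 10.0.0.1 512\n"
    "2003-09-29T08:17 10.0.0.2 128\n"
    "\n"
    "2003-09-29T08:18 10.0.0.1 256\n"
)
tally = BytesPerHost()
n = parse_log(sample, tally)
print(n, tally.totals)
```

The point of the design is that RAM use depends on what the callback chooses to remember (here, one number per host), not on the number of lines in the file.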



Erik


