[XML-SIG] xml vs. sql

Tue, 8 Jan 2002 00:40:15 -0500

[Michael Mell]

> I'm considering building a data-centric website using xml instead of the
> usual SQL database. I expect the site will receive about 1000 hits per
> day. There will never be more than a couple thousand records in the
> data. I'm planning on using Apache, mod_python and Python's minidom.
>
> The benefits of doing without the database are:
>     1. the data will be highly portable
>     2. the data will not be subject to corruption
>     3. no need for DB admin
>
> Potential downsides:
>     1. Slow processing under load
> Any thing else I need to consider?
>
> Anyone care to share past experiences doing the same?
>

I would say that a lot depends on the data model and the nature of the
queries you will be doing.  For complex queries needing multiple joins, I'd
probably want to use a relational database.  For simple extraction from
tables, XML could work well.

It also depends on the size of the table-equivalents.  If they are small,
processing with xml could be fast.  You could even cache the tables and
stylesheets (if that's what you were to use) as DOM structures.  If they are
"too" large - find out by testing - a database might be better.

Also, you haven't mentioned if the data will change a lot or just be static.
If there were going to be a lot of changes to the data, I'd be interested in
a database with transaction capability.

SQL data can be pretty portable, especially if the database can do a dump in
that uses SQL insert statements to save the data.  And I wouldn't say the
data were less subject to corruption if stored as XML.

I just did a job where I initially stored html form data as lines of text in
files (I was keeping the database server separate from the web form
machinery, so I had to store it first before loading it into the database),
used xml/xslt and batch files to create python scripts that inserted the
data into an SQL database (the data forms were created using xml templates,
so I could compute nearly everything else I needed using xslt) and created
other Python scripts with xslt that ran reports using stored queries in
Zope.  I was happy I could use a relational database because it made it easy
to create the queries for the reports.

As for pickles, I have always understood (perhaps wrongly) that the pickle
format is subject to change for each new version of Python.  If so, to use
pickles you have to ensure that you can recreate the pickled data, so you
still need some way to store it - why not xml?

So it depends.  I'd say, try it in straight xml first (modulo my first two
remarks above) and see if it's fast enough and that the xpath or xslt isn't
too complex to maintain easily.  If that's not good enough, consider going
to Zope and any convenient database it can talk to (like mySQL or whatever).
That would be pretty portable.  Or just talk to mySQL or another database
directly from Python, especially if you don't need a web interface.

Cheers,

Tom P