[Q] Databases -- Gadfly vs Metakit vs SQLite

Alex Martelli aleax at aleax.it
Sun Mar 16 03:04:32 EST 2003


achrist at easystreet.com wrote:

> I'm looking for a simple database for a single-user Windows desktop
> application.  The user's data will be text, maybe 10-20 data tables,
> with identifying keys and a few short fields plus plenty of long
> text memo-type fields.  I should like to plan for databases up to
> about 50k rows and 100 MB of data, but the typical will be far less
> than this.
> 
> It looks like Python gives me 3 good choices for this, Gadfly,
> SQLite, Metakit.  I'd like comments that compare and contrast these
> for such an application.

I see you've already received good answers, and I'd just like to
stress one factor: by far the best approach is to *benchmark* the
components under consideration on simple simulations of your
intended workloads -- "typical", "maximum reasonable", and
"unreasonably heavy, but we don't want to break outright".

Benchmarking might take an unreasonable amount of effort in some
cases, but with Python that's not really true -- in fact, benchmarking
will encourage you to write a simple "database-independence layer":
one interface, implemented once per candidate engine, with all of
your application-level code using only that layer.  That is a *VERY*
good thing in itself.  Strewing database-engine-dependent code all
over the place happens VERY often (because it looks "convenient"...)
unless you take specific steps to help yourself avoid it -- and
developing to a "db independence layer" helps a LOT.
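To make that concrete, here is a minimal sketch of such a layer
(untested, just to show the shape: the NoteStore class and its
method names are made up for illustration, and 'sqlite' is the
module the pysqlite binding exposes -- adjust to whatever wrapper
you actually use):

    class NoteStore:
        "One interface; application code sees nothing but this."
        def add_note(self, key, text):
            raise NotImplementedError
        def get_note(self, key):
            raise NotImplementedError
        def close(self):
            raise NotImplementedError

    class SQLiteNoteStore(NoteStore):
        "One such class per candidate engine, each kept in one place."
        def __init__(self, filename):
            import sqlite                 # pysqlite's module (assumption)
            self.conn = sqlite.connect(filename)
            cursor = self.conn.cursor()
            try:
                cursor.execute("create table notes (key varchar, text varchar)")
            except:
                pass                      # table presumably exists already
            self.conn.commit()
        def add_note(self, key, text):
            cursor = self.conn.cursor()
            cursor.execute("insert into notes values (%s, %s)", (key, text))
            self.conn.commit()
        def get_note(self, key):
            cursor = self.conn.cursor()
            cursor.execute("select text from notes where key = %s", (key,))
            return cursor.fetchone()[0]
        def close(self):
            self.conn.close()

A GadflyNoteStore or MetakitNoteStore then implements the same three
methods, and switching engines -- for the benchmarks, or years later
in the product's life -- touches no application code at all.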


> This is a desktop productivity app, so I'd like to get snappy response,
> no 1 or 2 second delays for a single update or retrieval.  This should
> work to such a standard on a user's 300 MHz, 64 MB Windows machine.

Then that is the machine you should be running benchmarks on, and
you already know the targets for what you consider acceptable
performance -- which puts you WAY ahead of most projects at this
stage.  In my experience it's rarely clear at the outset whether
"one second" counts as "snappy", nor whether the target is a typical
machine sold today (a CPU over a GHz, 128 MB or often 256 MB of DDR
or Rambus RAM, the latest resource-hungry version of Windows XP) or
a legacy machine with a resale value under $100 such as the one you
describe.
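Timing those workloads then takes just a few lines, e.g. (a sketch,
driving the hypothetical NoteStore layer above; on Windows,
time.clock is the high-resolution timer to use):

    import time

    def timed(label, func, *args):
        "Run func(*args) and report its elapsed time on the target box."
        start = time.clock()
        result = func(*args)
        print "%-30s %8.3f seconds" % (label, time.clock() - start)
        return result

    # e.g., against any implementation of the layer sketched above:
    # timed("insert one memo", store.add_note, "k1", "x" * 2000)
    # timed("fetch one memo", store.get_note, "k1")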


> Does Gadfly need to fetch the entire database into RAM?  This would be
> fine for me 99% of the time, but I don't want to worry about the user
> with the big database and the small machine.

Yes, this constraint is present (at least per-table; I'm not sure
whether it still applies per-database as well, but if "not worrying"
is the criterion, that distinction may not matter).


> Other criteria:
> 
> 1. Very high reliability -- Which of these can give us zero database
> failures from 1000's of users?

I _think_ (but have no hard data) this holds for them all.


> 2. Simple -- We want to hide all details of the database from the
> end-users, and easy to program wouldn't hurt either.  It looks like
> all three will meet this criterion.  Any traps?

Ditto.


> 3. Storage efficiency -- We don't want 10 MB of user data to take
> 100 MB of disk space.  I know that disk is cheap, but we want the
> data to be easy to back up, send,  and transport.

I wouldn't worry about that: a simple zip or gzip pass over the
data files will no doubt solve it smoothly anyway.
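E.g., gzip-compressing a data file for backup is a few lines with
the standard gzip module (the file names here are just for
illustration):

    import gzip

    def gzip_file(src, dst):
        "Copy file src into a gzip-compressed dst, a chunk at a time."
        f_in = open(src, 'rb')
        f_out = gzip.open(dst, 'wb')
        while 1:
            chunk = f_in.read(65536)
            if not chunk:
                break
            f_out.write(chunk)
        f_out.close()
        f_in.close()

    # gzip_file('userdata.db', 'userdata.db.gz')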


> The hierarchical non-SQL aspects of Metakit are probably OK.

Then, as I see somebody else has already said, BSD-DB (the latest
version) should perhaps also be in your set of candidate DB
components.  But take care: relational flexibility may save your
bacon at some point down the road...
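For reference, the standard bsddb module gives you a dictionary-like
interface over a disk file, e.g. (file name and key made up; keys
and values must be plain strings):

    import bsddb

    db = bsddb.btopen('memos.db', 'c')     # 'c': create if missing
    db['note:42'] = 'a long memo text...'  # string keys, string values
    print db['note:42']
    db.close()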


> Can anyone with experience with more than one of these advise?

Guess I shouldn't have answered, since I haven't in fact deployed
end-user applications with ANY of these embedded DB engines.  But I
think "benchmark behind a DB-independence layer" applies to them
just as well as it does to "heavy" engines such as PostgreSQL or
MS SQL Server...


Alex




