Temporal Databases (Database design questions)

David wizzardx at gmail.com
Wed Jun 18 08:31:18 EDT 2008


On Wed, Jun 18, 2008 at 11:16 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 2008-06-18 09:41, David wrote:
>>
>> Question 3: Temporal databases
>>
>> http://en.wikipedia.org/wiki/Temporal_database
>>
>> I haven't used them before, but I like the idea of never
>> deleting/updating records so you have a complete history (a bit like
>> source code version control).
>>
>> How well do temporal databases work? Do RDBMS (like Postgresql) need
>> add-ons to make it effective, or can you just add extra temporal
>> columns to all your tables and add them to your app queries? Does this
>> increase app complexity and increase server load a lot?
>>
>> Are there Python libraries which simplify this? (eg: add-ons for
>> Elixir or SQLAlchemy).
>>
>> Or should apps all implement their own 'temporal data access' module,
>> which transparently uses the current date & time until queried for
>> historical data?
>
> You can have complete history in a database schema by:
>
> * adding a version column
> * adding a modification timestamp (and modification username,
>  if that's relevant for you)
> * updating the version upon INSERT and UPDATE
> * have a history table for each "live" table that gets
>  filled using a trigger on the version column which moves
>  the inserted/updated/deleted rows into the history table
> * the history table will have to have an additional column
>  for storing the method of how the row got into the table
>  (ie. insert/update/delete)
>
> The main app will only use the "live" tables with the current
> data. An audit tool would then provide access to the history
> tables.
>
> This works for all databases that have triggers, even SQLite.
>

Thanks for your reply.
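
To check that I'm following, here is a rough sketch of how I picture
that working, using the stdlib sqlite3 module (the table/column names
are made up, and I've let the app bump the version column itself
rather than doing it in a trigger):

import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
CREATE TABLE table1 (
    id       INTEGER PRIMARY KEY,
    field1   TEXT,
    version  INTEGER NOT NULL DEFAULT 1,
    modified TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Same columns, plus a note of how the row got here.
CREATE TABLE table1_history (
    id           INTEGER,
    field1       TEXT,
    version      INTEGER,
    modified     TIMESTAMP,
    updatemethod TEXT      -- 'insert', 'update' or 'delete'
);

CREATE TRIGGER table1_hist_ins AFTER INSERT ON table1
BEGIN
    INSERT INTO table1_history
    VALUES (new.id, new.field1, new.version, new.modified, 'insert');
END;

CREATE TRIGGER table1_hist_upd AFTER UPDATE ON table1
BEGIN
    INSERT INTO table1_history
    VALUES (new.id, new.field1, new.version, new.modified, 'update');
END;

CREATE TRIGGER table1_hist_del AFTER DELETE ON table1
BEGIN
    INSERT INTO table1_history
    VALUES (old.id, old.field1, old.version, old.modified, 'delete');
END;
""")

conn.execute("INSERT INTO table1 (field1) VALUES ('first value')")
# The app bumps the version column on every update.
conn.execute("""UPDATE table1
                   SET field1 = 'second value',
                       version = version + 1,
                       modified = CURRENT_TIMESTAMP
                 WHERE id = 1""")
conn.execute("DELETE FROM table1 WHERE id = 1")

# Three history rows: the insert, the update and the delete.
for row in conn.execute("SELECT * FROM table1_history"):
    print(row)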

How do you maintain foreign key references with this approach?

e.g., you have these 4 tables:

table1
 - id
 - field1
 - field2
 - field3
 - version
 - modified

table1_history
 - id
 - field1
 - field2
 - field3
 - version
 - modified
 - updatemethod

table2
 - id
 - table1_id
 - field1
 - field2
 - field3
 - version
 - modified

table2_history
 - id
 - table1_id
 - field1
 - field2
 - field3
 - version
 - modified
 - updatemethod

Should table2_history.table1_id point to table1.id, or table1_history.id?

If table2_history.table1_id points to table1.id, then you will have
problems when you remove records from table1 (the history rows end up
referencing a parent row that no longer exists).

If table2_history.table1_id points to table1_history.id, then you need
to make a table1_history entry for the new values, so the history row
has something current to point at.

Also, once you start updating records in table1, table2 and/or
table2_history will still be pointing at the old table1 records,
instead of the new values.
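
To illustrate that staleness (a contrived sqlite3 snippet with made-up
values):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, field1 TEXT);
CREATE TABLE table2_history (id INTEGER, table1_id INTEGER, field1 TEXT);
""")
conn.execute("INSERT INTO table1 VALUES (1, 'original parent value')")
conn.execute("INSERT INTO table2_history VALUES (10, 1, 'child snapshot')")

# Later, the live parent row is updated (or deleted)...
conn.execute("UPDATE table1 SET field1 = 'NEW parent value' WHERE id = 1")

# ...and the history row now joins against the *new* parent values,
# not the values that were current when the snapshot was taken.
print(list(conn.execute("""
    SELECT h.field1, t1.field1
      FROM table2_history h JOIN table1 t1 ON t1.id = h.table1_id
""")))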

What this probably means is that whenever you make any changes to
records, you then need to:

1) Make a history entry for the record (as you described)

2) Also make new history entries for all records that depend on the
updated record, and for their sub-dependencies too, recursively (even
if those dependent records weren't themselves updated). The new history
records' foreign keys should always point at other historical records
(see the sketch after this list).
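
Something like this is what I have in mind (just a sketch; 'hist_id'
is a made-up surrogate key for the history rows, and I've dropped the
version/modified columns to keep it short):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, field1 TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY,
                     table1_id INTEGER REFERENCES table1(id),
                     field1 TEXT);

-- History tables get their own surrogate key (hist_id); their foreign
-- keys point at other *history* rows, never at the live tables.
CREATE TABLE table1_history (hist_id INTEGER PRIMARY KEY,
                             id INTEGER, field1 TEXT, updatemethod TEXT);
CREATE TABLE table2_history (hist_id INTEGER PRIMARY KEY,
                             id INTEGER,
                             table1_hist_id INTEGER
                                 REFERENCES table1_history(hist_id),
                             field1 TEXT, updatemethod TEXT);
""")

def snapshot_table1(conn, table1_id, method):
    """Copy one table1 row into table1_history, then copy its table2
    dependents into table2_history, pointing at the new history row."""
    row = conn.execute("SELECT id, field1 FROM table1 WHERE id = ?",
                       (table1_id,)).fetchone()
    cur = conn.execute("INSERT INTO table1_history (id, field1, updatemethod) "
                       "VALUES (?, ?, ?)", (row[0], row[1], method))
    t1_hist_id = cur.lastrowid   # key of the new parent history row
    # Snapshot the dependents too, even though they didn't change.
    conn.execute("INSERT INTO table2_history "
                 "(id, table1_hist_id, field1, updatemethod) "
                 "SELECT id, ?, field1, 'parent-changed' FROM table2 "
                 "WHERE table1_id = ?", (t1_hist_id, table1_id))
    return t1_hist_id

conn.execute("INSERT INTO table1 (id, field1) VALUES (1, 'aaa')")
conn.execute("INSERT INTO table2 (id, table1_id, field1) VALUES (10, 1, 'bbb')")

conn.execute("UPDATE table1 SET field1 = 'aaa v2' WHERE id = 1")
snapshot_table1(conn, 1, 'update')

print(list(conn.execute("SELECT * FROM table1_history")))
print(list(conn.execute("SELECT * FROM table2_history")))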

There may be cases where you can skip adding redundant records to the
history tables. But if your primary keys (in the history tables) are
auto-incrementing integers, then the foreign keys (in the dependent
history tables) will all need to be updated in a 'cascading' way, so
that they all point at records which are correct for that point in
time.

This would probably also be a problem for regular temporal databases,
unless they have some built-in 'snapshot all foreign dependencies'
function.

How is this normally handled?

One method (stealing an idea from git) would be for the history tables
to use 'hash' values for primary and foreign keys. The hash (for the
primary key) would be calculated from the other values in the record.
Foreign keys in dependent history records then also become hash
strings, pointing at the correct parent record.
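
Roughly like this (a toy sketch, nothing database-specific, and
record_hash is just a helper name I made up):

import hashlib

def record_hash(fields, parent_hash=''):
    """Return a stable hex digest for a history record's contents."""
    material = repr(sorted(fields.items())) + parent_hash
    return hashlib.sha1(material.encode('utf-8')).hexdigest()

# A table1 history row, keyed by a hash of its own values:
t1_fields = {'id': 1, 'field1': 'aaa', 'version': 2}
t1_hash = record_hash(t1_fields)

# A dependent table2 history row whose "foreign key" is the parent's
# hash, so it always refers to the parent as it was at that point in time:
t2_fields = {'id': 10, 'field1': 'bbb', 'version': 1}
t2_hash = record_hash(t2_fields, parent_hash=t1_hash)

print(t1_hash)   # primary key of the table1 history row
print(t2_hash)   # primary key of the dependent table2 history row
print(t1_hash == record_hash(t1_fields))   # same content -> same key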

David.


