Dynamism and Database Referential Integrity

Kris J. Zaragoza kzaragoza at mediaone.net
Wed Jan 9 12:25:08 EST 2002


On 8 Jan 2002 10:32:16 -0800, Hung Jung Lu <hungjunglu at yahoo.com> wrote:
>I have a question on databases. If you ask a database
>designer/administrator, most of them will tell you that referential
>integrity is a must. However, I can't help but drawing parallel with
>Python's dynamism. Sure, if you asked C++/Java programmer that have
>never seen Python/Perl, they'll tell you that type-checking is a good
>thing, that type-safety helps to eliminate tons of bugs during compile
>time. All that may be true. But, in practice, a dynamic language like
>Python is time-proven to show the fallacy that a strong-typed language
>is a must. Can the same be said about database referential integrity?
>

If you'd like to make analogies with programming languages and
techniques, I would compare referential integrity to programming by
contract.  The referential integrity constraints that are enforced by
the database are there to ensure that accidents don't happen by
storing invalid data, much as the contract enforcement facilities in
Eiffel (or even asserts in other languages) help ensure that what is
passed into a function or method is valid.

>I say it because I've been involved in a project with a few database
>veterans that, to my surprise, dropped all referential integrity
>requirements: no foreign key constraints. The dynamism achieved is
>that you can version your data much more easily in the complex entity
>relationships: your program can work with versioned data: if a client
>submits old data, you work the old way, if client submits new data,
>you work the new way, and the database tables achieve more longevity,
>because versioning is built-in. New columns can be added...
>relationships can be added/removed in future versions, and the program
>will still work with the old versions, etc. Well, things seem to have
>worked out pretty well.

I would argue that this is the exception rather than the norm.  In the
case of the database being used exclusively by one application, all of
the appropriate constraints can be enforced by the application itself.
Any problems with the data point to a bug in the application and can
be dealt with accordingly.  In the hands of experts, this technique
can radically improve performance and flexibility.  Then again, only
experts will really know when this is appropriate.  In most of the
cases where I've seen developers argue against transactional controls
or constraints, their arguments were easily torn apart.

When you have multiple applications accessing one database, however,
you don't want a bug in one app causing all the others to fail because
of bad data.  Here is where the various features of modern relational
databases really start to shine.  Referential integrity along with
other types of constraints help keep data consistent.  Transactional
controls likewise ensure that applications only ever see properly
saved, consistent data.

I work by a simple rule of thumb:  Always use transactions and
constraints.  It will never really hurt you, and it will save your
bacon when you least expect it to.


-- 
Kris J. Zaragoza           | On the face of it, Microsoft complaining about
kzaragoza at mediaone.net     | the source license used by Linux is like the
                           | event horizon calling the kettle black.
                           | -- Adam Barr, article on kuro5hin.org



More information about the Python-list mailing list