[Pycon2005-attendees] The Database Divide

Andy Dustman farcepest at gmail.com
Fri Mar 25 14:57:30 CET 2005


The classic UNIX filesystem model has a table of inodes, with
allocated inodes corresponding to files. The inode table space is
flat: There is no notion of hierarchy. Indeed, there is no concept of
file names. Directories, which are a special type of file, are
essentially lists of tuples of (filename, inode). This provides both
file names and hierarchy, since directories may point to other
directories. The inode number is best thought of as a pointer. You can
have multiple names pointing to the same inode. The name of a file
depends on where you are looking at it in the hierarchy.
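
To make the inode picture concrete, here is a toy model in Python. It is only a sketch: the names (inodes, alloc, resolve) are made up for illustration, and a real filesystem stores far more per inode than this.

```python
# Toy model: a flat inode table plus directories that map names to inodes.
# Every name in this sketch is hypothetical; nothing here touches a real disk.

inodes = {}  # inode number -> file contents (bytes) or directory (dict)


def alloc(content):
    """Store content under the next free inode number and return that number."""
    number = len(inodes) + 1
    inodes[number] = content
    return number


# A regular file is just data. No name is stored with it.
report = alloc(b"quarterly numbers")

# A directory is a file whose content is a list of (filename, inode) pairs,
# modelled here as a dict.
docs = alloc({"report.txt": report})
root = alloc({"docs": docs, "latest": report})  # second name, same inode


def resolve(path, start=root):
    """Walk a slash-separated path from a directory inode to an inode number."""
    current = start
    for part in path.strip("/").split("/"):
        current = inodes[current][part]
    return current


# Two different names reach the same inode.
print(resolve("/docs/report.txt") == resolve("/latest"))
```

The file's "name" is whatever path you happened to follow to reach it; the inode table itself knows nothing about names.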

Python objects are not so different. Objects sit in an address
space, and may point at multiple other objects. Likewise, no object
has a name, except what is imposed by some hierarchy. Some objects do
have a __name__ attribute, but from the object's perspective, this is
just a pointer, called __name__, to some other object.
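
A small sketch of the same idea in Python. The class and variable names (Thing, registry, greet) are invented for the example.

```python
class Thing:
    pass


a = Thing()
b = a                    # a second name bound to the same object
registry = {"first": a}  # a third route to it, through a container

# All three routes lead to one object; none of them is "the" name.
print(a is b, registry["first"] is a)

# A plain instance carries no name of its own.
print(hasattr(a, "__name__"))


def greet():
    return "hello"


alias = greet

# Functions do have __name__, but it is only an attribute that points at
# a str object. Binding the function to another name does not change it.
print(alias.__name__)
print(type(greet.__name__))
```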

One difference between Python objects and RDB tables is that Python
objects can usually (but not always) contain arbitrary attributes, which
can be of arbitrary types, but tables have a fixed number of columns of
fixed types. (IIRC, PostgreSQL may allow you to specify a column type
which can contain any column type.) Python's ability to assign
attributes arbitrarily is a great feature. OTOH, the rigid column
schema of relational databases is also a great feature, if your goal
is to do summations and counts and such, which is common in business
applications.
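
Here is a rough side-by-side, assuming the sqlite3 module from the standard library as a stand-in for a relational database. SQLite is used only because it ships with Python, and it is more lenient about column types than most servers. The class and table names are made up.

```python
import sqlite3


class Order:
    pass


order = Order()
order.total = 19.99            # attributes can be added at will
order.note = ["gift", "rush"]  # and may hold any type


class Point:
    __slots__ = ("x", "y")     # the "not always" case: a fixed attribute set


p = Point()
slots_rejected = False
try:
    p.z = 1                    # not listed in __slots__
except AttributeError:
    slots_rejected = True
print(slots_rejected)

# The table side: a fixed set of columns, which makes aggregates trivial.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL NOT NULL)")
conn.executemany(
    "INSERT INTO orders (total) VALUES (?)",
    [(10.0,), (15.5,), (4.5,)],
)
count, total = conn.execute("SELECT COUNT(*), SUM(total) FROM orders").fetchone()
print(count, total)
```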

Relational databases are not a great place to store arbitrary Python
objects, although it's possible to do this by stuffing pickles in a
BLOB; it's just not generally a good idea. However, if you have a
fairly fixed schema, then you can think of each table as being an
address space for a single object class. Then references to other
object classes are still just pointers, but into other address spaces.
This makes more sense if you think of the primary key as being a serial
number type (or an auto_increment in MySQL) or row number, and then
referring to objects in other tables by their primary key/row number.
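
A minimal sketch of "one table per class, primary key as pointer", again using sqlite3 as a convenient stand-in. The author and book tables and the deref_author helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (
        id   INTEGER PRIMARY KEY,   -- row number: the "address" in this space
        name TEXT NOT NULL
    );
    CREATE TABLE book (
        id        INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES author(id)  -- pointer into author
    );
""")

cur = conn.execute("INSERT INTO author (name) VALUES (?)", ("Ada",))
author_id = cur.lastrowid  # the key the database assigned to the new row

cur = conn.execute(
    "INSERT INTO book (title, author_id) VALUES (?, ?)",
    ("Notes", author_id),
)
book_id = cur.lastrowid


def deref_author(book_id):
    """Follow book.author_id into the author table and return the name."""
    row = conn.execute(
        "SELECT author.name FROM book"
        " JOIN author ON author.id = book.author_id"
        " WHERE book.id = ?",
        (book_id,),
    ).fetchone()
    return row[0]


print(deref_author(book_id))
```

The join plays the role of dereferencing the pointer: author_id means nothing on its own, only as a position in the author table.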

The biggest mismatch is still that your Python objects may point to
instances of arbitrary classes. In a relational database, this is like
having a column with a foreign key reference, except that it might
refer to a key in an arbitrary table for each row, which you just
can't do.
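
To illustrate the mismatch: the sketch below stores a (table name, row id) pair in place of a real foreign key. This workaround is an assumption of mine, not something the database supports as a reference, and all table and class names are invented. Note that nothing stops a dangling pointer.

```python
import sqlite3


class Note:
    pass


class Photo:
    pass


class Folder:
    def __init__(self):
        self.items = []


folder = Folder()
folder.items += [Note(), Photo()]  # one attribute, two unrelated classes

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE note  (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE photo (id INTEGER PRIMARY KEY, path TEXT);
    -- target_table says which table target_id points into. A FOREIGN KEY
    -- clause names exactly one table, so it cannot express this, and
    -- nothing checks that the target row exists.
    CREATE TABLE folder_item (
        target_table TEXT NOT NULL,
        target_id    INTEGER NOT NULL
    );
""")
conn.execute("INSERT INTO note (body) VALUES ('hello')")
conn.execute("INSERT INTO folder_item VALUES ('note', 1)")
conn.execute("INSERT INTO folder_item VALUES ('photo', 99)")  # dangling, accepted

ALLOWED_TABLES = {"note", "photo"}


def deref(table, row_id):
    """Resolve a (table, id) pair by hand; returns None if the row is missing."""
    if table not in ALLOWED_TABLES:  # table names cannot be bound as parameters
        raise ValueError(table)
    return conn.execute(
        f"SELECT * FROM {table} WHERE id = ?", (row_id,)
    ).fetchone()


pairs = conn.execute(
    "SELECT target_table, target_id FROM folder_item ORDER BY rowid"
).fetchall()
results = [deref(table, row_id) for table, row_id in pairs]
print(results)
```

The application has to do the dereferencing and the integrity checking itself, which is exactly the work a foreign key would otherwise do.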

Time for the keynote.
-- 
Computer interfaces should never be made of meat.

Using GMail? Setting Reply-to address to <> disables this annoying feature.

You are in a maze of twisty little passages all alike.
To go north, press 2. To go west, press 4.
To go east, press 6. To go south, press 8. 
If you need assistance, press 0 and a little dwarf
will assist you.