[DB-SIG] RFC: Higher level DB-API objects

Kevin Jacobs jacobs@penguin.theopalgroup.com
Tue, 12 Feb 2002 10:11:11 -0500 (EST)


Hello everyone,

As many of you have already heard at the Python-10 conference, my company
has implemented a very comprehensive set of extensions on top of the DB-API
interface.  We plan to contribute much of this to the Python community in
the form of an Open Source (tm) project.  However, there are a great many
issues involved in taking a proprietary set of extensions and making them
palatable to general audiences.  So I thought I would start the discussions
in the Python DB-SIG, since many of the DB-API module implementers are
likely to hang out here.

My goal is not to make the existing DB-API objects fatter.  They're
pretty much the right size already.  On the other hand, I have not ruled out
proposing such changes, provided they make sense for the community at large
and will be supported by DB-module authors.

That said, you're probably wondering what I have in mind.  I'll start by
explaining what we've already developed for our own in-house use.

For the past three years, I've been working on a database abstraction
framework I (imaginatively) call DBLIB.  It abstracts a great deal about
various database backends into a single programatic interface.  The major
features are:

  1) A complete SQL92 parser and dialect processor that can translate many
     queries and SQL expressions from one SQL dialect to another.
     Currently supported are some basic translations between
        o Oracle
        o MySQL
        o PostgreSQL
        o Microsoft SQL-Server 2000
        o SAPDB
        (plus a few others)

     NB: This is more of a framework than a complete working product.  It
         translates what we have needed for our past projects, and there are
         a huge number of additional transformations possible.  This
         component is a classic candidate for being turned into an Open
         Source project, since many contributors will significantly enhance
         its usefulness.

  2) A distributed transaction manager framework.  The goal of this
     component is to provide reliable logging of transactions that occur
     over multiple non-atomic operations over multiple backends.  This
     logging makes it possible to detect and possibly apply user-specified
     functionality to recover from failed (i.e., partially committed) distributed
     transactions.

  3) A connection management and abstraction layer on top of DB-API objects
     to map DSN strings to backend driver instances, with intelligent
     support for various connection pooling methods.

  4) A new DB-API abstraction object that hooks into the connection manager,
     the distributed transaction manager, and the SQL parser and dialect
     translator that allows extremely backend-neutral code to be written.
     e.g., you can write Python code that works with MS-SQL, PostgreSQL,
     MySQL, Oracle, etc. with virtually no modifications.  Changing from one
     backend to another is simply a matter of adjusting the DSN string to
     select another driver.

  5) Meta-query objects: query algebra objects that allow programatic
     manipulation of queries.  These objects support all of the usual
     relational-algebraic operations, and can flatten themselves into
     SQL with nested-subqueries or a sequence of SQL statements using
     temporary tables (to work around backends that cannot perform
     sub-selects).

  6) A query processor and optimizer, in some ways very much like Gadfly.
     Its major use is to take queries that join multiple tables from
     multiple disjoint backends and perform the query in a distributed
     fashion.  This involves factoring SQL queries into blocks that can be
     executed in parallel on multiple backends, and then aggregating the
     results and remaining query operations.  The result set produced is as
     if it were run on a single relational query backend.

These and other components are currently implemented in Python and are used
in several very large business applications for several >$100M companies.
Some of the code for the above features cannot be released, some cannot be
released in its current form, some can easily be exported and be useful
immediately.  I _could_ get out the ole' chainsaw and gut the portions of
our code base that I am free to release.  However, I'm not sure of the
utility of handing out disjointed, incomplete and proprietary sets of
functions and objects.

Rather, I'd like to start discussions on how to make this code, or at least
the ideas embodied by it, accessible to the general Python community.  I
welcome any help that the members of this SIG have to offer and look forward
to the ensuing discussions.  Eventually, I'd like to find a core group of
developers to help write a comprehensive and unified high-level database
library and business object interfaces that will significantly enhance
Python's ability to compete with other languages and technologies.

I welcome any comments and suggestions, and will be happy to go into more
detail on some of the issues I have raised.

Best regards,
-Kevin Jacobs

--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19         E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714              WWW:    http://www.theopalgroup.com