[DB-SIG] Controlling return types for DB APIs

Robert Brewer fumanchu at amor.org
Tue Apr 17 19:35:10 CEST 2007


Federico Di Gregorio wrote:
> Il giorno mar, 17/04/2007 alle 10.16 +0200, M.-A. Lemburg ha scritto:
> > 
> > While this can be solved using a registry of types conversions,
> > I see problems in standardizing the way to define the type
> > mappings since different database backends tend to have
> > or need different types. 
> 
> I can see an API that leverages on the introspection abilities of the
> drivers, to abstract to the different type representations of the
> various backends.

Perhaps, but "type" and "representation" are two different concepts. For
example, I've got an SQL Server DB (whose schema I can't change) which
stores most dates in proper DATETIME columns, but there are 3 or 4 which
store dates in CHAR(8) columns in 'YYYYMMDD' format. Although dbtype may
*imply* pytype and vice-versa, there will always be cases where the
adaptation layer between the two is context-dependent. Solving for
arbitrary SQL (like "WHERE mytable.yyyymmdd_birthdate > now()")
therefore means:

 1. Knowing the desired Python type for each column (datetime.date),
 2. Knowing the actual database type for each column or subexpression
(SQL Server's CHAR type),
 3. Having inbound and outbound scalar transformers
(datetime.date-to-YYYYMMDD and datetime.date-from-YYYYMMDD),
 4. Knowing which binary and comparison operations have implicit
conversions, and
 5. Special-casing binary and comparison operations between types which
have no implicit conversions.

For example, my YYYYMMDD adapter/converter has the method:

    def compare_op(self, op1, op, sqlop, op2):
        if isinstance(op2.dbtype, sqlserver.DATETIME):
            # Cast the YYYYMMDD string to a DATETIME.
            sql = ("((CASE WHEN ISDATE(%s)=1 "
                          "THEN CAST(%s AS DATETIME) "
                          "ELSE NULL END) %s %s)"
                   % (op1.sql, op1.sql, sqlop, op2.sql))
            return sql
        return "(%s %s %s)" % (op1.sql, sqlop, op2.sql)
    binary_op = compare_op


This is where Geniusql is headed. I'm not for a moment saying the DBAPI
should go that far, but there needs to be a clear understanding of
exactly how far the DBAPI is going to go down this rabbit hole (because
however far you go, your user base will forever pester you for the next
level of flexibility ;).

> Let's suppose that a driver "knows" the type of a DB
> column, then we can ask it for an abstract "dbtype":
> 
> dbtype = connection_object.getdbtype("SELECT 1 AS foo")
> 
> where the query _must_ return a scalar from which the driver 
> infers the type. Then the type can be used as a key in the
> registry.

Given the extremely small number of datatypes that each commercial
database exposes, this seems to be both more work and less accurate
results than simply modeling each concrete dbtype directly. All
SQL92-compliant types can be fully described with a handful of
attributes (bytes, precision, scale, whether each of those is
user-specifiable, and if so the maximum allowed value for each, whether
a numeric type is signed or unsigned, and finally the CHAR vs VARCHAR
distinction). [1]

> Obviously the conversion function will be backend-
> specific but I suppose the signature could be the same for
> all functions. Given the fact that the conversion happens
> inside a cursor and than the connection is available from the
> cursor object itself, something like:
> 
> py_data = conversion_function(backend_data, cursor_object)
> 
> Then we can at least make a standard for the registry methods.

There should be some provision for custom converters (which forces you
to stick the converters on each column object instead of in a registry,
since there can be several different converters for e.g.
datetime.date-to-CHAR).

But even if you decide not to go that far, the registry of default
converters will need to be keyed by (pytype, backend-specific dbtype).
For example, Postgres has a hard time comparing FLOAT4 and FLOAT8 [2],
not to mention that the concrete precision of SQL92 REAL and DOUBLE are
"implementation defined". It's not entirely hopeless; some base classes
for converters can be constructed [3].


Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org

[1] See
http://projects.amor.org/geniusql/browser/trunk/geniusql/dbtypes.py for
my mostly-finished crack at this, plus any module in
http://projects.amor.org/geniusql/browser/trunk/geniusql/providers for
concrete DB types. Note I stick default_pytype directly on both abstract
and concrete dbtype objects, but an external registry would be just as
easy.
[2] ...because the implicit conversion isn't always what you want; see
http://archives.postgresql.org/pgsql-bugs/2004-02/msg00062.php for an
example.
[3] See
http://projects.amor.org/geniusql/browser/trunk/geniusql/adapters.py


More information about the DB-SIG mailing list