[DB-SIG] DB-API 1.1

M.-A. Lemburg mal@lemburg.com
Mon, 15 Jun 1998 11:03:13 +0200


Greg Stein (Exchange) wrote:
> 
> > From: M.-A. Lemburg [mailto:mal@lemburg.com]
> > Sent: Friday, June 05, 1998 1:54 AM
> >
> > Tod Olson wrote:
> > >
> > > >>>>> "M" == M -A Lemburg <mal@lemburg.com> writes:
> > >
> > > M> Things to be considered obsolete:
> > > M>      arraysize
> > > M>      setinputsize()
> > > M>      setoutputsize()
> > >
> > > M>      Completely agree on punting these... fetchmany() would then
> > > M>      have a mandatory argument instead of an optional one.
> > >
> > > I actually found arraysize to be useful when I implemented array
> > > binding in a module I was working on.  I could allocate the
> > space for
> > > the array binding when the user set the variable and avoid doing a
> > > malloc and free on every call to fetch*.  That is, it was a
> > nice hint
> > > for handling memory in a place where Python couldn't help me.
> 
> This is exactly the reason for the introduction of these attributes /
> methods.
> 
> > But arraysize is only intended as default argument for fetchmany()
> > -- what if the user decides s/he only wants arraysize/2 or arraysize*2
> > rows by passing an argument to fetchmany() ?
> >
> > BTW: You can still optimize malloc/free's by reusing the arrays
> > for every fetchmany()-call. Calling realloc normally isn't that
> > costly.
> 
> Nope. No can do. You want the size *before* calling execute(). For the most
> efficient array operation, you want to bind your input and output arrays,
> then prepare the statement (the prepare is done inside execute()). Finally,
> you start fetching results.

I'm not sure where that gets you any added performance: you are
aiming at 'execute one big SQL statement and make that as fast
as possible' (right?), so the scenario could be handled like this:

· the execute method looks at the length of the argument list and
  builds an array from that *before* binding and executing the statement
  itself
· next, fetchmany(size) is called; this allocates size packets of
  output columns and then loads the data from the database; the next
  call to fetchmany(size) reuses the already allocated output array;
  the array is freed whenever a new execute is done or the cursor is
  closed (see the sketch below)
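
To make that concrete, here is a rough, self-contained sketch of the
allocate-once scheme (toy Python code, not real driver internals; the
'database' is faked with a plain list):

    # sketch: reuse the output array across fetchmany() calls
    class Cursor:
        def __init__(self, rows):
            self._pending = list(rows)      # pretend result set
            self._output = None             # reusable output array

        def execute(self, operation, params=()):
            # the input array would be sized from len(params) here,
            # before preparing and executing the statement; a new
            # statement also invalidates the old output array
            self._output = None

        def fetchmany(self, size):
            if self._output is None or len(self._output) != size:
                self._output = [None] * size    # allocated once per size
            n = 0
            while n < size and self._pending:
                self._output[n] = self._pending.pop(0)
                n = n + 1
            return self._output[:n]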

Of course, leaving the three APIs in the spec won't hurt anyone, so we
might as well keep them (we would have to provide dummies for them
anyway -- just to be backward compatible).

> > Return values for DDL, DML and DQL statements.
> > 
> >       The only really useful return value defined is the
> >       number of rows affected for DMLs. mxODBC provides two
> >       new attributes for cursors:
> >       rowcount - number of rows affected or in the result set
> >       colcount - number of columns in the result set
> >       We drop all the special return values and add these two
> >       (or maybe just rowcount) attributes instead.
> 
> Having the execute() function return an integer allows the execute() to be
> an atomic operation (for DMLs). Placing return values into attributes breaks
> this.

Hmm, I wanted to make the interface design a little easier for
the programmer. The problem is figuring out what type
of statement the execute method was passed. For mxODBC, I came up
with this hack:

    /* Figure out the return value and return */
    if (dbcs->colcount == 0) {
        if (dbcs->rowcount == 0) {
            /* no result set, no rows affected: must be a DDL like CREATE */
            return PyInt_FromLong(1L);
        }
        else {
            /* no result set, but rows affected: must be a DML like UPDATE */
            return PyInt_FromLong(dbcs->rowcount);
        }
    }
    else {
        /* a result set is available: must be a DQL like SELECT */
        Py_INCREF(Py_None);
        return Py_None;
    }

How about this: we leave the return codes as they are, but instead
of defining them via DML, DDL and DQL we use the above scheme (it
should be equivalent, but causes fewer headaches on the programmer's
side). Plus, we add rowcount as an attribute.
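
For illustration, the user side of that convention would look
something like this (just a sketch; the db connection object and the
table are made up):

    c = db.cursor()

    rc = c.execute('CREATE TABLE log (id INTEGER, msg VARCHAR(200))')
    # DDL: no result set, no rows affected -> rc == 1

    rc = c.execute("UPDATE log SET msg = 'done' WHERE id = 1")
    # DML: rc and c.rowcount both give the number of affected rows

    rc = c.execute('SELECT id, msg FROM log')
    # DQL: rc is None; c.rowcount and c.colcount describe the result set
    rows = c.fetchmany(c.rowcount)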

Do all supported databases provide an API for the rowcount after an
execute?

> > Multiple result sets.
> > 
> >       Same problem as with stored procedures. There doesn't
> >       seem to be much of a standard and most DBs probably don't
> >       even support these.
> 
> I think these snuck in with 1.1; I don't recall the nextset() method at all.

They did ;-)

> > Mapping of database types to Python types.
> > 
> >       Since different databases support different types (including
> >       some really exotic ones like PostgreSQL), I think the
> >       general idea should be: preserve as much accuracy as you
> >       can, e.g. if a money database type doesn't fit Python's integers,
> >       use longs. The only type debatable, IMHO, is what to use
> >       for long varchar columns (LONGS or BLOBS). The most common
> >       usage for these is probably storing raw data, just like you
> >       would in a disk file. Two Python types come into play here:
> >       arrays (mutable) and strings (immutable). Strings provide
> >       a much better access interface and are widely supported, so
> >       I'd vote for strings.
> 
> Raw columns were returned as dbiRaw instances so that they could be fed
> right back into an INSERT statement. In dbiRaw.value, you would find
> the native string object, however.

That will have to be specified though.
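
Something along these lines will have to go into the spec, e.g. (a
sketch only; '?' parameter markers as in ODBC, table and column names
made up, and dbi referring to the driver's dbi module):

    # raw column values are wrapped in dbiRaw on the way in...
    data = open('logo.gif', 'rb').read()
    c.execute('INSERT INTO images (name, data) VALUES (?, ?)',
              ['logo', dbi.dbiRaw(data)])

    # ...and come back as dbiRaw instances on the way out
    c.execute('SELECT data FROM images WHERE name = ?', ['logo'])
    raw = c.fetchone()[0]
    data = raw.value        # .value holds the plain string object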

> The thing that we did want to avoid, however, is an exhaustive mapping
> within the API definition (currencies and the like). These would probably
> follow the dbi model, but each database handles these speciality types a bit
> differently. It becomes very cumbersome on the module implementor if you try
> to unify these types within the API. We felt it was a bit better to punt
> unification up to the Python level if it was required.
> 
> > In general having more of the API return dedicated types with a
> > nice OO interface would be a really nice thing, but I fear that
> > most interface programmers are not willing to hack up two or
> > three other types in addition to connections and cursors.
> 
> Major, big-time point. We had three modules that we wanted to build
> ourselves, not to mention modules that others wanted to build. Keeping the
> module slim and "close to the metal" was easier on us, and more in tune with
> Python's viewpoint on exposing APIs thru modules (e.g. expose the natural
> interface, then place Python layers on that as necessary).

Right. The OO layer can easily be built on top of cursors and from
what I've heard there are some of these layers just waiting to be
announced ;-)

> > 2. We should come up with a reasonable package structure for database
> >    access, e.g. to give the discussion a starting point:
> > 
> > [Database]
> >       [<name of database>]
> >               [dbi]
> >       [<name of database>]
> >               [dbi]
> > 
> > You'd then write:
> > 
> > from Database.Oracle import *
> > db = Connect('...')
> > c = db.cursor()
> > c.execute('...',dbi.dbiDate(...))
> > 
> > When porting to another database, only 'Oracle' would have to be changed
> > to the other DB's name (in the ideal case ;-).
>
> The intent of the dbi module was that only one version existed on the
> planet. e.g. the db-sig itself owns it and the module implementors reference
> it, but don't redefine it.

I think reality got it the other way around ;) which isn't that bad,
since the interface programmer has more freedom that way. There could
still be a generic dbi module though... the above scheme doesn't
necessarily imply that dbi has to be included by the database package
(it could simply be an imported generic submodule).
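
E.g. a layout along these lines would do (hypothetical file; the
module names are made up):

    # hypothetical Database/Oracle/__init__.py
    from Database import dbi        # the generic dbi, imported rather than redefined
    from Oracle import Connect      # the database specific connection code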

> p.s. some later emails discussed dictionaries being passed to Connect(). I'd
> like to point out that this becomes difficult for the user to specify in a
> one-liner. Keyword args is pretty nice, but a bitch and a half to deal with
> from the C side of things. Allowing a database-defined set of args or a
> string is easiest on the module implementor. This variance between Connect()
> calls is also the reason that Michael Lorton specified that the import and
> connection to a database is specific, while the rest of the API can be
> generic; therefore, the use of the module name as the connection function.
> [ it is assumed as a fact that each database has different parameter sets
> and definitions for a connection. ]

Great idea. It maintains backward compatibility with multi-argument
Connect() implementations while still providing a dictionary-like
interface.

BTW: Handling keywords isn't really necessary for the C module. Simply
wrap the C module into a Python module and define the Connect()
function there. It can then preprocess the arguments and pass the
results to a low-level connect function defined in the C module. [I use
this technique a lot and with good results, e.g. you can prototype
interfaces in Python and then move them down one level when things
have settled.]
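
A minimal sketch of that technique (the low-level C module's name and
its connect() signature are made up):

    # hypothetical Database/Oracle/Oracle.py -- thin wrapper around the C module
    import oraclec                  # made-up name for the low-level C extension

    def Connect(dsn, user='', password=''):
        # preprocess the (keyword) arguments in Python ...
        connection_string = '%s/%s@%s' % (user, password, dsn)
        # ... then hand a single plain string to the C-level connect function
        return oraclec.connect(connection_string)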

I'll have an edited version of the 1.1 spec up online in a few
days and then ring in the second round...

Thanks for all your comments.

-- 
Marc-Andre Lemburg                               Y2000: 564 days left
---------------------------------------------------------------------
          : Python Pages >>> http://starship.skyport.net/~lemburg/  :
           ---------------------------------------------------------