Python and DB support (was: Re: Is Python Dead?)

Tue Jul 3 10:41:37 EDT 2001

"Zen" <zen at shangri-la.dropbear.id.au> wrote in message
news:3b4116e1$1 at mercury.its.rmit.edu.au...
> "Alex Martelli" <aleaxit at yahoo.com> writes:
>
> >Yep, the recordset part of ADO is undoubtedly the most popular single
> >feature of it among us lazy programmers.  You can even disconnect from
> >the DB and keep the recordset around, persist it to a file (maybe in
XML),
> >depersist it later... it's fun!-)
>
> I'm preparing a PEP to address a few issues with the Python DB API
> and an initial draft has been presented to the DB-SIG.
>
> I'm unfamiliar with ADO. When you retrieve a recordset, are all
> relevant tuples retrieved from the database (like the Cursor.fetchall()
> method in the DB API)? Or is it a lazy object that only downloads
> the rows when they are accessed by the application (like the
Cursor.fetch()
> or Cursor.fetchmany() methods in the DB API)?

Yes.  That is, you can program it either way, ot a bit of each
(see later).

> Can you provide examples of things you can do using the ADO interface
> (don't care what programming language) that are difficult to do using
> the Python DB API? Even if it is outside the scope of *this* PEP, I need
> to understand peoples wishlists to ensure they can be integrated
> in the future.

An ADO Recordset holds results from a datasource and implements
or delegates cursor behavior to the datasource.  Once the Recordset
has some data, you can keep working with the Recordset (even in an
independent way from the datasource) -- navigate in it, modify it,
persist and depersist it... at some later time you may decide to
connect to the datasource again, and have the Recordset send an
'update' to the datasource regarding the modifications you did to
the Recordset while the datasource was away.  It's *NOT* what I
would call a shining example of clean, minimalist, elegant OO
design.  It *IS* deucedly handy and seductive as soon as you start
getting used to it, though:-).

One example of systematical, handy though inelegant, redundance.
Basically all parameters to methods are optional.  If you do not
supply them, the object uses some of its properties as the
parameters.  You can query and change the properties at some
time, and later call the methods without parameters and they'll
do the right thing, or pass some parameters and have only some
others default.  Think of it as cheap and inelegant, but handy
and versatile, 'overridable currying'.

So take for example the Open method of a Recordset.  It has an
argument Source (or that may default from the Source property
of the Recordset) which can be a Command object (which can
encapsulate several different ways to fetch data) or a text
string which, together with option flags, can be implicitly
used to build the suitable Command object (an URL, an SQL
statement, a Stream object containing a persisted Recordset,
etc, etc).  An Options argument lets you specify several
things (again, you can use properties if you prefer) such
as the execution-and-fetching model:
adAsyncExecute    fully asynchronous
adAsyncFetch      synchronous for the first CacheSize rows,
                  then async for the rest (CS is a property)
adAsyncFetchNonBlocking    like adAsyncFetch, but the attempt
                  to read a row not yet fetched won't block
adExecuteNoRecords  no records are expected in response to
                  the command, the framework will throw away
                  any that might return (but shouldn't:-)
adExecuteStream   the results are packaged up as a Stream
                  object
adExecuteRecord   special-case: ONE, but only one, record is
                  returned as the result

Default is synchronous access, but the options, as you see,
are many.  Indeed perhaps TOO many -- not all apply in all
situations, always a bad sign (e.g., adExecuteStream with a
Source that is NOT a Command object executing is an error).

But back to the Open verb.  An argument ActiveConnection
lets you specify on what connection to open the recordset
(a Connection object, a string from which it can be
constructed, can default from the connection properties
of the recordset itself OR of the command object if you
have passed one as Source argument, etc).

And then you get to specify (still at Open time) what
kind of cursor-type and lock-type you desire.  Which is
a big topic in itself.  adOpenForwardOnly only lets you
move forward in the recordset, but is lightweight and
may afford optimization.  At the other extreme, one can
adOpenDynamic and see committed (or even uncommitted,
if that's the isolation level for your transaction)
changes made by other users while the cursor is open.

The transparent get recordset/disconnect/persist the
recordset/depersist it later/worh with the recordset
(client-side cursor of course!)/persist it again/&c
and later reconnect and send the update (as just a
delta).  All of the clientside software that displays
and edits the recordset doesn't even *know* whether
it's working on a live DB connection or in such a
disconnected edit-at-leisure scenario -- it's all
easily encapsulated.  If I had to choose one big
convenience feature, I think that would be it.

Maybe we can get such a level of convenience WITHOUT
adopting the "richly redundant architecture" that is
so characteristic of ADO... laziness apart, having to
do things in one specific sequence rather than any
which way isn't half bad (easier to document and
explain, not to mention code and test, for example).
But apart from such issues there IS real power in
such a concept as "a temporarily disconnected and
possibly persisted recordset" and the ability to
operate polymorphically on either THAT, or a live DB
connection.

> >> From this and also from my own experience with PostgeSQL queries from
> >> Python, I could say that DB support in Python exists, but need
enhancement
> >> and standartization. Probably, higher level more or less common
> >> object-oriented interfaces could help this.
>
> >It seems to me the (DB API 2) standard is fine (maybe a bit too
> >ambitious, so actual implementations may not have reached up
> >to it yet?-).
>
> The 2.0 driver implementations of the major databases seem fine to me.
> One problem I notice with database tools is extremly rare for someone
> to deal with more than three vendors RDBMS implementations, and generally
> people only develop with one or two.

True.  The specific DB-using product I currently consult for
(Think3's ThinkTeam PDM package) currently fully supports only
Oracle, SQL Server, and (ecch) Jet (aka "the Access DB") -- we've
more or less dropped Informix, Ingres &c as years went by.  But
my dad's office still works on Watcom SQL (on DOS 3.1...), and
of course one has to keep evaluating new releases of such things
as MySQL, Interbase, etc, so I try to keep current on more RDBMS's
than are strictly necessary for my job.  And emerging standards...

> >If I had a vote, I'd rather seen _Gadfly_ tuned up (out with regex
> >and regsub, in favour of re; minimal-tolerance support of stuff it
> >does not implement, such as NULLs) and made a part of Python.  It
>
> I agree that it would be a benefit and worth a PEP if the
> Gadfly author agrees and someone volunteers to do the
> required tweaking.

But would then Gadfly be included in the standard Python
distribution if so tuned-up and tweaked?  If it's to remain
an isolated module the motivation to go and clean it up
is an order of magnitude lessened, I think.  Having Gadfly
in the distribution would really enhance the "batteries
included" part of Python's appeal...!!!

Of course, it would have to be made clear in the docs that
Gadfly is great for small problems but that you can scale
up your programs to other (compatible) DB interfaces and
put a DB engine behind them, etc, etc.

> I don't see a point in including drivers in the core for databases that
> follow a different development cycle. Last thing we want is obsolete
> PostgreSQL drivers out there when a new version of PostgreSQL is
> released. Although PHP manages to do this.... can any PHP users comment?

I would hope major DB vendors keep compatibility in their
next releases with programs developed to their previous
API's...!  My objection would be another -- if there are
a dozen major RDBMS's and I, as a typical user, care about
1 or 2, why would I want to download the other 10 or so
at each Python upgrade?

Alex