[DB-SIG] Remaining issues with DB API 2.0

Greg Stein gstein@lyra.org
Wed, 31 Mar 1999 18:36:17 -0800


M.-A. Lemburg wrote:
>...
> The differentiation between date, time and timestamp is needed
> simply because they point to three different informations. A single
> date constructor + corresponding type object is not enough for
> a module to decide whether to take only the date, the time, or
> both parts to format some data to pass to the database.
> 
> Take e.g. a module that uses literal SQL statement formatting
> for interfacing. Such a module would not be able to tell
> whether to format a date as a time-only value, a timestamp, or a
> date-only value. If it were to always pass in timestamp values,
> the database would probably raise an error for DATE and TIME
> columns.
> 
> So dropping the two constructors renders that kind of interface
> useless. Besides, explicitly using the constructors in your
> database interface scripts will surely make them more readable :-)

Your constructors aren't sufficient anyhow(!). As I pointed out once
before, MySQL has many more varieties than a simple DATE, TIME, and
TIMESTAMP. Hell, the TIMESTAMP has about six varieties.

However, I do believe that MySQL can always take a fully-qualified
date/time and will trim it back as necessary. This implies a single
constructor.
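
To make that concrete, a single-constructor interface might look
something like this (just a sketch -- the names and the trimming
behavior are my assumptions, not anything in the spec):

import time

# One constructor covers dates, times, and timestamps; the module
# (or the database) simply ignores the fields that a DATE or TIME
# column doesn't need.
def Timestamp(year, month, day, hour=0, minute=0, second=0):
    return (year, month, day, hour, minute, second)

def TimestampFromTicks(ticks):
    t = time.localtime(ticks)
    return Timestamp(t[0], t[1], t[2], t[3], t[4], t[5])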

What is the case for ODBC? Do you actually need to *bind* these
differently? (which is the major reason to have these constructors --
input binding!)

> As for the type objects: we could group date/time values
> collectively under DATETIME. Would that be in your interest ?

This would certainly be better. I wouldn't really have a problem if we
said that a "column type" of dbmodule.DATETIME existed. There may be
many *Python* types which compare equal to that, but those subtypes are
at the discretion of the module implementor.
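
For what it's worth, here's a sketch of how a module could implement
such a grouped type object -- one object that compares equal to
several native type codes (the class and the codes below are made up
for illustration):

class DBAPITypeObject:
    def __init__(self, *values):
        # the native type codes grouped under this object
        self.values = values
    def __eq__(self, other):
        return other in self.values
    def __ne__(self, other):
        return other not in self.values

# hypothetical native codes reported in cursor.description
DATETIME = DBAPITypeObject('DATE', 'TIME', 'TIMESTAMP')

assert DATETIME == 'TIME'      # any of the grouped codes matches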

I also believe that if a particular database module requires additional
granularity for *input* types, then it will have additional constructors
beyond the 6-tuple and the ticks.

A real problem that we're having here is that we don't have a sufficient
review of the extensions that have been made for the different
databases. What have module implementors needed to do to get their
modules to work? (not just to operate nicer, but to *work*)

> > * the 'qmark' paramstyle should not be assumed. All 2.0 DBAPI modules
> > must specify the paramstyle attribute (along with apilevel and
> > threadsafety). Further, the 'qmark' style is not the best style. Named
> > or positional are best, because they allow you to use the same parameter
> > multiple times without requiring the user to respecify it over and over.
> > For example: "select column_a1, column_b1 from a, b where a.key = :1 and
> > b.key = :1". If anything, we should recommend not using the qmark style
> > if a module implementor has a choice.
> 
> True. I'll take those comments out.

I would like to see a stated preference for 'numeric', 'named', and/or
'pyformat'. They are marginally clearer and easier to use.
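
To illustrate with the statement from the quoted example (a sketch;
curs and key are assumed to already exist, and which style a module
supports is of course up to the implementor):

# 'qmark' -- the same value must be passed twice
curs.execute("select column_a1, column_b1 from a, b"
             " where a.key = ? and b.key = ?", (key, key))

# 'numeric' -- the value is given once and referenced twice
curs.execute("select column_a1, column_b1 from a, b"
             " where a.key = :1 and b.key = :1", (key,))

# 'named'
curs.execute("select column_a1, column_b1 from a, b"
             " where a.key = :key and b.key = :key", {'key': key})

# 'pyformat'
curs.execute("select column_a1, column_b1 from a, b"
             " where a.key = %(key)s and b.key = %(key)s", {'key': key})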

> > * a note needs to be added to execute() and executemany() to state that
> > parameters should not require any preprocessing before being passed. To
> > the client programmer, the interface should appear as if a value is
> > going to be "bound" to the input, rather than substituted into the query
> > string.
> 
> Not sure, why you want this. The spec already says:
> "Parameters may be provided as sequence or
>  mapping and will be bound to variables in the operation."

I see that, but people have still written substitution-based modules
rather than binding-based modules. Can we have a footnote on the word
bound, that reads: "The term "bound" refers to the process of binding an
input value to a database execution buffer. In practical terms, this
means that the input value is directly used as a value in the operation.
The client should not be required to "escape" the value so that it can
be used -- the value should be equal to the actual database value."

(edit the text as required and/or others may have suggested edits)
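
As an illustration of the intent behind the footnote (a sketch,
assuming a cursor curs and a module using the 'named' style):

name = "O'Reilly"    # note the embedded quote character
# The raw Python value is handed straight to execute(); the module
# binds it. The client never escapes or quotes the value itself.
curs.execute("insert into authors (name) values (:name)",
             {'name': name})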

> > * I do not understand the difference between OperationalError and
> > ProgrammingError. The examples seem to divide the two exceptions
> > arbitrarily. What is the "rule" for deciding which is which?
> 
> OperationalErrors are all types of errors that are related
> to things that are not necessarily under the control of the
> programmer, e.g. lost connections, log full, transaction
> could not be performed due to lack of memory or lack of
> functionality, etc.
> 
> ProgrammingError refer to things that are under the programmers
> control, e.g. he uses a table that does not exist, he specifies
> the wrong number of parameters for a statement, uses wrong
> SQL syntax, etc.

All right. How about this: under "OperationalError" it states "data
source name not found". Whose fault is that? The programmer for entering
a typo'd DSN, or the administrators for not configuring it?

Under ProgrammingError, it states that one case is a "table not found."
That isn't the programmer's fault if somebody deleted the table. Oh,
but maybe it is, because he misspelled the table name.

Oh, and how is InternalError different from OperationalError? It's not my
fault the cursor died on me.

If I pass an integer that is too large, is that a ProgrammingError or a
DataError?

The differences here are too fine-grained to be usefully delineated in
the specification. We should probably cut the number of exceptions in
half.

> Hmm, maybe OperationalError wasn't the right choice for
> the dynamic .rollback() error after all...
> 
> There is some overlap and in fact some DBs raise ProgrammingErrors
> for certain things while others use OperationalError. Maybe we
> should make one a subclass of the other... ?

I'd say we make two subclasses of Error: InterfaceError, and
DatabaseError.
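
Roughly (a sketch -- how the existing finer-grained exceptions would
map onto these two is exactly the open question):

class Error(Exception):
    """Base class for everything the module raises."""

class InterfaceError(Error):
    """Problems with the module itself or with its use of the DB API."""

class DatabaseError(Error):
    """Errors reported by the database."""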

> > * I still do not understand the presence of the rowcount attribute.
> > *Especially* since the execute() method now says "return value
> > undefined." That's bogus. Let's nuke the rowcount attribute and return
> > *that* value from execute. No reason to do an execute and then go
> > somewhere else for this information.
> 
> The "return value undefined" was done to allow the .execute()
> method return the old style return value (which is not easily
> implementable for various reasons I already pointed out in
> previous mails). The module implementor is free to return
> whatever he likes.
> 
> The rowcount definition is somewhat different from the 1.0
> return value spec of .execute(). That's why it's a new attribute.
> Again, module implementors could go and simply have .execute()
> return whatever the value of .rowcount would be if they like...

If we make it undefined, then we are stating that in DBAPI 2.0, it
cannot be used (without consulting the db-specific doc). In other words:
somebody has to update their client program.

If they are going to update their program, then let's make execute()
return something rational.

> > * for the execute() return value (merged from the rowcount thing), we
> > should discriminate between "I know the number and it is X" and "I have
> > no idea". The latter should be None. Note that -1 no longer applies
> > since the value is only available as part of an execute, rather than a
> > random-access. I agree with your earlier comments about needing to
> > remove the bit about detecting DML vs DDL vs DQL. I like your current
> > text, with the additions of "0 or None for DDL statements" and "None for
> > cases where the count cannot be determined".
> 
> The problem with None is that it compares false which would
> be ok for DDL (0 or None both are false), but fails to indicate
> "don't know" for DQL where 0 could be a possible known row count.

If people are looking for None, then they can use "is None" rather than
"not".

true-ness / false-ness is obviously not the appropriate way to
discriminate the values.

result = curs.execute(op)
if result is None:
    pass    # database doesn't know how many rows were returned/affected
elif result == 0:
    pass    # no rows returned/affected
else:
    pass    # some positive number of rows were returned/affected

For DDL: a database will return 0 or None (whichever is convenient)
For DML: a database will return >=0 or None (if it can't determine the
number of rows affected... at one time, PostGres couldn't)
For DQL: a database will return >=0 or None (if it can't determine the
number of rows returned until a fetchXXX() is performed)

A footnote with the above three lines would be a handy "implementation
hint" for developers. I believe it also fits in with your need to not be
required to parse the statement (which I agree is something to avoid!).

> As for the random-access vs. return value: some DBs could be
> able to calculate the row count only if at least one fetch was
> done. So the information might not be available right after the
> execute.
> 
> How about defining the return value of .execute() as being the
> current .rowcount attribute's value, as you propose, and providing
> .rowcount as an additional, possibly optional feature?

Drop rowcount altogether. We don't need multiple ways to access the
value.

> > * the comment in fetchone() about the underlying cursor should be
> > removed. The Python programmer can never access that thing, so the
> > comment is irrelevant. The comment about using array fetches and whatnot
> > still apply -- in particular, the possible use of the arraysize
> > attribute.
> 
> The comment is simply a warning to those who might expect the
> fetchone() method to move the cursor by one row. This may be
> true for some interfaces (e.g. mxODBC) while being false for others.

Understood, but why does the Python programmer care about this? It has
no bearing on the interface since they have NO access to that cursor.
The fact that it does or does not move forward N rows is moot -- the
Python programmer only uses the DBAPI interface which has no "movement"
semantics.

Could this note be moved under the implementation hints?

> > * in all methods that state "an exception is raised..." we should state
> > *which* exception. We have a complete set defined at the beginning, so
> > this makes sense to do. I would recommend InterfaceError (it is based on
> > the interface we provide, rather than the underlying database). Hrm.
> > Maybe not. I can see a case where the DB module will simply attempt to
> > call the underlying DB function and generate a DB-related exception.
> > Thoughts anyone?
> 
> This is hard to do, since the types of errors may very well be
> the whole range of possible Python exceptions + the ones defined
> in the DB API, e.g. a conversion routine might raise an OverflowError
> in case of numeric overflow while raising an InterfaceError in
> case a certain internal assertion failed.

Good point. I retract my request :-)

> > * on nextset(), it might be nice to define a return value that allows
> > the user to discriminate these types of results: no more sets (None
> > right now); another set with N rows; another set with unknown number of
> > rows. The only thing really available for the last one is a negative
> > number (e.g. -1), but that seems a bit hokey. Any ideas?
> 
> I've got no experience with multiple result sets, but I do think
> there was a point to defining the return value in the fuzzy way it
> is written in the spec.

I think it would be useful to have a specified return value, much like
we have it for execute(). Strictly speaking, a nextset() is much like
execute, but meaning "give me the next part of my operation" ... the
client wants to know the results.

Can we mark this as an open issue? Maybe somebody will have a good idea.

Ooh. I have a proposal: maybe we can have a global named unknownCount.
This can be returned by execute() and by nextset(). In both cases, it
means "I have no idea". Client programmers should test with an "is"
rather than equivalence.  Oh, actually, if the underlying type does not
specify a cmp() function, then a "==" compares the type name... thus an
"==" will work for client programmers, too, since the only other return
value is a number. (would we still have None as acceptable? ... oh, for
nextset(), yes).
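
A sketch of what I mean (the sentinel class is hypothetical; only the
unknownCount name itself is part of the proposal):

# In the module:
class _UnknownCount:
    def __repr__(self):
        return 'unknownCount'

unknownCount = _UnknownCount()    # unique module-level sentinel

# In client code (assuming a cursor curs and an operation op):
count = curs.execute(op)
if count is unknownCount:         # the preferred test
    print("row count unknown")
elif count == 0:                  # '==' also works: the sentinel never
    print("no rows")              # compares equal to a number
else:
    print("%d rows" % count)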

New comment: should we change the globals to apiLevel, threadSafety,
and paramStyle? Or maybe underscore-separated? Or leave them as-is?
Hmm... I think "Python style" says all lowercase for these, with mixed
signals on the underscore. Thoughts?

Cheers,
-g

--
Greg Stein, http://www.lyra.org/