[DB-SIG] Some obscurity with paramstyle

Mon Jul 18 04:29:56 CEST 2011

On Jul 17, 2011, at 9:10 PM, Vernon Cole wrote:

> 
> 
> On Sun, Jul 17, 2011 at 8:54 AM, Michael Bayer <mike_mp at zzzcomputing.com> wrote:
> 
> On Jul 17, 2011, at 1:52 AM, Vernon Cole wrote:
> 
> > 2) some major applications (e.g. django) assume %s format.
> >
> > But I am +1 for the idea anyway.
> 
> But not all DBAPIs support %s format, do you mean that Django assumes %s format for those DBAPIs that are known to do so ?   Shouldn't they be at least using .paramstyle to determine that ?
> 
> Well, that's how the 2.0 spec assumes things should work: the DBAPI module tells you which paramstyle it uses, then you write your program five different ways to cover the five possibilities.  But django does not do that (does anyone??)

If I understand correctly,  SQLAlchemy has done it this way since its first release six years ago.   It's quite simple to have the param behavior vary independently of a specific DBAPI implementation. 

> -- it blindly assumes that all DBAPI's use %s.  That's why someone made a fork of adodbapi in order to support Microsoft SQL server in django. 
> ADO uses qmark, so the fork version had to convert %s to qmark before executing the SQL statements.   I pulled the format conversion code (and some other improvements) back into the release version of adodbapi, fixed it so that it does not break % signs in literals, and added a feature so that a programmer can request either qmark, format, or named paramstyle. 

Pysqlite doesn't support %s, how'd they work around that one ?  

> 
> Is there a path to changes being made in the DBAPI?   i.e. would there be a DBAPI 3 ?  
> 
> That possibility has been discussed before, and is particularly timely given that it is impossible to write a PEP 249 compliant module in Python 3.  [For example, the PEP states that an Error exception "must be a subclass of the Python StandardError" -- which Python 3 does not support.]  

yeah StandardError, had a funny thread about that one some weeks ago...

> 
> Marc-André Lemburg (the author of PEP 249) came out against an update -- mostly due to performance reasons.  This is understandable, because his company produces a very efficient db api module.  The features which have been suggested for a possible DBAPI 3.0 would add some significant overhead in some situations. The worry is real. Compare the mxDateTime module (also from Mark-Andre's company) with the (newer) standard datetime module, and you will see that datetime is a slow dog. (I use it anyway.)

The spec should provide that every DBAPI support the standard datetime module by default (though this is tricky for sqlite...);  hooks to use alternates like mxDateTime should be present as well as I would assume mxDateTime has a very similar interface to the datetime builtin.    

If datetime is slow, that's a Python bug - an efficient C implementation should be provided for such a critical piece.   Similarly, the cDecimal library should be part of Python too as Decimal is extremely slow (I would hope that DBAPI 3.0 references support for Decimal explicitly - many DBAPIs like pymssql are broken due to the assumption of float for precision numerics).   I would hope DBAPI3 can say something about supporting Decimal for precision numerics specifically without using lossful floats as a middle layer (we have a lot of workarounds for different DBAPIs to get this part to work, on some it's impossible).

> 
> On the other hand, GVR is in favor of an update.

Given the very bad state of the Unicode situation in DBAPI 3, as well as for precision numerics,  i think it's pretty necessary.    I would opt that all strings from a version 3 DBAPI can be  returned as u'' literals in Python 2 (including down to the names present in cursor.description - pyodbc + MSSQL for example decides to fix these as latin-1 despite other encoding settings).    It's been my observation that an encoding parameter is pretty much present in some form for every database client API, this should be an explicit part of the DBAPI, and the encoding present would be used for all bytestring/unicode coercions.

If performance is a concern here, a documented system of enabling "unicode returns" can be provided.      But in Python 3 these strings are all unicodes anyway, so this behavior will be needed in any case.   

Python unicodes should also always be accepted for all arguments, including bind parameter values, bind parameter names, and the SQL statement itself.  SQLAlchemy has to encode these for some backends, for others it has to leave it alone else things break, and for pyodbc it's dependent on if you're using FreeTDS or not [!].     While it might seem a little strange to suggest that unicode is explicitly allowed for all strings/identifiers/names, as well as cursor.description names, we have users in europe and asia that use non-ascii identifiers all the time (and they are accepted by most databases), and we've bent over backwards for years to support them all.   

Pysqlite is the only DBAPI that seems to have "all unicode everywhere" behavior, with unicode support to extremely varying degrees in all the others (with Psycopg2 probably doing the best job at coming close - even though you need to use non-standard extension calls to enable it fully).  

On the unicode issue, cx_oracle had a totally complicated progression in version 5.0 when they introduced this poorly-considered UNICODE mode, derived from their Python 3 implementation but made available as a static build option for Python 2, where even the connect() call wouldn't accept any incoming string that was *not* unicode !   Then some user insisted that he absolutely *needed* this ridiculous mode, more bending over backwards for SQLAlchemy to support it, then cx_oracle 5.1, realizing they should just be using NLS_LANG (i.e. the "encoding" parameter provided by OCI), they *removed* the whole thing entirely.   A long effort that caused wasted time for many.   Explicit, non-ambiguous guidance on unicode handling is sorely needed.

> When MAL suggested that some ease-of-use improvements (such as named columns) would make implementation of a module more difficult, The BDFL replied that we should make things easier for the user, not the module author. [ I am relying on my memory -- my copy of the exchange got deleted and I am too lazy to look it up in the archive. ;) ]

Ease of use things like named columns, connection.execute(), context manager, these are all fairly trivial adds for all DBAPIs.    They are of course all fine.   But IMHO this is not the critical piece.   It's about behaviors that everyone needs to deal with, like can I send u'select * from table' to execute() or must it be 'select * from table', can the "sequence" sent to execute() be a tuple *or* a list (yes this varies, *incompatibly*, between DBAPIs, some accepting only tuple, others accepting only list), that each DBAPI is currently doing in an arbitrary and different fashion with no guidance establishing non-compliant behaviors as bugs.    If the spec were just more decisive about things, I could just go to all these DBAPI authors that all have their own opinions about things and say "no, this is it".   Right now, DBAPIs are out there with all kinds of behaviors, I can walk you through a large number of them that we've had to normalize.   None of them are bugs.  Once the spec comes out and takes a stand, they all become bugs, and until fixed the DBAPi is not spec-compliant.

> 
> So Yes, there are a TON of new language constructs which could be addressed in an updated API version.  The use of a cursor as an iterator is supported by most DB API modules, but not mentioned in PEP 249.  A cursor and/or a connection should probably be context managers, so that they will work in a "with" statement.  There should be a better definition of how BINARY fields work with byte() data. 
> --
> Vernon
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/db-sig/attachments/20110717/339fd15e/attachment-0001.html>