DBI cursor behaviour with multiple statements?

Paul Boddie paul at boddie.net
Wed Oct 2 08:58:27 EDT 2002


Henrik.Weber at sys.aok.de (Henrik Weber) wrote in message news:<7e964d1d.0210010652.5b8768de at posting.google.com>...
> paul at boddie.net (Paul Boddie) wrote in message news:<23891c90.0209300554.5b37c195 at posting.google.com>...
> > 
> > Note that you probably don't want the quoting inside the SQL
> > statement. In other words, it should be...
> > 
> >   "SELECT * FROM table WHERE column = %s"
> 
> That depends on the type of the column. If it is some kind of
> character column the quotes are required at some place. Of course I
> could place the quotes in the parameters instead:
> 
> c.executemany("SELECT * FROM table WHERE column = %s", [["'parm1'"],
> ["'parm2'"], ["'parm3'"]])

This isn't how parameter binding works if you want to check for the
column values "parm1", "parm2" and "parm3" (as opposed to the
completely different column values "'parm1'", "'parm2'" and
"'parm3'"). Change the parameter style (notation) and consider the
DB-API specification for a simple query:

  cursor.execute("SELECT * FROM SOME_TABLE WHERE SOME_COLUMN = :1",
      ["some value"])

Now, we've removed any temptation to do string substitution,
interpolation, replacement (or whatever it's called) with the
statement, since ":1" means something else.
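As a concrete sketch of that principle, here is the same kind of query using the sqlite3 module. Note that sqlite3 uses the "qmark" paramstyle ("?") rather than the numeric ":1" style shown above, but the binding principle is identical; the table and value names are just illustrations.

```python
import sqlite3

# sqlite3 uses the "qmark" paramstyle: "?" marks each parameter.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE some_table (some_column TEXT)")
cursor.execute("INSERT INTO some_table VALUES (?)", ["some value"])

# The "?" is a placeholder understood by the database, not text for
# Python to substitute before sending the statement:
cursor.execute("SELECT * FROM some_table WHERE some_column = ?",
               ["some value"])
print(cursor.fetchall())  # [('some value',)]
```

The value is supplied separately from the statement text, so no quoting appears in either the SQL or the parameter list.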

> But actually it doesn't matter. DBI is a very thin wrapper around the
> native DB API. It doesn't analyze the content of the query, but just
> passes it on to the database, maybe replacing the placeholders by
> parameter values on the way.

For various databases (e.g. Oracle) the supplied value ("some value")
actually gets bound to the placeholder/parameter in the prepared
statement. In other words, your database module calls some API
function requesting that the given statement be prepared:

  some_api_query("SELECT * FROM SOME_TABLE WHERE SOME_COLUMN = :1")

Note that database systems which support parameter binding don't
complain about the presence of placeholders/parameters - they know
what they are and expect you to call other API functions binding the
values to those parameters:

  some_api_bind(1, "some value")

If the database system doesn't support parameter binding, you will (as
the database module author) need to encode the statement so that it
makes sense to the database system as a standalone statement:

  another_api_query(
      "SELECT * FROM SOME_TABLE WHERE SOME_COLUMN = 'some value'")

However, this should be hidden from the user of the database module.
Intuitively, this makes sense because the user no longer needs to care
whether their values contain special characters or not. Consider this
situation:

  value = "some 'value'"
  # Later on...
  cursor.execute("SELECT * FROM SOME_TABLE WHERE SOME_COLUMN = :1",
      [value])

The user of the database module should still experience a successful
query operation, even though they've used apostrophes in their column
value. If the database module were merely to paste that value into the
statement, however, it would fail:

  another_api_query(
      "SELECT * FROM SOME_TABLE WHERE SOME_COLUMN = 'some 'value''")

Of course, this isn't how database modules are supposed to work.
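By contrast, a module that binds parameters handles embedded apostrophes without any caller-side escaping. A sketch using sqlite3 (with its "?" qmark paramstyle, and illustrative table names):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE some_table (some_column TEXT)")

value = "some 'value'"
cursor.execute("INSERT INTO some_table VALUES (?)", [value])

# Bound values are never spliced into the SQL text, so the embedded
# apostrophes need no escaping by the caller:
cursor.execute("SELECT * FROM some_table WHERE some_column = ?", [value])
print(cursor.fetchall())  # [("some 'value'",)]
```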

The point is that the values supplied are considered as being separate
from the query text. The only way developers can remain sane is to be
able to make this distinction; otherwise, they have to manage the
complexity of the database system's SQL parser as well as the
complexity of their own code, and this would make for a maintenance
nightmare (and various potential security holes).

> However the result of that operation has to be a SQL statment that the native
> RDBMS will understand, so it's not important what the query string looks like
> as long as the underlying database will accept it.

Indeed, and many database systems accept placeholders/parameters
because they do understand them.

> > Personally, I'd recommend supporting different parameter notations,
> > since this issue always confuses people.
> 
> The paramstyle variable is supposed to be a string, so I can't put
> several different values in there.

You could support paramstyle reconfiguration at connection time. This
would require you to translate the placeholders in the user's
statement into those employed by the database system, however.
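A rough sketch of such a translation, converting "numeric" (":1") placeholders into "qmark" ("?") ones and reordering the parameter list to match; the `numeric_to_qmark` helper is hypothetical, and a real implementation would also need to avoid touching placeholder-like text inside SQL string literals:

```python
import re

def numeric_to_qmark(statement, params):
    """Translate ":1"-style (numeric) placeholders to "?" (qmark),
    reordering params to match. Hypothetical, simplistic sketch."""
    order = []
    def replace(match):
        order.append(int(match.group(1)) - 1)  # ":1" binds params[0]
        return "?"
    translated = re.sub(r":(\d+)", replace, statement)
    return translated, [params[i] for i in order]

stmt, args = numeric_to_qmark(
    "SELECT * FROM t WHERE a = :2 AND b = :1", ["x", "y"])
# stmt == "SELECT * FROM t WHERE a = ? AND b = ?"
# args == ["y", "x"]
```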

> I could use the native parameter notation of the database, which is
> qmark in this case. This is not one of the preferred notations
> according to the DBI description, so I thought I might use pyformat
> instead. Somewhere I have read that preparing a query doesn't make a
> difference in performance with Access so I might just as well pass a
> new query string to the database for every set of parameters without
> losing anything. In the example it would have been possible to pass a
> named placeholder (like "%(parm)s") and a sequence of dictionaries
> instead without changing the code of the method.

Yes, but then you have to do the quoting of the supplied values on
behalf of the caller - it's not up to them to do this.

> Anyway I haven't yet decided on a quoting style and the style I'm
> currently using is different from the one in the example. It was just
> that the example was shorter this way and it didn't make a difference
> for the question I was trying to have answered.

No, but the use of apostrophes in the values suggested that you
equated placeholders/parameters with string
substitution/interpolation/replacement, when they're clearly not the
same thing. If you weren't under that illusion then I apologise now.

> >   "SELECT * FROM table WHERE column IN %s"

[...]

> > This appears highly counter-intuitive, of course, since the
> > Python-style %s parameter marker actually represents a list in this
> > case - another reason for choosing a better parameter notation.
> 
> No, it doesn't represent a list.

Consider the alternative notation:

  SELECT * FROM table WHERE column IN :1

I would definitely expect a list/tuple/sequence to be supplied as the
value that binds to the parameter. Indeed, the psycopg developers
appear to agree.
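That said, the DB-API does not standardise binding a sequence to a single placeholder, and many modules will not accept it. A common workaround, sketched here with sqlite3 and illustrative names, is to generate one placeholder per element; note that the string substitution builds only the placeholder list, never the values themselves:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE t (c TEXT)")
cursor.executemany("INSERT INTO t VALUES (?)", [["a"], ["b"], ["c"]])

# One "?" per element; the values are still bound, not pasted in:
values = ["a", "c"]
placeholders = ", ".join("?" for _ in values)
cursor.execute("SELECT c FROM t WHERE c IN (%s)" % placeholders, values)
print(sorted(cursor.fetchall()))  # [('a',), ('c',)]
```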

[...]

> As I said above a DBI wrapper doesn't parse the query to find out how
> to interpret the parameters (at least it would surprise me very much
> if it did). It is still the users responsibility to assure that the
> queries sent to the database are syntactically correct. Passing lists
> as query criteria like in your example above is not going to work.

You're right in the sense that some database modules don't need to
look at the query string at all. If the underlying database system
supports placeholder notation, you can risk sending the query string
to the database system...

  some_api_query("SELECT * FROM SOME_TABLE WHERE SOME_COLUMN = :1")

...and then traverse the list of values, binding each one as you go:

  for i, value in enumerate(values):
      if some_api_bind(i + 1, value) == FAILED:
          raise SomeException("Could not bind all values.")

If the database system doesn't like it, you'll get told about it, and
you would pass this failure condition back in some meaningful way.

On the issue of syntax, the only thing the user needs to guarantee is
the syntax of the query string, *not* whether the query string with
"pasted in" values is also syntactically correct (especially since a
simplistic string replacement is *not* what goes on). Moreover, with
some database modules, the user doesn't really need to check the types
of the values very carefully either, but that's another story.

Paul
