[DB-SIG] pyformat Parameter Style

Chris Cogdon chris at cogdon.org
Wed May 14 17:03:33 EDT 2003


On Wednesday, May 14, 2003, at 15:24 US/Pacific, Magnus Lyckå wrote:

> This is a detail, but I think it's misleading to support
> %s but not %f etc.

It would be misleading, yes, unless it was specifically documented.

> To anyone used to Python this looks like we are passing in 
> str(my_float)
> while we are in fact passing in repr(my_float).

Actually, that's not true. Python's quoting rules are NOT the same as 
most DMBS's quoting rules. For example the following:

cursor ( "select blah from data where thingy=" + repr(thingy_value) )

would only work in some circumstances. It would NOT work if, say, 
thingy_value contains a '  (python would then use " as a string quoter, 
which means something very different to the DBMS than single quotes), 
or if thingy_value contains foreign characters (python 2.2 uses hex 
escapes, while most DMBSes require octal escapes).

> So if the backend is capable of storing exact decimal values and
> we try to store 0.1, we will accidentally store 0.10000000000000001.
> It's likely that the backend won't be configured to so many digits,
> and will round it to 0.1 anyway, but I don't really like this.

That's why it's the responsibility of the interface to 'correctly 
translate' any values passed to it.

> After all, it *must* be the application programmer who decides what 
> value
> to pass to the backend. If we have paramstyle = format, why not allow 
> him
> to express this as "... x=%.2f", my_float) instead of
> "... x=%s", round(my_float, 2)) if that is what he wants?

yes, it's the application programmers responsibility to determine what 
values are passed, but not HOW they're passed, since that's up to the 
backend syntax specification. The API writer is far more qualified to 
know what the requirements are than the application programmer.

> I disagree. If a database application programmer is not aware of how
> floating point numbers work in Python and in his backend on the given
> platform, he is likely to cause a mess. I've cleaned up such things,
> both in SQL code and in C++ code in various projects. There is no way
> the driver can save anyone from doing stupid things like comparing
> floats on equality without any fuzzfactor. The programmer *has* to
> worry about float representations. This doesn't imply that he needs
> full format capability in the SQL strings, but he can't let the
> computer handle that in any programming language that uses floating
> point arithmetic in the way C and Python does. (Not that it's fully
> problem free with exact decimal representations either.)

I agree that there may need to be an exception, or a clarification 
made, in the case of floats. However, this is also true when doing 
straight-out python coding. It is 'risky' to code something like the 
following:

f = read_float_value_from_somewhere()
if f = 0.2:
     do_something_important ()

This is risky because 'f' cannot ever possibly be exactly 0.2

>> cursor.execute ( "insert into data ( %2.2f )", my_float )
>
> To anyone used to python it should be clear what this implies.
> Do you mean that DB application programmers by accident enter 2.2
> without knowing what it means? I find that unlikely.

It's 'obvious' only if the programmer has 'assumed' that his value will 
be treated according to the python '%' operator conventions, which is 
not true. Most interfaces run through a quoting function before being 
applied to the string containing the '%s's

> But today, you must use %r to be able to get exactly the same
> behaviour across database drivers, since the drivers that are
> complient with the spec will use %s if you tell them to do
> that, and others like pysqlite will use repr() whatever you
> type after the %-sign.

Well, keep in mind that 'the spec' doesn't mention ANYTHING other than 
'%s' and '%(key)s'. And, looking at existing interfaces, many have not 
coded for the possibility that anything other than those two will be 
used.

And, again, '%r's actual MEANING is very poorly defined. repr() does 
not return a value that will always be accepted by the DMBS. It 
requires extra quoting, and then should be be quoting and THEN repr, or 
the other way around ?

> By the way, I think all the drivers in question that stops our
> ignorant application programmer from writing something potentially
> harmful as ("insert into data ( %2.2f )", 1000./3.) will gladly
> allow him to write something decidedly harmful such as
> ("insert into data ( %2.2s )", 1000./3.) and silently store 33
> in the database instead of 333.33 which had been the case with
> %2.2f. At least that is what pysqlite does. :(

Most likely because the interface writer is just using the '%' operator 
to do the insertion of the parameters into the string, without 
'stopping' the application writer from doing things like '%2.2s', and 
so on. There are many other instances where the behaviour outside of 
the specification is not well defined, but does not return an error.

> "Only use %s, don't put in any other things like %.2s, but if
> this is the intention, why write the code so that the user can
> make such a mess.

Because, trapping for this would mean writing a lot more code. Just 
slapping in '%' is both easy to do, and fast, because it's a 'built-in'.

> Perhaps these modules should throw an exception as soon as % is
> followed by anything else than "s", "%" or "(word)s" ? That's
> where the problems really are, since the drivers escape and
> quote the strings, which makes them look different then they
> do in the scope of the client application.

I agree that that would be the 'best' solution, but it would slow down 
the code somewhat.

> Also, the implementations of %s in for instance pysqlite chokes
> on things like "select * from x where a=%s and b like 'hello%'"
> You have to use 'hello%%', or you will get a
> "TypeError: not enough arguments for format string"

Yep. That should have been part of the specification.

> In other words, it seems that a %s-limited format or %(xxx)s limited
> pyformat isn't there to help the application programmer, since it has
> dangers that ?, :1 or :xxx doesn't have, but is instead there just to
> make life somewhat simpler for the driver programmer.

No argument here :)

Perhaps the specification should be 'clarified' by writing the 
following:

"""
To use positional parameter parsing, use '%s' and pass your parameters 
as a tuple as the second parameter to the 'execute' function.

To use keyword parameter parsing, use '%(key)s' and pass your 
parameters as a dictionary as the second parameter to the 'execute' 
function.

If you need a '%' in your SQL syntax, use '%%'.

This convention operates much like python '%' operator, and may in fact 
be implemented as such. However, the behaviour of using anything other 
than '%s' and '%(key)s' is not defined, and should be avoided. The 
interface is not required to trap for these conditions in this version 
of the specification
"""



-- 
    ("`-/")_.-'"``-._        Chris Cogdon <chris at cogdon.org>
     . . `; -._    )-;-,_`)
    (v_,)'  _  )`-.\  ``-'
   _.- _..-_/ / ((.'
((,.-'   ((,/   fL




More information about the DB-SIG mailing list