[DB-SIG] pyformat Parameter Style

Magnus Lyckå magnus at thinkware.se
Thu May 15 23:34:42 EDT 2003


At 16:03 2003-05-14 -0700, Chris Cogdon wrote:
>I (Magnus) wrote about %s:
>>To anyone used to Python this looks like we are passing in str(my_float)
>>while we are in fact passing in repr(my_float).
>
>Actually, that's not true. Python's quoting rules are NOT the same as most 
>DMBS's quoting rules. For example the following:
>
>cursor ( "select blah from data where thingy=" + repr(thingy_value) )
>
>would only work in some circumstances. It would NOT work if, say, 
>thingy_value contains a '  (python would then use " as a string quoter, 
>which means something very different to the DBMS than single quotes), or 
>if thingy_value contains foreign characters (python 2.2 uses hex escapes, 
>while most DMBSes require octal escapes).

And how often is there a ' in a float? :)

This is my point. All the magic is in strings, and possibly
in types like date, but there is *no* magic in the numeric
types, so allowing numeric values to be passed in with %f or
%i etc seems ok. Why disallow this? Practically, I'm suggesting
(for pysqlite):

def _quote(value):
     """_quote(value) -> string
     This function transforms the Python value into a string suitable to 
send to
     the SQLite database in a SQL statement.  This function is automatically
     applied to all parameters sent with an execute() call.  Because of this a
     SQL statement string in an execute() call should only use '%s' [or
     '%(name)s'] for variable substitution without any quoting."""
     if value is None:
         return 'NULL'
     elif isinstance(value, StringType):
         return "'%s'" % value.replace("'", "''")
-   elif isinstance(value, LongType):
-       return str(value)
+   elif (isinstance(value, LongType) or isinstance(value, IntType)
+         or isinstance(value, FloatType)):
+       return value
     elif hasattr(value, '_quote'):
         return value._quote()
     elif have_datetime and type(value) in \
             (DateTime.DateTimeType, DateTime.DateTimeDeltaType):
         return "'%s'" % value
     else:
         return repr(value)

(One might consider using "type(value) in [long, int, float]"
instead, since subclasses might do odd things on for instance
__str__, but that's also a problem with the current code.)

What would be the problem with this? %s will still work, but with
12 decimals instead of 17 for floats. I doubt that is a problem,
and %e or %.16e or %r will still work. If the application programmer
is insecure about the result, he can simply try it out with a
normal formatted print.

The problems exist with *strings*, not with numeric values, and *that*
can't be solved by banning the use of %f etc for *numeric* values.

I think it would be very useful to get the parsed SQL statements
printed in some kind of debug mode, since strings can look different
than one might expect. That is a feature I'd like to see.

>>So if the backend is capable of storing exact decimal values and
>>we try to store 0.1, we will accidentally store 0.10000000000000001.
>>It's likely that the backend won't be configured to so many digits,
>>and will round it to 0.1 anyway, but I don't really like this.
>
>That's why it's the responsibility of the interface to 'correctly 
>translate' any values passed to it.

How on earth would the interface be able to know whether we
actually want to store 0.1 or 0.10000000000000001 when a
Python float can't distinguish them?

For exact decimal numbers, we have a mismatch between Python
and SQL, since Python is lacking such a datatype, and after
some years of Python programming I feel that explicit is
better than implicit. I.e. I prefer that I control this
conversion rather than to let the interface do that. (The
problem is still strings. I can accept that the interface
escapes and quote strings, because the escape and quote stuff
is not to be stored in the db, it's just a way to make the
backend accept exactly what I try to store, but it's a bit
annoying that things like %.5s will fail.)

%f or %e formatting would give us a tool to handle the numeric
type mismatch. Obviously rounding 0.1 to one decimal won't work.
We will still store 0.10000000000000001 if the value is a Python
float passed via repr(). If we convert it to the string '0.1'
before passing it to the SQL statement, the inteface will adorn
it with quotes, but maybe that will still work? (I hope so,
otherwise the ? and :1 forms seem problematic.)

Anyway, it seems full %-formatting could be useful, even if
it's a rather small issue in my opinion.

>I agree that there may need to be an exception, or a clarification made, 
>in the case of floats. However, this is also true when doing straight-out 
>python coding. It is 'risky' to code something like the following:
>
>f = read_float_value_from_somewhere()
>if f = 0.2:
>     do_something_important ()
>
>This is risky because 'f' cannot ever possibly be exactly 0.2

That is a syntax error, but f == 0.2 will work since 0.2 can't
be exactly 0.2 either! :) Try it! (And try "0.1 + 0.1 == 0.2".)

It more complex than this, and I won't bore you with my old
"war stories", but my point is that the application programmer
might get into situations where he needs to be able to have as
much control as possible, and I've not heard a good reason so
far to remove that control as some db interfaces completely
needlessly does today.

If you have a decimal(10,2) column X with 1.15 stored, and
do "...WHERE X=%s", (1.15)) you will get
...WHERE X=1.1499999999999999 today. Are you sure that will
match? If you did "...WHERE X=%.2f", (1.15)) and my patch
was applied, you'd get ...WHERE X=1.15. That would certainly
match. I imagine that if it is decimal(10,2) as I said, the
float passed in will be coersed into a decimal(10,2) and it
will work anyway, but I think I can show other cases where
this *is* a problem if I have to. Probably rare, but why
remove the simple tool to fix it when it happens?

>It's 'obvious' only if the programmer has 'assumed' that his value will be 
>treated according to the python '%' operator conventions, which is not true.

It would be much more pythonic is it *was* true. :)
It's a really ugly hack as long as it's not true.

One of the really great things with python is that it
almost always works just as expected. I can learn something
in one context and apply it in another. Other languages I
use are full of "you can't do this with that" and "in that
situation you will have to do it like this instead". Perl
and Visual Basic for instance is full of such rubbish.

>Most interfaces run through a quoting function before being applied to the 
>string containing the '%s's

Only for strings, and that is a temporary decoration needed
to get exactly the value in the original string passed to the
backend! Changing the string

Can't do

into

'Can''t do'

is just a temporary measure to make certain that we will
actually use

Can't do

and neither more, nor less, in the actual SQL operation. I don't
think that's more odd than the fact that the string "can" will
have the representation 'can', and "can't" will have the
representation "can't" in Python. That *is* an interface issue,
and a feature for the coder.

Making %f fail when it's passed 5 is not a feature. It's a bug,
or at least a deviation from the spec.

>Well, keep in mind that 'the spec' doesn't mention ANYTHING other than 
>'%s' and '%(key)s'. And, looking at existing interfaces, many have not 
>coded for the possibility that anything other than those two will be used.

Yes it does. Read again. It only uses %s in the examples,
but read the text. It seems M-A and I agree that it could
be made clearer, but it *does* state "ANSI C printf format
codes" and "Python extended format codes", and the tiny
example is just that. Did you expect "name" being passed
in as a flot or what? ;)

>And, again, '%r's actual MEANING is very poorly defined. repr() does not 
>return a value that will always be accepted by the DMBS. It requires extra 
>quoting, and then should be be quoting and THEN repr, or the other way around ?

I don't qute understand the precious sentences. But the
interfaces in question use repr() as default. See _quote
above. That's not *my* idea.

>Most likely because the interface writer is just using the '%' operator to 
>do the insertion of the parameters into the string, without 'stopping' the 
>application writer from doing things like '%2.2s', and so on. There are 
>many other instances where the behaviour outside of the specification is 
>not well defined, but does not return an error.

My point was mainly to point out that it's odd to block %f for
no appearent good reason and allow people to write things like
%.2s and thus trunctate something which is probably different
than the applicaiton programmer intended.

The tiny trap I see is that %f rounds to six decimals, which
people might forget. By all means, suggest the use of %s, or
even %e or %r for numerics, and above all, suggest that people
try their code with print fmt % params, or even better make some
debug function so that they can see what the SQL will look like.

>Because, trapping for this would mean writing a lot more code. Just 
>slapping in '%' is both easy to do, and fast, because it's a 'built-in'.

So it seems we agree. Things look like they do, not because
it was best for the application programmer, but because it
was easiest for the driver programmer. I agree that there is
a value in making life easier for interface programmers.

I also think that several people have misunderstood the
specification, even if it's explicit about allowing the
ANSI C format codes if you read it properly.

But as I've shown above it is (I think) very simple to make
it complient, and %s will continue to work like before unless
you need ten decimals.

>>Perhaps these modules should throw an exception as soon as % is
>>followed by anything else than "s", "%" or "(word)s" ? That's
>>where the problems really are, since the drivers escape and
>>quote the strings, which makes them look different then they
>>do in the scope of the client application.
>
>I agree that that would be the 'best' solution, but it would slow down the 
>code somewhat.

If that is a problem, use "if __debug__:" and let people
run with -O when they can't spare cycles for this check.


--
Magnus Lycka (It's really Lyckå), magnus at thinkware.se
Thinkware AB, Sweden, www.thinkware.se
I code Python ~ The shortest path from thought to working program 




More information about the DB-SIG mailing list