Why Is Escaping Data Considered So Magical?

Nobody nobody at nowhere.com
Fri Jun 25 02:47:47 EDT 2010


On Fri, 25 Jun 2010 12:25:56 +1200, Lawrence D'Oliveiro wrote:

> Just been reading this article
> ...
> which says that a lot of security holes are arising these days because
> everybody is concentrating on unit testing of their own particular
> components, with less attention being devoted to overall integration
> testing.
> 
> Fair enough. But it’s disconcerting to see some of the advice being
> offered in the reader comments, like “force everyone to use stored
> procedures”, or “force everyone to use prepared/parametrized
> statements”, “never construct ad-hoc SQL queries” and the like.
> 
> I construct ad-hoc queries all the time. It really isn’t that hard to
> do safely.

Wrong.

Even if you get the quoting absolutely correct (which is a very big "if"),
you have to remember to perform it every time, without exception. And you
need to perform it exactly once. As the program gets more complex,
ensuring that it's done in the correct place, and only there, gets harder.

More generally, as a program gets more complex, "this will work so long as
we do X every time without fail" approaches "this won't work".

> All you have to do is read the documentation—for example,
> <http://dev.mysql.com/doc/refman/5.0/en/string-syntax.html>—and then
> write a routine that takes arbitrary data and turns it into a valid
> string literal, like this
> <http://www.codecodex.com/wiki/Useful_MySQL_Routines#Quoting>.

That's okay. Provided the documentation is accurate. And provided that you
update the escaping algorithm whenever the SQL dialect gets extended, or
you switch to a different back-end, or modify the program. IOW, it's not
even remotely okay.

"Unparsing" data so that you get the correct answer out of a subsequent
parsing step is objectively and obviously the wrong approach. The
correct approach is to skip both the unparsing and parsing steps
entirely.

Formal grammars are a useful way to represent graph-like data structures
in a human-readable and human-editable form. But for creation,
modification and use by a computer, it is invariably preferable to operate
upon the graph directly. Textual formats inherit all of the "issues" which
apply to the underlying data structure, then add a few of their own for
good measure.

> I've done this sort of thing for MySQL, for HTML and JavaScript (in both
> Python and JavaScript itself), and for Bash.

And, of course, you're convinced that you got it right every time. That
attitude alone should set alarm bells ringing for anyone who's worked in
this industry for more than five minutes.




More information about the Python-list mailing list