Something More Elegant

Stephen Hansen apt.shansen at gmail.com
Sat Jan 9 15:00:27 EST 2010


On Sat, Jan 9, 2010 at 7:15 AM, Victor Subervi <victorsubervi at gmail.com>wrote:

> On Sat, Jan 9, 2010 at 9:35 AM, Steve Holden <steve at holdenweb.com> wrote:
>
>> But we are now in the realm of theory as far as you are concerned, since
>> you have already stated several times that you aren't interested in
>> correcting your design until after you have got the current mess into
>> production.  So good luck with that.
>>
>
> And if you were in my shoes, I'm sure you'd do the same thing.


... Really, no, he wouldn't :) We're not all just hobbyists here who only do
work for random open source projects. A lot of us are professionals who
actually do have clients, actually do have deadlines, actually do have an
understanding for production requirements. Getting something into production
as soon as possible is certainly an important goal in commercial work. But
it is not the only goal. Proper database design is very important because if
you don't do it, you'll actually end up usually wasting *more* time and
effort then if you just bite the bullet and fix it now.

Proper database design -- in particular in your case, not having multiple
tables with various names that even need "%sThis" or "%sThat", and using
parameterized queries to access those tables, is really important. It will
save you time, it will save you effort, and it'll save you money-- because
/not/ doing it is among other things, a major security risk. Getting code
out fast and now is a noble goal in commercial projects, getting code out
which is by design prone to attacks by hackers is negligence.

Well, it *is* working now :)) And I am interested in cleaning this up. I
> should probably start with the matter of databases, since that's something I
> won't be able to easily change once clients actually start entering data.
> Please share with me any further concepts or questions to get me thinking
> how to redesign the databases.
>

Your first goal needs to be in the layout of your tables, yes. Instead of
having multiple "Packages" tables that vary, have a single Packages table,
with an additional column that determines what kind of package it is (this
information used to be in the name of the table). This may seem like
everything's jumbled together, but it works. Put an index on that column if
you need to: don't worry about if that one table might have thousands or
tens of thousands of records. Databases are -designed- to handle that, and
handle it well and faster.

Do the same for any other table which has various names-- CategoriesPackages
seems to be another one. Segment the data by adding columns and indexes when
needed, and not by breaking it out into different tables: if two tables have
approximately the same columns, they belong in one table, with a column to
simply differentiate between the two.

These steps are taking your database towards the ultimate goal of
normalization-- a laudable goal, but I won't go into that in detail. It
takes a book.

The next thing to do is to re-do your SQL queries. You should never do
string interpolation, e.g:

     SQL="SELECT x FROM y WHERE z = %s" % (arg,)
     cur.execute(SQL)

If you have done the first step, your tables always have a set names so you
never need to interpolate to do the queries-- and then at this point, under
no circumstances ever, ever-- consider this a law that you will get fined
for violation-- use string interpolation to generate your queries.

Instead, do:

    cur.execute("SELECT x FROM y WHERE z = %s", (arg,))

That looks really similar, but is lightyears away. Its very unfortunate that
some database drivers use 'format' as the paramstyle because it confuses
issues, but the two effects are very different. In one, Python is just
munging together and creating a new string. In the other, the database
driver is doing a few things. Its analyzing the query, its storing it (often
caching it, which speeds up further executions of that query), etc, etc, and
then finally its seeing that you are passing arguments into it, and it is
-safely- binding those arguments into the expression; this prevents SQL
Injection attacks. You can use interpolation and prevent injection if you
-meticulously- check -every- string that comes from the user, and -never-
trust it (even if that string was written out to a hidden <input> and
legitimate users have no way to alter, because illegitimate users will alter
it anyways). Or you can use parameterized queries and just avoid it, while
getting plenty of other benefits as well.


At work, we had a third-party package that we re-sold as part of our
offering, and glancing over its source, I noticed something. It did, in
essence to check login:

    cur.execute("SELECT user_id FROM usertable WHERE username = '%s' AND
password = '%s' % (username, password))

I blinked, and emailed them to point out the problem. I suggested they log
in as:

    Username = dummyuser
    Password = '; DROP usertable

You see, when using interpolation, the string that got sent to the database
was:

    SELECT user_id FROM usertable WHERE username = 'dummyuser' AND password
= ''; DROP usertable

And thus, from the login screen of this app-- I destroyed their environment.

Its sort of important that you not put code out into production which is
susceptible to such things. Your clients will not at all appreciate it if
and when some hacker discovers it and destroys their site, loosing them
money. But even beyond that, there are many other benefits to just doing
this right. You want to, believe me. Now, before you put anything into
production. It might take a day or two longer, but its worth it.

Finally, go finish getting rid of the bare excepts, I saw one recently :)
Just rip them all out, every one. All at once. Then fix any errors that come
along with specific excepts or pre-tests if appropriate. ;)

--S
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-list/attachments/20100109/57328084/attachment-0001.html>


More information about the Python-list mailing list