How to use SQLite (sqlite3) more efficiently

Wed Jun 4 17:23:21 EDT 2014

On Thu, Jun 5, 2014 at 6:27 AM,
<ps16thypresenceisfullnessofjoy at gmail.com> wrote:
> I'm completely new to SQL, and recently started using SQLite in one of my Python programs. I've gotten what I wanted to work, but I'm not sure if I'm doing it in the best/most efficient way. I have attached some sample code and would appreciate any (polite) comments about how the SQL (or Python) in it could be improved. The code is written in Python 2, but I think it should work in Python 3 if the 4 print statements are changed to function calls. Am I correct that the function 'set_description2' should work the same way as 'set_description'?
>

Happy to help out! But before I look into the code itself, two small
points. Firstly, you're using Google Groups, which miswraps text and
double-spaces all quoted text. This is considered to be extremely
annoying on this list/newsgroup, and while it's not technically your
fault, it will make people less inclined to read and respond to your
posts. I advise signing up for the mailing list instead:

https://mail.python.org/mailman/listinfo/python-list

Alternatively, you can read the comp.lang.python newsgroup in any good
newsreader, or you can use Gmane.

The second point that you may want to consider: If you think the code
is that close to Python 3 compatible, and if your Python 2 is either
version 2.6 or 2.7, just put this line at the top of your script:

from __future__ import print_function

Then you'll be using print() as a function, and your code may well
work unchanged on both versions. But do still tell us what version
you're running this under; "Python 2" is a start, but it'd be useful
to know which minor version, and which operating system. I don't think
it makes any difference here, but it's easy enough to just say "Tested
on Python 2.6 on Debian Linux" or "Tested on Python 2.7.7 on Windows
7" or somesuch.

On to the code! *plays Lord Phillip theme from 'The Beauty Stone'*

> def get_description(conn, name):
>     cur = conn.cursor()
>     cur.execute("SELECT description FROM ProgrammingLanguages WHERE Name=?",
>         (name,))
>     row = cur.fetchone()
>     if row:
>         return row[0]
>     return None

Since cur.fetchone() returns None if there's no row, you can simplify this down:
    return row and row[0]
Not everyone likes that sort of style, so take your pick.

> def set_description(conn, name, description):
>     cur = conn.cursor()
>     cur.execute("SELECT 1 FROM ProgrammingLanguages WHERE Name=?", (name,))
>     row = cur.fetchone()
>     if description:
>         with conn:
>             if not row:
>                 conn.execute("INSERT INTO ProgrammingLanguages VALUES(?,?)",
>                     (name, description))
>             else:
>                 conn.execute("UPDATE ProgrammingLanguages SET Description=? " \
>                     "WHERE Name=?", (description, name))
>     elif row:
>         with conn:
>             conn.execute("DELETE FROM ProgrammingLanguages WHERE Name=?",
>                 (name,))
>     conn.commit()

There's a general principle in Python APIs that a "mode switch"
parameter isn't a good thing. Even more strongly, I would be very
surprised if attempting to set a blank description deleted the row
instead of setting the description to blank. My recommendation: Split
this into two functions, set_description and delete_language. In
delete_language, just unconditionally delete, don't bother checking
for the row's presence first.

The insert-or-update concept is a bit of a tricky one. With a simple
database, you probably don't need to worry about the race condition of
seeing there's no row, then going to insert, and finding that you
can't; but still consider an EAFP model instead - attempt the update,
and if nothing got updated, do the insert (or, conversely, attempt the
insert, and if you get back an error, update instead). That still has
a chance of a race (rows can still be inserted or deleted in the
middle), but if you do the more likely case first (do you expect to
mostly be updating or inserting?), you'll catch most of your work with
a single query, which is that bit safer (not to mention faster).

However, trying to use an SQL table as if it were a Python dict is
fundamentally hard. No matter how you work it, it won't be perfect.

> def set_description2(conn, name, description):
>     with conn:
>         if description:
>             conn.execute("INSERT OR REPLACE INTO ProgrammingLanguages " \
>                 "VALUES(?,?)", (name, description))
>         else:
>             conn.execute("DELETE FROM ProgrammingLanguages WHERE Name=?",
>                 (name,))
>     conn.commit()

In theory, yes, this will probably achieve the same as your
set_description. But since, as I said above, it's fundamentally hard
to do the "INSERT OR REPLACE" operation, there's no guarantee it'll be
any better than doing it manually; and this is something that not all
database backends support, so if you ever change to a more high-end
database (eg PostgreSQL), you'll have to recode this anyway.

This function, even more than the other, highlights the value of
splitting it into two. There's almost no code common to the two
branches - just the 'with' block (and I'm not sure what that
accomplishes here) and the commit call. May as well split them out and
make it clear when you're deleting something.

> conn.execute("CREATE TABLE IF NOT EXISTS ProgrammingLanguages(name TEXT " \
>             "PRIMARY KEY, description TEXT)")

I'd recommend using a triple-quoted string, unless you have a good
reason not to. Also, breaking it in the middle of one declaration is
usually a bad idea - makes it harder to read. Here's how I'd lay that
out:

conn.execute("""CREATE TABLE IF NOT EXISTS ProgrammingLanguages (
    name TEXT PRIMARY KEY, description TEXT)""")

If you have more columns to add, you could break it at any comma;
sometimes, you can put logically-related columns onto one line, like
this:

conn.execute("""create table if not exists ProgrammingLanguages (
    Name text primary key, Description text not null,
    Year_Introduced smallint not null, Year_Withdrawn smallint not
null, -- you should be able to put SQL comments here, too
    Uses_Braces boolean not null, Supports_Unicode boolean not null,
Provides_Bignums boolean not null,
)""")

(I tend to put SQL keywords in lowercase, and I generally put some
uppercase letters in field names like this. Neither change has any
significant effect - or shouldn't, with a properly SQL compliant
database - so do whatever you find more readable.)

Note, by the way, that I've peppered 'not null' everywhere. Apart from
the primary key (which is implicitly 'not null'), all fields are
nullable unless you say otherwise. Generally you don't need that, and
it's one less possibility to have to worry about. (You may want to use
'with default' so you can insert into the table without having to
specify every field.)

> set_description(conn, "Assembly",
>     "Making Easy Things Very Hard & Hard Things Impossible")

Hey, that's not fair! Assembly language makes some hard things really
easy, like segfaulting your process. Credit where it's due! :)

> set_description(conn, "Assembly", None)
> print "Assembly: %s" % get_description(conn, "Assembly")  # Should be None

This is where I would be using an explicit delete operation. The only
advantage of the merged API is that you could get "a string or None"
from somewhere and pass it straight to set_description(), and have it
figure out whether to update or delete; but in the very rare situation
where you'd do that, you can have the explicit "if desc is None" check
outside the function just as easily as inside. (And I would advise
using "if desc is None" rather than just "if desc", because a blank
description shouldn't normally be a problem - or if it is, that should
be commented somewhere.)

By and large, you have some reasonably decent code there. The points
I've made are all fairly minor, and several of them are matters of
stylistic taste more than anything else. SQL is sufficiently simple
(for simple jobs, at least) that it's hard to do anything really
horrendously wrong; you clearly know about parameterized queries,
which would be the one biggest unobviosity to learn, and other than
that... if it works, it's probably right. Once you start getting into
more complicated queries (multi-table joins, true transactional
processing, stuff like that), there are more ways to go wrong, but
even then, it's pretty easy to figure stuff out.

One final point, before I hit send. When you decide that SQLite isn't
enough for your purposes (maybe when you need multiple applications to
be able to hammer the database - SQLite isn't designed for high
throughput), skip MySQL and go straight to PostgreSQL. Even though
MySQL might look tempting at first (it has "REPLACE INTO" which does
your INSERT/UPDATE job - PostgreSQL doesn't, the SQL standard
doesn't), it's full of annoying traps, and its defaults leave a lot to
be desired. Go with PostgreSQL; the defaults may be unideal as regards
performance, but at least they're conservative as regards reliability,
and that's what you need.

All the best!

ChrisA