using sqlite3 - execute vs. executemany; committing ...

David wizzardx at gmail.com
Sun May 4 05:14:59 EDT 2008


> - Are there any peculiarities with using curs.executemany(...) vs. multiple
> curs.execute(...) calls? I read a note that sqlite3 does some caching
> internally, hence both should be similarly fast, but in my case
> executemany(...) is quite a bit faster

How many times are you calling execute() versus making a single
executemany() call? The Python call overhead adds up over thousands of calls.

The relevant source code is here if you're interested:

http://svn.python.org/projects/python/trunk/Modules/_sqlite/cursor.c
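
For example, here's a rough timing sketch (the table name and row count
are made up for illustration) that contrasts a Python-level loop of
execute() calls with a single executemany() call:

import sqlite3
import time

conn = sqlite3.connect(':memory:')
curs = conn.cursor()
curs.execute('CREATE TABLE t (x INTEGER)')            # throwaway test table
rows = [(i,) for i in range(100000)]

start = time.time()
for row in rows:
    curs.execute('INSERT INTO t VALUES (?)', row)     # one Python call per row
print('execute loop: %.2fs' % (time.time() - start))

start = time.time()
curs.executemany('INSERT INTO t VALUES (?)', rows)    # one call for all rows
print('executemany:  %.2fs' % (time.time() - start))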

>  Further, I am not quite sure about the standard usage of the cursor object
> and also the proper committing of transactions and closing of the connection.

Standard usage is here:

http://docs.python.org/lib/module-sqlite3.html

If the database supports transactions then cursors automatically use
transactions. Your changes only get committed when you call .commit().
Otherwise your changes are lost.

In the specific case of sqlite, some statements (like CREATE TABLE,
ALTER TABLE, etc.) also cause a commit. This is probably where your
confusion comes from. Since this isn't part of the Python DB API spec
(http://www.python.org/dev/peps/pep-0249/) I wouldn't rely on it;
otherwise you will have problems with other databases.
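
Here's a small sketch of that implicit-commit behaviour, assuming a
made-up on-disk database file and table:

import sqlite3

conn = sqlite3.connect('example.db')              # made-up on-disk database
curs = conn.cursor()
curs.execute('CREATE TABLE IF NOT EXISTS t (x INTEGER)')
curs.execute('INSERT INTO t VALUES (?)', (1,))    # pending, not yet committed

# A DDL statement makes the sqlite3 module commit the pending INSERT
# implicitly -- behaviour PEP 249 does not guarantee for other databases.
curs.execute('ALTER TABLE t ADD y TEXT')

conn.rollback()    # too late: the INSERT above was already committed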

Also, in your specific case you're using an 'in memory' sqlite db, so
there are fewer concerns about losing data between db sessions, etc. But
with most databases (on disk, or running across the network on a server)
this becomes important.

>  Should one create a cursor of a connection and call the execute ... methods
> of the cursor - or is it better to call the shortcut execute etc. methods of
> the Connection object directly, as suggested in the docs
> (http://docs.python.org/lib/node351.html)? Or are there specific use cases
> for both approaches?

I suggest that you use the standard cursor methods instead, so you can
run your code against non-sqlite databases. The performance etc. should
be the same as with the shortcut methods. Like the page says, their main
benefit is conciseness.
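
To illustrate the difference, a minimal sketch of both styles (the query
is deliberately trivial):

import sqlite3

conn = sqlite3.connect(':memory:')

# Portable DB-API style: create a cursor explicitly, then execute on it.
curs = conn.cursor()
curs.execute('SELECT 1')
print(curs.fetchone())

# sqlite3 shortcut: Connection.execute() creates a cursor behind the
# scenes and returns it.  Convenient, but specific to the sqlite3 module.
print(conn.execute('SELECT 1').fetchone())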

>
> When should the transactions be committed? (when creating or altering a
> table, or also when selecting results?)
>  There seems to be some implicit handling of transactions (
> http://docs.python.org/lib/sqlite3-Controlling-Transactions.html#sqlite3-Controlling-Transactions
> ); hence I am not sure about the standard usage of these methods; the same
> is true of connection.close() - or are these calls perhaps unnecessary?

As a general rule, always use .commit() and .close() (see the sketch
after this list). Otherwise:

- No .commit() - you lose any db changes made since the last commit or
"non-DML, non-query statement" (in the case of sqlite)
- No .close() - your database connection only closes when your db
objects are garbage collected.
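
A minimal sketch of that pattern, assuming a made-up on-disk database
with an already-existing table t:

import sqlite3

conn = sqlite3.connect('example.db')   # made-up database file
try:
    curs = conn.cursor()
    curs.execute('INSERT INTO t VALUES (?)', (1,))
    conn.commit()      # without this, the INSERT is lost when we close
finally:
    conn.close()       # don't rely on garbage collection to close it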

> self.conn_tags_DB = sqlite3.connect(':memory:')
> curs = self.conn_tags_DB.cursor()
> curs.execute('CREATE TABLE IF NOT EXISTS "%s" ("%s", UNIQUE("%s"))' % (self.text_name, index_col_name, index_col_name))
> curs.execute(u'INSERT OR REPLACE INTO "%s"("%s") VALUES (?)' % (self.text_name, index_col_name), (0,))
> for new_col in act_db_columns[1:]:  # adds the needed columns (except the first one: index_col_name)
>     curs.execute('ALTER TABLE "%s" ADD "%s" TEXT' % (self.text_name, new_col))
> curs.executemany('INSERT OR REPLACE INTO "%s" VALUES (%s)' % (self.text_name, question_marks), tags_seq)
> self.conn_tags_DB.commit()
>
> Are there maybe any comments or hints on a more elegant/efficient solution?
>

I think that dynamically creating schema (tables, columns) based on the
text file structure is a bad idea. A few reasons:

- It forces you to generate all your queries dynamically
- Not all strings are valid table/column names
- It forces the app to run as database administrator (maybe not
important for sqlite, but definitely an issue if you change to
another dbms)
- Potentially huge stability/security problems - text files can
break system tables, overwrite users, etc.

You're violating several rules on db design/usage.

I strongly recommend that you use better database logic, i.e. create
tables and records in advance (in a setup script, as the db admin user
if applicable), then only use delete/insert/update/select statements (as
a restricted user, if applicable).
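
For example, a rough sketch of that split (all file, table and column
names below are made up for illustration, not taken from your code): a
one-off setup step creates the schema, and the application itself only
ever runs parameterised DML and queries:

import sqlite3

# One-off setup (separate script, run as an admin user where that applies).
setup = sqlite3.connect('tags.db')
setup.execute('CREATE TABLE IF NOT EXISTS tags (word TEXT UNIQUE, tag TEXT)')
setup.commit()
setup.close()

# Normal application code: parameterised statements only, no SQL built
# from text-file contents.
conn = sqlite3.connect('tags.db')
curs = conn.cursor()
curs.executemany('INSERT OR REPLACE INTO tags (word, tag) VALUES (?, ?)',
                 [('dog', 'NN'), ('runs', 'VBZ')])
conn.commit()
curs.execute('SELECT tag FROM tags WHERE word = ?', ('dog',))
print(curs.fetchone())
conn.close()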

If this is too much trouble, then I suggest storing your database in
regular Python structures instead, and using pickle/yaml/etc. to write
it to disk. Your current version uses an 'in memory' database, so the
end result is the same. You'll get a large performance boost as well.
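
Something along these lines, with a plain dict standing in for the
database (names and values are made up):

import pickle

tags = {'dog': ('NN',), 'runs': ('VBZ',)}   # in-memory "database"

# write it to disk
f = open('tags.pkl', 'wb')
pickle.dump(tags, f)
f.close()

# ...and load it back in a later run
f = open('tags.pkl', 'rb')
tags = pickle.load(f)
f.close()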

> Now, what's the usual way to access the database? Is it
> possible/wise/standard ... to leave the connection open for subsequent
> queries during the whole run of the app; could the cursor even be kept
> as a class member, or should it rather be created anew with each call?
> (After populating, the db shouldn't be modified, but only read.)

It depends. If your app is simple and single-threaded, then a single
connection (global or passed through args) should be fine. Only use
multiple connections/cursors if you actually need them (multiple threads,
multiple databases, multiple transaction/db isolation levels, etc.).
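
For the simple single-threaded case, something like this sketch is
enough (the file and table names are made up): open one connection at
startup and create short-lived cursors where you need them:

import sqlite3

conn = sqlite3.connect('tags.db')      # one connection for the whole app

def lookup(word):
    curs = conn.cursor()               # cheap, short-lived cursor per query
    curs.execute('SELECT tag FROM tags WHERE word = ?', (word,))
    return curs.fetchone()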

David.


