Slow loading of large in-memory tables

Thorsten Kampe thorsten at thorstenkampe.de
Wed Sep 15 10:27:24 EDT 2004


* Philipp K. Janert, Ph.D. (2004-09-07 07:14 +0200)
> I am trying to load a relatively large table (about 1 Million
> rows) into an sqlite table, which is kept in memory. The
> load process is very slow - on the order of 15 minutes or
> so.
> 
> I am accessing sqlite from Python, using the pysqlite driver.
> I am loading all records first using cx.execute( "insert ..." ).
> Only once I have run cx.execute() for all records, I commit all 
> the preceding inserts with conn.commit()
> 
> I have tried using cx.executemany(), but if anything, this
> makes the process slower.
> 
> I have not tried mucking manually with transactions.
> I have sufficiently convinced myself that the slow part
> is in fact the cx.execute() - not reading the data from file
> or anything else.
> 
> Are there ways to make this process faster? 

According to [1]:

pragma temp_store    = memory;
pragma cache_size    = 4000;   -- or any bigger value ('2000' is the default)
pragma count_changes = off;
pragma synchronous   = off;
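
For completeness, this is roughly how you would issue those pragmas
from pysqlite - untested sketch; "tmp.db" is just a placeholder name,
and 'cx' is meant to be the same kind of cursor you already use:

import sqlite

connection = sqlite.connect("tmp.db")   # placeholder database name
cx = connection.cursor()

# the pragmas are ordinary SQL statements - issue them once, right
# after opening the connection and before the first insert
for pragma in ("pragma temp_store    = memory",
               "pragma cache_size    = 4000",
               "pragma count_changes = off",
               "pragma synchronous   = off"):
    cx.execute(pragma)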

Also, SQLite commits after every SQL statement (not only after those
that alter the database) [2]. Therefore you have to start a
transaction manually before the first insert and commit manually
after the last one. To be able to do this, you have to turn off
pysqlite's built-in transaction handling:

connection = sqlite.connect(database,
                            autocommit = 1)
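
Put together, the bulk load could then look something like this
(untested sketch in the pysqlite style used above; the table layout
and the generated rows are only stand-ins for your real data):

import sqlite

connection = sqlite.connect("tmp.db", autocommit = 1)  # we manage transactions ourselves
cx = connection.cursor()

cx.execute("create table mytable (id integer, value integer)")  # hypothetical table

cx.execute("begin transaction")        # one big transaction around the whole load
for i in xrange(1000000):              # stands in for reading your records from file
    cx.execute("insert into mytable values (%d, %d)" % (i, 2 * i))
cx.execute("commit")                   # single commit at the very end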
 
> Also, I want to keep the DB in memory, since I use it later
> to run a very DB-intensive simulation against it. However,
> this implies that I need to load the DB from the same python
> program which will later run the simulation - I think.

Yes. An in-memory database lives only as long as the connection that
created it, so the load and the simulation have to run in the same
process (and use the same connection).
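
In case you are not doing so already: you get an in-memory database
by passing SQLite's special filename ":memory:" - untested sketch:

import sqlite

# ":memory:" creates a database that exists only for the lifetime of
# this connection - close it and the data is gone
connection = sqlite.connect(":memory:", autocommit = 1)
cx = connection.cursor()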

Thorsten

[1] http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html
[2] http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html#transactions
