sqlite INSERT performance

Jon Clements joncle at googlemail.com
Thu May 31 12:06:49 EDT 2012


On Thursday, 31 May 2012 16:25:10 UTC+1, duncan smith  wrote:
> On 31/05/12 06:15, John Nagle wrote:
> > On 5/30/2012 6:57 PM, duncan smith wrote:
> >> Hello,
> >> I have been attempting to speed up some code by using an sqlite
> >> database, but I'm not getting the performance gains I expected.
> >
> > SQLite is a "lite" database. It's good for data that's read a
> > lot and not changed much. It's good for small data files. It's
> > so-so for large database loads. It's terrible for a heavy load of
> > simultaneous updates from multiple processes.
> >
> 
> Once the table is created the data will not be changed at all. 
> Corresponding integer codes will have to be generated for columns. (I 
> want to do this lazily because some columns might never be needed for 
> output files, and processing all columns was relatively expensive for my 
> initial solution.) After that it's a series of 'SELECT a, b, ... FROM 
> table WHERE f="g" ORDER by a, b, ...' style queries dumped to space 
> separated text files.
> 
> > However, wrapping the inserts into a transaction with BEGIN
> > and COMMIT may help.
> >
> 
> Unfortunately there's no discernible difference.
> 
> > If you have 67 columns in a table, you may be approaching the
> > problem incorrectly.
> >
> 
> Quite possibly. I have defined start and end points. The data are 
> contained in text files. I need to do the mapping to integer codes and 
> generate output files for subsets of variables conditional on the levels 
> of other variables. (I was doing the subsequent sorting separately, but 
> if I'm using SQL I guess I might as well include that in the query.) The 
> output files are inputs for other (C++) code that I have no control over.
> 
> Any approach that doesn't consume large amounts of memory will do. Cheers.
> 
> Duncan

It might be worth checking out https://sdm.lbl.gov/fastbit/ which has Python bindings (nb: the library itself takes a while to compile), but I'm not I00% sure it would meet all your requirements.

Jon



More information about the Python-list mailing list