sqlite INSERT performance

duncan smith buzzard at urubu.freeserve.co.uk
Thu May 31 13:05:43 EDT 2012


On 31/05/12 17:06, Jon Clements wrote:
> On Thursday, 31 May 2012 16:25:10 UTC+1, duncan smith  wrote:
>> On 31/05/12 06:15, John Nagle wrote:
>>> On 5/30/2012 6:57 PM, duncan smith wrote:
>>>> Hello,
>>>> I have been attempting to speed up some code by using an sqlite
>>>> database, but I'm not getting the performance gains I expected.
>>>
>>> SQLite is a "lite" database. It's good for data that's read a
>>> lot and not changed much. It's good for small data files. It's
>>> so-so for large database loads. It's terrible for a heavy load of
>>> simultaneous updates from multiple processes.
>>>
>>
>> Once the table is created the data will not be changed at all.
>> Corresponding integer codes will have to be generated for columns. (I
>> want to do this lazily because some columns might never be needed for
>> output files, and processing all columns was relatively expensive for my
>> initial solution.) After that it's a series of 'SELECT a, b, ... FROM
>> table WHERE f="g" ORDER by a, b, ...' style queries dumped to space
>> separated text files.
>>
>>> However, wrapping the inserts into a transaction with BEGIN
>>> and COMMIT may help.
>>>
>>
>> Unfortunately there's no discernible difference.
>>
>>> If you have 67 columns in a table, you may be approaching the
>>> problem incorrectly.
>>>
>>
>> Quite possibly. I have defined start and end points. The data are
>> contained in text files. I need to do the mapping to integer codes and
>> generate output files for subsets of variables conditional on the levels
>> of other variables. (I was doing the subsequent sorting separately, but
>> if I'm using SQL I guess I might as well include that in the query.) The
>> output files are inputs for other (C++) code that I have no control over.
>>
>> Any approach that doesn't consume large amounts of memory will do. Cheers.
>>
>> Duncan
>
> It might be worth checking out https://sdm.lbl.gov/fastbit/ which has Python bindings (nb: the library itself takes a while to compile), but I'm not I00% sure it would meet all your requirements.
>
> Jon

Interesting. It might actually be more useful for the types of query 
performed by the C++ part of the application. I tried something based on 
bitwise operations on integers but couldn't quite match the performance 
of the recursive algorithm implemented (by others) in C++. Cheers.

Duncan



More information about the Python-list mailing list