Python, MS SQL, and batch inserts

Philip Semanchuk philip at semanchuk.com
Tue Apr 21 14:15:42 EDT 2009


On Apr 21, 2009, at 2:02 PM, ericwoodworth at gmail.com wrote:

> Hi,
>     I have a python script I'm writing that grabs some data from a
> com object, does a little formatting, and then inserts that data into
> a MS SQL 2003 DB.  Because I'm using COM objects I'm importing
> win32com.client.  That also allows me to use ADODB.connection and
> ADODB.command objects for working with SQL.
>
>     The program works fine but it's a little slow.  Inserting ~5500
> rows of data takes about 10 seconds using a DB that is on the same
> machine running the script.
>
>     I've done some general searches on how to speed this up and in
> other languages people suggest sending batches of inserts off at a
> time instead of executing 1 insert at a time.  For java and .net
> people recommend using a stringbuilder function to create strings
> quickly.  I don't know of such a function in python s I tried grouping
> my inserts into a single string using string += syntax.  I knew that
> would be terrible but I wanted to see how terrible.  Final reults: It
> was pretty terrible.  Script went from taking ~18sec to taking
> 240sec.  The overhead for recreating the strings was monster.  No real
> surprise there.
>
>     So I then loaded up the commands into a list and at the end I
> used the strong join method to create the string.  This was far faster
> than using += to create my strings but still took twice as long as
> just running my inserts one at a time.  So I'm looking for
> suggestions.
>
>     Basically I have 5000 SQL inserts that I want to do as quickly as
> possible.  This is purely academic as I can live with the 18 seconds
> the script needs to run (9 to talk to the com object and format the
> data and 10 to write to SQL) but I'm still curious how to improve on
> what I have running.

Are you sure your logjam is in Python? Inserting 5500 rows can take a  
few seconds if you're COMMITting after each INSERT. Wrap the whole  
thing in an explicit transaction and see if that helps.

Also, toss in a few print statements containing timestamps so you know  
more about where the script is spending time.


bye
Philip




More information about the Python-list mailing list