Python, MS SQL, and batch inserts

ericwoodworth at gmail.com ericwoodworth at gmail.com
Tue Apr 21 14:30:20 EDT 2009


On Apr 21, 2:15 pm, Philip Semanchuk <phi... at semanchuk.com> wrote:
> On Apr 21, 2009, at 2:02 PM, ericwoodwo... at gmail.com wrote:
>
>
>
> > Hi,
> >     I have a python script I'm writing that grabs some data from a
> > com object, does a little formatting, and then inserts that data into
> > a MS SQL 2003 DB.  Because I'm using COM objects I'm importing
> > win32com.client.  That also allows me to use ADODB.connection and
> > ADODB.command objects for working with SQL.
>
> >     The program works fine but it's a little slow.  Inserting ~5500
> > rows of data takes about 10 seconds using a DB that is on the same
> > machine running the script.
>
> >     I've done some general searches on how to speed this up and in
> > other languages people suggest sending batches of inserts off at a
> > time instead of executing 1 insert at a time.  For java and .net
> > people recommend using a stringbuilder function to create strings
> > quickly.  I don't know of such a function in python s I tried grouping
> > my inserts into a single string using string += syntax.  I knew that
> > would be terrible but I wanted to see how terrible.  Final reults: It
> > was pretty terrible.  Script went from taking ~18sec to taking
> > 240sec.  The overhead for recreating the strings was monster.  No real
> > surprise there.
>
> >     So I then loaded up the commands into a list and at the end I
> > used the strong join method to create the string.  This was far faster
> > than using += to create my strings but still took twice as long as
> > just running my inserts one at a time.  So I'm looking for
> > suggestions.
>
> >     Basically I have 5000 SQL inserts that I want to do as quickly as
> > possible.  This is purely academic as I can live with the 18 seconds
> > the script needs to run (9 to talk to the com object and format the
> > data and 10 to write to SQL) but I'm still curious how to improve on
> > what I have running.
>
> Are you sure your logjam is in Python? Inserting 5500 rows can take a  
> few seconds if you're COMMITting after each INSERT. Wrap the whole  
> thing in an explicit transaction and see if that helps.
>
> Also, toss in a few print statements containing timestamps so you know  
> more about where the script is spending time.
>
> bye
> Philip

I'm not 100% sure it's python and not SQL but I do suspect there's a
better way to do this than just serial inserts.  I could be wrong
about that which is what i'm trying to explore.
I already do use the time stamps and this is what I see:
at 9 secs in I've gotten my data, formatted it, and placed it on the
list
at 9.047 secs in the string.join() is done and I have my command
string
at 35 secs the program ends.  So somehow my SQL is taking a lot longer
when I format it as single string.

How would I make the whole thing one transaction?  Just insert BEGIN
TRANSACTION at the start and COMMIT at the end?  Is that enough to do
it?



More information about the Python-list mailing list