How to find bad row with db api executemany()?

Chris Angelico rosuav at gmail.com
Fri Mar 29 22:05:08 EDT 2013


On Sat, Mar 30, 2013 at 12:19 PM, Roy Smith <roy at panix.com> wrote:
> In article <mailman.3977.1364605026.2939.python-list at python.org>,
>  Chris Angelico <rosuav at gmail.com> wrote:
>
>> On Sat, Mar 30, 2013 at 11:41 AM, Roy Smith <roy at panix.com> wrote:
>> > In article <mailman.3971.1364595940.2939.python-list at python.org>,
>> >  Dennis Lee Bieber <wlfraed at ix.netcom.com> wrote:
>> >
>> >> If using MySQLdb, there isn't all that much difference... MySQLdb is
>> >> still compatible with MySQL v4 (and maybe even v3), and since those
>> >> versions don't have "prepared statements", .executemany() essentially
>> >> turns into something that creates a newline delimited "list" of
>> >> "identical" (but for argument substitution) statements and submits that
>> >> to MySQL.
>> >
>> > Shockingly, that does appear to be the case.  I had thought during my
>> > initial testing that I was seeing far greater throughput, but as I got
>> > more into the project and started doing some side-by-side comparisons,
>> > the differences went away.
>>
>> How much are you doing per transaction? The two extremes (everything
>> in one transaction, or each line in its own transaction) are probably
>> the worst for performance. See what happens if you pepper the code
>> with 'begin' and 'commit' statements (maybe every thousand or ten
>> thousand rows) to see if performance improves.
>>
>> ChrisA
>
> We're doing it all in one transaction, on purpose.  We start with an
> initial dump, then get updates about once a day.  We want to make sure
> that the updates either complete without errors, or back out cleanly.
> If we ever had a partial daily update, the result would be a mess.
>
> Hmmm, on the other hand, I could probably try doing the initial dump the
> way you describe.  If it fails, we can just delete the whole thing and
> start again.
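
For what it's worth, the all-or-nothing pattern you're describing would
look roughly like this sketch (untested; table, column, and connection
details are invented, and it assumes MySQLdb with an InnoDB table, since
MyISAM won't roll back):

    import MySQLdb

    rows = [(1, "alpha"), (2, "beta")]   # stand-in for the real update data

    conn = MySQLdb.connect(db="mydb")    # connection details are placeholders
    cur = conn.cursor()
    try:
        # One executemany in a single transaction: with MySQLdb's default
        # autocommit-off behaviour, nothing is visible until commit().
        cur.executemany(
            "INSERT INTO daily_update (id, payload) VALUES (%s, %s)",
            rows)
    except MySQLdb.Error:
        conn.rollback()    # back the whole update out cleanly
        raise
    else:
        conn.commit()      # all rows land as one unit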

One transaction for the lot isn't nearly as bad as one transaction per
row, but it can consume a lot of memory on the server - or at least,
that's what I found last time I worked with MySQL. (PostgreSQL works
completely differently, and I'd strongly recommend doing it all as one
transaction if you switch.) Batching the commits isn't guaranteed to
help, but it costs little to try, and there's a chance you'll gain some
performance.
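
If you do experiment with batching, it's only a small change to the same
sketch: commit every few thousand rows instead of once at the end, with
the chunk size as a knob to tune (again untested, same invented schema):

    def load_in_chunks(conn, rows, chunk_size=10000):
        # Commit every chunk_size rows: far fewer round trips than
        # committing per row, and the server never has to hold the
        # whole load's worth of transaction state at once.
        cur = conn.cursor()
        for start in range(0, len(rows), chunk_size):
            cur.executemany(
                "INSERT INTO daily_update (id, payload) VALUES (%s, %s)",
                rows[start:start + chunk_size])
            conn.commit()

The obvious cost is that a failure partway through leaves the earlier
chunks committed, which is why it suits the initial dump (where you can
delete everything and start again) better than the daily updates.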

ChrisA


