How to find bad row with db api executemany()?

Roy Smith roy at panix.com
Fri Mar 29 22:44:53 EDT 2013


In article <roy-A61512.20410329032013 at news.panix.com>,
 Roy Smith <roy at panix.com> wrote:

> In article <mailman.3971.1364595940.2939.python-list at python.org>,
>  Dennis Lee Bieber <wlfraed at ix.netcom.com> wrote:
> 
> > If using MySQLdb, there isn't all that much difference... MySQLdb is
> > still compatible with MySQL v4 (and maybe even v3), and since those
> > versions don't have "prepared statements", .executemany() essentially
> > turns into something that creates a newline delimited "list" of
> > "identical" (but for argument substitution) statements and submits that
> > to MySQL.
> 
> Shockingly, that does appear to be the case.  I had thought during my 
> initial testing that I was seeing far greater throughput, but as I got 
> more into the project and started doing some side-by-side comparisons, 
> the differences went away.

OMG, this is amazing.

http://stackoverflow.com/questions/3945642/

It turns out MySQLdb's executemany() runs a regex over your SQL and 
picks one of two algorithms depending on whether it matches.  If the 
regex matches, executemany() rewrites the statement into a single 
multi-row INSERT; if it doesn't, it falls back to executing the 
statement once per parameter tuple.

restr = (r"\svalues\s*"
         r"(\(((?<!\\)'[^\)]*?\)[^\)]*(?<!\\)?'"
         r"|[^\(\)]|"
         r"(?:\([^\)]*\))"
         r")+\))")

Leaving aside the obvious line-noise aspects, the operative problem here 
is that it only looks for "values" (in lower case).
 
I've lost my initial test script which convinced me that executemany() 
would be a win; I'm assuming I used lower case for that.  Our production 
code uses "VALUES".

Going the slow way (i.e. "VALUES"), I'm inserting 1000 rows about every 
2.4 seconds.  When I switch to "values", I'm getting more like 1000 
rows in 100 ms!  That's roughly a 24x difference.
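
For anyone else who trips over this, the workaround is simply to spell 
the keyword in lower case.  A minimal sketch (table, columns, and 
connection parameters are all made up):

import MySQLdb

# Hypothetical connection parameters.
conn = MySQLdb.connect(host="localhost", user="me", passwd="secret", db="test")
cur = conn.cursor()

rows = [(i, "name-%d" % i) for i in range(1000)]

# Lower-case "values" lets the regex match, so MySQLdb sends one
# multi-row INSERT instead of 1000 separate statements.
cur.executemany("INSERT INTO users (id, name) values (%s, %s)", rows)
conn.commit()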

A truly breathtaking bug.


