PEP 249 (database api) -- executemany() with iterable?

Steve Howell showell30 at yahoo.com
Tue Oct 12 22:13:19 EDT 2010


On Oct 12, 6:01 pm, Lawrence D'Oliveiro <l... at geek-
central.gen.new_zealand> wrote:
> In message <4cb4ba4e$0$1641$742ec... at news.sonic.net>, John Nagle wrote:
>
> > In general, if you find yourself making millions of
> > SQL database requests in a loop, you're doing it wrong.
>
> I’ve done this. Not millions, but certainly on the order of tens of
> thousands.
>
> >      Big database loads are usually done by creating a text file
> > with the desired data, then using a LOAD DATA INFILE command.
>
> May not always be flexible enough.
>
> > This (in MySQL) is tens to hundreds of times faster than doing individual
> > INSERT or UPDATE commands.
>
> Why should that be? The database cannot read a text file any faster than I
> can.

Bulk-load strategies usually solve one or more of these problems:

 network latency
 index maintenance during the load
 parsing of the SQL
 reallocating storage

They usually come at some cost or inconvenience:

 only work locally
 indexes not available during the bulk load
 require a non-SQL data format
 up-front cost to preallocate storage
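
To make the latency and parsing points concrete, here is a minimal
sketch using the stdlib sqlite3 module; any PEP 249 driver looks much
the same, and the table and column names are made up.  (sqlite3 is
in-process, so the round-trip saving only shows up with a
client/server database, but the API is identical.)  executemany()
lets the driver parse the INSERT once, and it accepts any iterable of
parameter tuples, including a generator:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")

    def rows():
        # Rows are produced lazily, so they never all sit in memory.
        for i in range(100000):
            yield (i, "name-%d" % i)

    # The INSERT is parsed once and run for every row in the iterable,
    # instead of one execute() call and one round trip per row.
    conn.executemany("INSERT INTO t (id, name) VALUES (?, ?)", rows())
    conn.commit()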

The exact benefits and costs vary from system to system, of course,
and the strategies can be combined.  For a medium-sized problem there
may be no need to sacrifice the familiarity and convenience of SQL if
you can arrange for large batches of statements to be submitted from
a local client.  If index maintenance is the bottleneck and you don't
need 24/7 availability, you can drop the indexes before the load and
rebuild them afterward.
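
Continuing the same sketch (the index name is made up), the
drop-and-rebuild pattern looks like this:

    cur = conn.cursor()

    # No index: no per-row index maintenance during the load.
    cur.execute("DROP INDEX IF EXISTS idx_t_name")

    cur.executemany("INSERT INTO t (id, name) VALUES (?, ?)", rows())
    conn.commit()

    # Rebuild once, in bulk, after the data is in place.
    cur.execute("CREATE INDEX idx_t_name ON t (name)")
    conn.commit()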
