PEP 249 (database api) -- executemany() with iterable?
Steve Howell
showell30 at yahoo.com
Tue Oct 12 22:13:19 EDT 2010
On Oct 12, 6:01 pm, Lawrence D'Oliveiro <l... at geek-
central.gen.new_zealand> wrote:
> In message <4cb4ba4e$0$1641$742ec... at news.sonic.net>, John Nagle wrote:
>
> > In general, if you find yourself making millions of
> > SQL database requests in a loop, you're doing it wrong.
>
> I’ve done this. Not millions, but certainly on the order of tens of
> thousands.
>
> > Big database loads are usually done by creating a text file
> > with the desired data, then using a LOAD DATA INFILE command.
>
> May not always be flexible enough.
>
> > This (in MySQL) is tens to hundreds of times faster than doing individual
> > INSERT or UPDATE commands.
>
> Why should that be? The database cannot read a text file any faster than I
> can.
Bulk-load strategies usually solve one or more of these problems:

- network latency
- index maintenance during the upload
- parsing of the SQL
- reallocating storage

They usually come at some cost or inconvenience:

- only work locally
- indexes not available during the bulk load
- require a non-SQL data format
- up-front cost to preallocate storage
The exact benefits and costs will vary according to the system, of
course. You might also combine strategies; for a medium-sized problem,
there may be no need to sacrifice the familiarity and convenience of
SQL if you can arrange for large batches of SQL statements to be
submitted from a local client. If indexes are the problem, and you
don't need 24/7 availability, you can drop them before the load and
rebuild them afterward.
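To make this concrete, here is a minimal sketch of two of the strategies above, batching the inserts in one transaction and dropping a secondary index for the duration of the load, using sqlite3 (a PEP 249 driver from the standard library). The table, column, and index names are invented for illustration; the numbers will differ on a networked server, but the pattern is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INTEGER PRIMARY KEY, value REAL)")
conn.execute("CREATE INDEX idx_value ON measurements (value)")

def rows(n):
    # executemany() accepts any iterable of parameter sequences, including
    # a generator, so the data set never has to fit in memory at once.
    for i in range(n):
        yield (i, i * 0.5)

# Drop the secondary index, load in one transaction, rebuild afterward.
conn.execute("DROP INDEX idx_value")
with conn:  # a single transaction for the whole batch
    conn.executemany(
        "INSERT INTO measurements (id, value) VALUES (?, ?)", rows(10000))
conn.execute("CREATE INDEX idx_value ON measurements (value)")

count, = conn.execute("SELECT COUNT(*) FROM measurements").fetchone()
print(count)  # prints 10000
```

Wrapping the batch in one transaction avoids a per-INSERT commit, which is usually the dominant cost when looping over execute() calls.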