Newbie Question: python mysqldb performance question

John Nagle nagle at animats.com
Sun May 20 22:49:53 EDT 2007


cjl wrote:
> Group:
> 
> I'm new to python and new to mysql.
> 
> I have a csv file that is about 200,000 rows that I want to add to a
> mysql database.  Yes, I know that I can do this directly from the
> mysql command line, but I am doing it through a python script so that
> I can munge the data before adding it.
> 
> I have the following example code:
> 
> conn = MySQLdb.connect(db="database", host="localhost", user="root",
> passwd="password")
> c = conn.cursor()
> 
> reader = csv.reader(open(sys.argv[1]))
> for row in reader:
>     data1, data2, data3, data4 = row
>     data = (data1,data2,data3,data4)
>     c.execute("""insert into datatable values (%s, %s, %s, %s)""",
> data)
>     conn.commit()
> 
> This takes a really long time to execute, on the order of minutes.
> Directly importing the csv file into mysql using 'load infile' takes
> seconds.
> 
> What am I doing wrong? What can I do to speed up the operation?

      "LOAD INFILE" is generally much faster than doing many
individual INSERTs.  That said, there are ways to speed up your loop:
the biggest cost in your code is calling conn.commit() after every
row, which forces a transaction (and a disk flush) per insert.  Commit
once after the loop, or every few thousand rows, and use executemany()
to send rows in batches.  You can also reconfigure MySQL to use more
memory, and load the data into an unindexed table, then build the
indices afterward.  Ask in a MySQL group, or read the manual.
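A minimal sketch of the batched approach, assuming the same table and
connection parameters as in your script (the chunk size and the
chunked() helper are my own, not part of MySQLdb):

```python
import csv
import sys
from itertools import islice

def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

if __name__ == "__main__":
    import MySQLdb  # only needed when actually talking to MySQL
    conn = MySQLdb.connect(db="database", host="localhost",
                           user="root", passwd="password")
    c = conn.cursor()
    reader = csv.reader(open(sys.argv[1]))
    for batch in chunked(reader, 1000):
        # executemany sends the whole batch in one call
        c.executemany("insert into datatable values (%s, %s, %s, %s)",
                      batch)
    conn.commit()  # one commit for the whole load, not one per row
```

Even without touching the MySQL configuration, dropping the per-row
commit usually makes the largest difference.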

      If you can use LOAD INFILE, do so.  Preferably from an empty
database.  Then it can sort the records and insert them all at once.
You can create a file for LOAD INFILE from Python, then issue the
LOAD INFILE command.

      A few minutes isn't a "really long time" for that.  I had to do
15,000,000 INSERT operations a few months back, and it took three days.

				John Nagle




More information about the Python-list mailing list