[python-win32] Running SQL queries on excel sheets

Tim Golden mail at timgolden.me.uk
Fri Mar 14 17:24:11 CET 2008


Rickey, Kyle W wrote:
> Tim, thanks for your response. I've got 7 excel files that need reading
> containing a total of ~6100 rows. I agree, about this code making me
> sick :)
> 
> In a perfect world I would get all the data into our SQL server and
> write a front end for everyone that needs to access/modify the data, but
> that's a ways down the road. I've still got to convince people in the
> company that excel is NOT a good way to store database info!
> 
> I don't have a problem extracting all the data and throwing it into
> sqlite, but the problem is I would need to extract all the data every
> time I wanted to run my query. I've also got some code laying around to
> COM into excel and read data, but it's not very pretty either.
> 
> Now that I think of it, a good temporary solution might be to have an
> excel macro that runs whenever the file is open, then whenever someone
> saves changes to the excel file, it would dump all that data into our
> SQL server. Then whenever I get around to making that frontend, the data
> will already be in SQL. Thanks for your help.
> 
> -Kyle Rickey

Well you obviously know what you're about, but just to compare things
a bit, I attach two small but complete pieces of code below. The first
generates an Excel workbook of 6000 lines of data, each containing a
hundred numbers randomly shuffled. The second (which is what I'm trying
to get to) pulls the first 20 values from each row and inserts them
into a Noddy table in a sqlite database.

On my respectable-but-not-blazing machine, the first takes about thirty
seconds, the second about two seconds. Now my data is ridiculously
simple but I suppose I wanted to illustrate how easy it *could* be
to pull all of the data from an Excel Workbook.

Those with more rugged and real-world experience of doing this can
chime in with all the crufty bits you'd have to cope with (such as
date/time fields, concurrent access and the like). But perhaps some
people's data is as simple as mine!

<data2xl>
import os, sys
import random

import win32com.client

filename = os.path.abspath ("data.xls")
if os.path.exists (filename):
   os.remove (filename)

xl = win32com.client.gencache.EnsureDispatch ("Excel.Application")
try:
   wb = xl.Workbooks.Add ()
   ws = wb.ActiveSheet

   line = range (100)
   for row in range (1, 6000):
     print "Row", row
     random.shuffle (line)
     ws.Range (ws.Cells (row, 1), ws.Cells (row, len (line))).Value = line

   wb.SaveAs (filename)

finally:
   xl.Quit ()

</data2xl>

<xl2data>
import os, sys

import win32com.client
import sqlite3

filename = os.path.abspath ("data.xls")
db_filename = os.path.abspath ("data.db")
if os.path.exists (db_filename):
   os.remove (db_filename)

xl = win32com.client.gencache.EnsureDispatch ("Excel.Application")
db = sqlite3.connect (db_filename)
db.execute ("""
CREATE TABLE data (
   a INTEGER, b INTEGER, c INTEGER, d INTEGER,
   e INTEGER, f INTEGER, g INTEGER, h INTEGER,
   i INTEGER, j INTEGER, k INTEGER, l INTEGER,
   m INTEGER, n INTEGER, o INTEGER, p INTEGER,
   q INTEGER, r INTEGER, s INTEGER, t INTEGER
)
""")

try:
   wb = xl.Workbooks.Open (filename)
   ws = wb.ActiveSheet

   data = ws.Range (ws.Cells (1, 1), ws.Cells (6000, 20)).Value
   db.executemany ("""
   INSERT INTO data (
     a, b, c, d, e, f, g, h,
     i, j, k, l, m, n, o, p,
     q, r, s, t
   )
   VALUES (
     ?, ?, ?, ?, ?, ?, ?, ?,
     ?, ?, ?, ?, ?, ?, ?, ?,
     ?, ?, ?, ?
   )
   """, data)

finally:
   xl.Quit ()

   for row in db.execute ("SELECT * FROM data LIMIT 100"):
     print row
   db.close ()

</xl2data>

TJG


More information about the python-win32 mailing list