[python-win32] Running SQL queries on excel sheets
Tim Golden
mail at timgolden.me.uk
Fri Mar 14 17:24:11 CET 2008
Rickey, Kyle W wrote:
> Tim, thanks for your response. I've got 7 excel files that need reading
> containing a total of ~6100 rows. I agree, about this code making me
> sick :)
>
> In a perfect world I would get all the data into our SQL server and
> write a front end for everyone that needs to access/modify the data, but
> that's a ways down the road. I've still got to convince people in the
> company that excel is NOT a good way to store database info!
>
> I don't have a problem extracting all the data and throwing it into
> sqlite, but the problem is I would need to extract all the data every
> time I wanted to run my query. I've also got some code laying around to
> COM into excel and read data, but it's not very pretty either.
>
> Now that I think of it, a good temporary solution might be to have an
> excel macro that runs whenever the file is open, then whenever someone
> saves changes to the excel file, it would dump all that data into our
> SQL server. Then whenever I get around to making that frontend, the data
> will already be in SQL. Thanks for your help.
>
> -Kyle Rickey
Well you obviously know what you're about, but just to compare things
a bit, I attach two small but complete pieces of code below. The first
generates an Excel workbook of 6000 lines of data, each containing a
hundred numbers randomly shuffled. The second (which is what I'm trying
to get to) pulls the first 20 values from each row and inserts them
into a Noddy table in a sqlite database.
On my respectable-but-not-blazing machine, the first takes about thirty
seconds, the second about two seconds. Now my data is ridiculously
simple but I suppose I wanted to illustrate how easy it *could* be
to pull all of the data from an Excel Workbook.
Those with more rugged and real-world experience of doing this can
chime in with all the crufty bits you'd have to cope with (such as
date/time fields, concurrent access and the like). But perhaps some
people's data is as simple as mine!
<data2xl>
import os, sys
import random
import win32com.client
filename = os.path.abspath ("data.xls")
if os.path.exists (filename):
os.remove (filename)
xl = win32com.client.gencache.EnsureDispatch ("Excel.Application")
try:
wb = xl.Workbooks.Add ()
ws = wb.ActiveSheet
line = range (100)
for row in range (1, 6000):
print "Row", row
random.shuffle (line)
ws.Range (ws.Cells (row, 1), ws.Cells (row, len (line))).Value = line
wb.SaveAs (filename)
finally:
xl.Quit ()
</data2xl>
<xl2data>
import os, sys
import win32com.client
import sqlite3
filename = os.path.abspath ("data.xls")
db_filename = os.path.abspath ("data.db")
if os.path.exists (db_filename):
os.remove (db_filename)
xl = win32com.client.gencache.EnsureDispatch ("Excel.Application")
db = sqlite3.connect (db_filename)
db.execute ("""
CREATE TABLE data (
a INTEGER, b INTEGER, c INTEGER, d INTEGER,
e INTEGER, f INTEGER, g INTEGER, h INTEGER,
i INTEGER, j INTEGER, k INTEGER, l INTEGER,
m INTEGER, n INTEGER, o INTEGER, p INTEGER,
q INTEGER, r INTEGER, s INTEGER, t INTEGER
)
""")
try:
wb = xl.Workbooks.Open (filename)
ws = wb.ActiveSheet
data = ws.Range (ws.Cells (1, 1), ws.Cells (6000, 20)).Value
db.executemany ("""
INSERT INTO data (
a, b, c, d, e, f, g, h,
i, j, k, l, m, n, o, p,
q, r, s, t
)
VALUES (
?, ?, ?, ?, ?, ?, ?, ?,
?, ?, ?, ?, ?, ?, ?, ?,
?, ?, ?, ?
)
""", data)
finally:
xl.Quit ()
for row in db.execute ("SELECT * FROM data LIMIT 100"):
print row
db.close ()
</xl2data>
TJG
More information about the python-win32
mailing list