reverse engineering Excel spreadsheet

irstas at gmail.com irstas at gmail.com
Sun Apr 1 12:13:10 EDT 2007


On Apr 1, 6:59 pm, Duncan Smith <buzz... at urubu.freeserve.co.uk> wrote:
> Hello,
>      I am currently implementing (mainly in Python) 'models' that come
> to me as Excel spreadsheets, with little additional information.  I am
> expected to use these models in a web application.  Some contain many
> worksheets and various macros.
>
> What I'd like to do is extract the data and business logic so that I can
> figure out exactly what these models actually do and code it up.  An
> obvious (I think) idea is to generate an acyclic graph of the cell
> dependencies so that I can identify which cells contain only data (no
> parents) and those that depend on other cells.  If I could also extract
> the relationships (functions), then I could feasibly produce something
> in pure Python that would mirror the functionality of the original
> spreadsheet (using e.g. Matplotlib for plots and more reliable RNGs /
> statistical functions).
>
> The final application will be running on a Linux server, but I can use a
> Windows box (i.e. win32all) for processing the spreadsheets (hopefully
> not manually).  Any advice on the feasibility of this, and how I might
> achieve it would be appreciated.
>
> I assume there are plenty of people who have a better knowledge of e.g.
> COM than I do.  I suppose an alternative would be to convert to Open
> Office and use PyUNO, but I have no experience with PyUNO and am not
> sure how much more reliable the statistical functions of Open Office
> are.  At the end of the day, the business logic will not generally be
> complex, it's extracting it from the spreadsheet that's awkward.  Any
> advice appreciated.  TIA.  Cheers.
>
> Duncan

I'm not sure I understood what kind of information you want to
get out of the Excel sheet, sorry. But I hope this'll get you started
(at least it has a few nice tokens that might help you in googling):

import win32com.client

class Excel:
    def __init__(self, filename):
        self.closed = True
        self.xlApp =
win32com.client.dynamic.Dispatch('Excel.Application')
        self.xlBook = self.xlApp.Workbooks.Open(filename)
        self.closed = False

    def sheet(self, sheetName):
        return self.xlBook.Worksheets(sheetName)

    def __del__(self):
        if not self.closed:
            self.close()

    def close(self):
        self.xlBook.Close(SaveChanges=1)
        self.xlApp.Quit()
        self.closed = True

excel = Excel('file.xls')
sheet = excel.sheet(1)
print sheet.Cells(6, 3)


I used it a few years ago to read and populate spreadsheet cells.




More information about the Python-list mailing list