reverse engineering Excel spreadsheet

Duncan Smith buzzard at urubu.freeserve.co.uk
Sun Apr 1 11:59:21 EDT 2007


Hello,
I am currently implementing (mainly in Python) 'models' that come
to me as Excel spreadsheets, with little additional information.  I am
expected to use these models in a web application.  Some contain many
worksheets and various macros.

What I'd like to do is extract the data and business logic so that I can
figure out exactly what these models actually do and code it up.  An
obvious (I think) idea is to generate a directed acyclic graph of the cell
dependencies so that I can identify which cells contain only data (no
parents) and which depend on other cells.  If I could also extract
the relationships (functions), then I could feasibly produce something
in pure Python that would mirror the functionality of the original
spreadsheet (using e.g. Matplotlib for plots and more reliable RNGs /
statistical functions).
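
To make the dependency-graph idea concrete, here is a minimal sketch.  It
assumes the formulas have already been pulled out into a dict keyed by cell
name; the sample cells, the helper names (parents, build_graph) and the
regular expression are all illustrative, not anything taken from a real
spreadsheet.  The regex only recognises plain A1-style references on a single
sheet: ranges such as B2:B3 contribute just their two endpoints, and
sheet-qualified or named references are not handled.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# A1-style reference, optionally with $ anchors.  The lookarounds skip
# function names that happen to look like cells, e.g. LOG10(...).
CELL_REF = re.compile(
    r"(?<![A-Za-z0-9_])\$?([A-Z]{1,3})\$?([0-9]+)(?![A-Za-z0-9_(])"
)


def parents(formula):
    """Return the set of cell names referenced directly by a formula string."""
    return {col + row for col, row in CELL_REF.findall(formula)}


def build_graph(cells):
    """Map each cell name to the set of cells it depends on."""
    graph = {}
    for name, content in cells.items():
        if isinstance(content, str) and content.startswith("="):
            graph[name] = parents(content)
        else:
            graph[name] = set()  # plain data: no parents
    return graph


# Hypothetical worksheet contents, for illustration only.
cells = {"A1": 10, "A2": 32, "B1": "=A1+A2", "C1": "=B1*2"}

graph = build_graph(cells)

# Cells with no parents are the pure-data inputs.
inputs = sorted(name for name, deps in graph.items() if not deps)

# A topological order lists every cell after the cells it depends on,
# which is the order a pure-Python reimplementation would evaluate them in.
order = list(TopologicalSorter(graph).static_order())
```

TopologicalSorter raises graphlib.CycleError if the references are circular,
which doubles as a check that the graph really is acyclic.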

The final application will be running on a Linux server, but I can use a
Windows box (i.e. win32all) for processing the spreadsheets (hopefully
not manually).  Any advice on the feasibility of this, and how I might
achieve it would be appreciated.
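
For the extraction step on the Windows box, something along these lines might
be a starting point.  It is an untested sketch that assumes pywin32
(win32all) and a local Excel installation; read_cells and split_cells are
hypothetical helper names.  It reads each used cell's Formula property through
COM and does not touch macros at all.

```python
def read_cells(path):
    """Return {(sheet, row, col): formula-or-value} for every used cell.

    Windows only: needs pywin32 and Excel.  `path` should be absolute,
    because Excel resolves relative paths against its own working directory.
    """
    import win32com.client  # imported lazily so the module loads on Linux

    excel = win32com.client.Dispatch("Excel.Application")
    excel.Visible = False
    try:
        # Positional arguments: Filename, UpdateLinks=0, ReadOnly=True
        workbook = excel.Workbooks.Open(path, 0, True)
        try:
            cells = {}
            for sheet in workbook.Worksheets:
                for cell in sheet.UsedRange.Cells:
                    # Formula is the text starting with "=" for formula
                    # cells, and the stored constant otherwise.
                    key = (sheet.Name, cell.Row, cell.Column)
                    cells[key] = cell.Formula
            return cells
        finally:
            workbook.Close(False)  # close without saving changes
    finally:
        excel.Quit()


def split_cells(cells):
    """Separate formula cells from constant (data-only) cells."""
    formulas = {
        key: value
        for key, value in cells.items()
        if isinstance(value, str) and value.startswith("=")
    }
    constants = {k: v for k, v in cells.items() if k not in formulas}
    return formulas, constants
```

The resulting dict could be pickled on the Windows machine and analysed on
the Linux server, so Excel is only needed for this one step.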

I assume there are plenty of people who have a better knowledge of e.g.
COM than I do.  I suppose an alternative would be to convert to Open
Office and use PyUNO, but I have no experience with PyUNO and am not
sure how much more reliable the statistical functions of Open Office
are.  At the end of the day, the business logic will not generally be
complex; it's extracting it from the spreadsheet that's awkward.  Any
advice appreciated.  TIA.  Cheers.

Duncan


