reverse engineering Excel spreadsheet

Paddy paddy3118 at googlemail.com
Mon Apr 2 02:33:33 EDT 2007


On Apr 1, 4:59 pm, Duncan Smith <buzz... at urubu.freeserve.co.uk> wrote:
> Hello,
>      I am currently implementing (mainly in Python) 'models' that come
> to me as Excel spreadsheets, with little additional information.  I am
> expected to use these models in a web application.  Some contain many
> worksheets and various macros.
>
> What I'd like to do is extract the data and business logic so that I can
> figure out exactly what these models actually do and code it up.  An
> obvious (I think) idea is to generate an acyclic graph of the cell
> dependencies so that I can identify which cells contain only data (no
> parents) and those that depend on other cells.  If I could also extract
> the relationships (functions), then I could feasibly produce something
> in pure Python that would mirror the functionality of the original
> spreadsheet (using e.g. Matplotlib for plots and more reliable RNGs /
> statistical functions).
>
> The final application will be running on a Linux server, but I can use a
> Windows box (i.e. win32all) for processing the spreadsheets (hopefully
> not manually).  Any advice on the feasibility of this, and how I might
> achieve it would be appreciated.
>
> I assume there are plenty of people who have a better knowledge of e.g.
> COM than I do.  I suppose an alternative would be to convert to Open
> Office and use PyUNO, but I have no experience with PyUNO and am not
> sure how much more reliable the statistical functions of Open Office
> are.  At the end of the day, the business logic will not generally be
> complex, it's extracting it from the spreadsheet that's awkward.  Any
> advice appreciated.  TIA.  Cheers.
>
> Duncan

Hi Duncan,
OpenOffice can save sheets in SYLK format, which gives you a simple
textual representation of the cells, including their formulas. I can't
think of any easier way to deal with the macros than hard slog!
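
As a rough sketch of how the SYLK route could feed the dependency graph
you describe, something like the following might do as a starting point.
It assumes the usual SYLK cell records ("C;Y<row>;X<col>;K<value>;E<formula>",
with X or Y carried over from the previous record when omitted, and ";;"
as an escaped semicolon) and formulas in R1C1 notation. The function
names and the SAMPLE text are made up for illustration. It ignores
named ranges, whole-row/column references, other sheets and string
literals that happen to look like references, so check it against a
real export before trusting it.

```python
import re

# One R1C1 reference, e.g. R1C1, R[-2]C, RC[3]; optionally a range "ref:ref".
_REF = r"R(\[-?\d+\]|\d+)?C(\[-?\d+\]|\d+)?"
_REF_OR_RANGE = re.compile(
    r"(?<![A-Za-z0-9_.])" + _REF + r"(?::" + _REF + r")?(?![A-Za-z0-9_(])"
)


def parse_sylk_cells(text):
    """Return {(row, col): {"value": str or None, "formula": str or None}}."""
    cells = {}
    row = col = 1
    for line in text.splitlines():
        if not line.startswith("C;"):
            continue  # only cell records matter here
        # Protect escaped semicolons (";;") before splitting into fields.
        fields = line.replace(";;", "\x00").split(";")[1:]
        value = formula = None
        for field in fields:
            field = field.replace("\x00", ";")
            if not field:
                continue
            tag, body = field[0], field[1:]
            if tag == "X":
                col = int(body)
            elif tag == "Y":
                row = int(body)
            elif tag == "K":
                value = body
            elif tag == "E":
                formula = body
        cells[(row, col)] = {"value": value, "formula": formula}
    return cells


def _resolve(part, current):
    """Turn one half of an R1C1 reference into an absolute index."""
    if not part:
        return current                      # bare "R" or "C": same row/col
    if part.startswith("["):
        return current + int(part[1:-1])    # relative offset
    return int(part)                        # absolute index


def dependencies(cells):
    """Return {(row, col): set of (row, col) cells its formula refers to}."""
    graph = {}
    for (row, col), cell in cells.items():
        refs = set()
        for m in _REF_OR_RANGE.finditer(cell["formula"] or ""):
            r1, c1 = _resolve(m.group(1), row), _resolve(m.group(2), col)
            if ":" in m.group(0):
                r2, c2 = _resolve(m.group(3), row), _resolve(m.group(4), col)
            else:
                r2, c2 = r1, c1
            for r in range(min(r1, r2), max(r1, r2) + 1):
                for c in range(min(c1, c2), max(c1, c2) + 1):
                    refs.add((r, c))
        graph[(row, col)] = refs
    return graph


# Hypothetical hand-written sample, not output from a real spreadsheet.
SAMPLE = """ID;PWXL;N;E
C;Y1;X1;K10
C;Y2;K20
C;Y3;K30;ER[-2]C+R[-1]C
C;Y4;K60;ESUM(R1C1:R2C1)*2
E
"""

cells = parse_sylk_cells(SAMPLE)
graph = dependencies(cells)
data_cells = sorted(cell for cell, refs in graph.items() if not refs)
```

Cells whose set of references is empty are your pure data cells; the
rest carry the business logic, and the graph tells you in what order
they would need to be evaluated.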

P.S. It is well to remember that the UK Tax department have a
very low opinion of the quality of spreadsheets so if you find
oddities remember to query them.

- Paddy.



