What is the best data structure for a very simple spreadsheet?

Jon Clements joncle at googlemail.com
Sun Jan 3 10:25:49 EST 2010


On Jan 3, 2:58 pm, Lie Ryan <lie.1... at gmail.com> wrote:
> On 1/3/2010 10:27 PM, vsoler wrote:
>
> > 1) what are, in your opinion, the basic elements of the Cell class?
>
> The "user-entered formula" and "effective value". A Cell containing a
> formula "abc" has a value of "abc"; a cell containing the formula "=1+5"
> has a value of "6". You could use the 'property' decorator for the
> "effective value" attribute.
>
> > 2) Do I need a parser to evaluate the formulas like 3*A1+A2 ? Can you
> > recommend one library that already contains one?
>
> Yes you do. PyParsing, regex, write your own parser; choose your own
> poison depending on the expected complexity of the formulaes.
>
> > 3) Do I need a tree data structure to represent my data?
>
> There are two main options for your "primary data structure":
> - table (2x2 list) -- if you expect most of your cells to be filled,
> this is simple, but uses lots of memory
> - sparse table (list/dicts of cells) -- if you have a large table that
> is mostly empty cells
>
> The Dependency Tree is the "side data structure"; a "Cell" is a "Node";
> a Cell's "dependencies" is the Node's "children"; a Cell's "dependant"
> is the Node's "parent". Cells maintains a list of dependencies (or
> dynamically generates them when requested) by analyzing the "formula"
> attribute for references to other cells.
>
> PS: Strictly speaking, this "side data structure" is not a "Tree" as
> there are multiple root nodes. I think it's called "Directed Acyclic
> Graph". However, it's true that, given a cell as a root, the cell's
> dependencies is a proper tree.
>
> > would the
> > tree be an attribute of the class instance?
>
> you could use 'property' decorator and make the dependency tree a
> "computed attribute" of the cell; many people recommended against a
> 'property' that is heavy to compute though and calculating dependencies
> may be quite heavy depending on the complexity of the spreadsheet.
>
> > I imagine a lot can be said on these questions. What I am looking for
> > is some hints that help me get out of where I am now.
>
> > Any help is highly appreciated.
>
> > Vicente Soler
>
>

As well as what Steven & Lie have mentioned, maybe you could get
inspiration from http://pyspread.sourceforge.net/

Although I've not looked at it, my gut feeling is the author would've
had to address these issues (take that as a disclaimer of some
sort :) )

>From not having to had a need for this sort of thing, I'd probably go
for a sparse representation as a graph.

But again, depends why you're segmenting the "spreadsheet" from "need
to tell dependencies".

Jon.



More information about the Python-list mailing list