What is the best data structure for a very simple spreadsheet?

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sun Jan 3 07:28:40 EST 2010


On Sun, 03 Jan 2010 03:27:46 -0800, vsoler wrote:

> My application would contain a limited set of "cells" represented by the
> instances of a Cell class:
> 
> class Cell:
> ...
> 
> A1=Cell(7)
> A2=Cell(2*A1)
> A3=Cell(3*A1+A2)
> A4=Cell(A3*4)
> 
> Of course, A1 = 7, A2 = 14, A3 = 35 and A4 = 140
> 
> Now, I somehow want to be able to show a dependency tree
> 
> 1 level dependency trees
>   A1: None
>   A2: A1
>   A3: A1, A2
>   A4: A3
> 
> All levels dependency trees
> 
>   A1: None
>   A2: A1
>   A3: A1, A2
>   A4: A3, A2, A1
> 
> Leaf + values dependency trees:
> 
>   A1: 7
>   A2: A1=7, 2
>   A3: 3, A1=7, 2
>   A4: 3, A1=7, 2, 4
> 
> What I'd like to know is:
> 
> 1) what are, in your opinion, the basic elements of the Cell class?

def Cell(object):
    def __init__(self, payload):
        self.payload = payload
    def __str__(self):
        return str(self.payload)
    def __float__(self):
        return float(self.payload)
    def dependency(self):
        try:
            return self.payload.dependency()
        except AttributeError:
            return ['None']


Cells can contain one of three things: a number, a string, or a formula.
The first two can be supported by providing a built-in Python object 
(float or str) as payload. You can support formulae two ways that I can 
think of:

(1) Provide a formula as a string, with a leading '='. Then, whenever you 
want to evaluate such a cell, you fetch the string from the cell, parse 
it, generate an arithmetic expression, and calculate it.

(2) Instead of parsing the formula on every single spreadsheet refresh, 
use a couple of helper classes: 

class Indirect(object):
    def __init__(self, ref, sheet=None):
        if sheet is None:
            self.sheet = default_sheet()
        else:
            self.sheet = sheet
        self.ref = ref
    def __str__(self):
        return str(self.sheet[self.ref])
    def float(self):
        return float(self.sheet[self.ref])
    def dependency(self):
        return [self.ref]

class Formula(object):
    def __init__(self, x, y, op):
        self.x = x
        self.y = y
        self.op = op
    def eval(self):
        return self.op(float(x), float(y))
    def dependency(self):
        return self.x.dependency(level) + self.y.dependency(level)
            


Then do something like this:

sheet = {}
sheet['A1'] = Cell(7)
sheet['A2'] = Cell(Formula(2, Indirect('A1'), operator.mul))
sheet['A3'] = Cell(
  Formula(
    Formula(3, Indirect('A1'), operator.mul), 
    Indirect('A2'), 
    operator.add
    ))
sheet['A4'] = Cell(Formula(Indirect('A3'), 4, operator.mul))


Then you only need to parse each human-readable formula like '=2*A1' once.


> 2) Do I need a parser to evaluate the formulas like “3*A1+A2”? 

Yes.

> Can you recommend one library that already contains one? 

Try PyParsing.


> 3) Do I need a tree
> data structure to represent my data? would the tree be an attribute of
> the class instance?

I suspect a dict will be faster.


To get the dependencies of each cell:

for key, value in sheet.items():
    print key, value.dependency()


Keep in mind I haven't tested ANY of this -- it is entirely stream of 
consciousness. I've never actually done this, so I have no idea whether 
it is a good approach or not, but it seems to me that it might be.



-- 
Steven



More information about the Python-list mailing list