Allowing zero-dimensional subscripts

spam.noam at gmail.com spam.noam at gmail.com
Sat Jun 10 16:00:05 EDT 2006


George Sakkis wrote:
> spam.noam at gmail.com wrote:
>
> > However, I'm designing another library for
> > managing multi-dimensional arrays of data. Its purpose is similiar to
> > that of a spreadsheet - analyze data and preserve the relations between
> > a source of a calculation and its destination.
>
> Sounds interesting. Will it be related at all to OLAP or the
> Multi-Dimensional eXpressions language
> (http://msdn2.microsoft.com/en-us/library/ms145506.aspx) ?
>
Thanks for the reference! I didn't know about any of these. It will
probably be interesting to learn from them. From a brief look at OLAP
in wikipedia, it may have similarities to OLAP. I don't think it will
be related to Microsoft's language, because the language will simply by
Python, hopefully making it very easy to do whatever you like with the
data.

I posted to python-dev a message that (hopefully) better explains my
use for x[]. Here it is - I think that it also gives an idea on how it
will look like.


I'm talking about something similar to a spreadsheet in that it saves
data, calculation results, and the way to produce the results.
However, it is not similar to a spreadsheet in that the data isn't
saved in an infinite two-dimensional array with numerical indices.
Instead, the data is saved in a few "tables", each storing a different
kind of data. The tables may be with any desired number of dimensions,
and are indexed by meaningful indices, instead of by natural numbers.

For example, you may have a table called sales_data. It will store the
sales data in years from set([2003, 2004, 2005]), for car models from
set(['Subaru', 'Toyota', 'Ford']), for cities from set(['Jerusalem',
'Tel Aviv', 'Haifa']). To refer to the sales of Ford in Haifa in 2004,
you will simply write: sales_data[2004, 'Ford', 'Haifa']. If the table
is a source of data (that is, not calculated), you will be able to set
values by writing: sales_data[2004, 'Ford', 'Haifa'] = 1500.

Tables may be computed tables. For example, you may have a table which
holds for each year the total sales in that year, with the income tax
subtracted. It may be defined by a function like this:

lambda year: sum(sales_data[year, model, city] for model in models for
city in cities) / (1 + income_tax_rate)

Now, like in a spreadsheet, the function is kept, so that if you
change the data, the result will be automatically recalculated. So, if
you discovered a mistake in your data, you will be able to write:

sales_data[2004, 'Ford', 'Haifa'] = 2000

and total_sales[2004] will be automatically recalculated.

Now, note that the total_sales table depends also on the
income_tax_rate. This is a variable, just like sales_data. Unlike
sales_data, it's a single value. We should be able to change it, with
the result of all the cells of the total_sales table recalculated. But
how will we do it? We can write

income_tax_rate = 0.18

but it will have a completely different meaning. The way to make the
income_tax_rate changeable is to think of it as a 0-dimensional table.
It makes sense: sales_data depends on 3 parameters (year, model,
city), total_sales depends on 1 parameter (year), and income_tax_rate
depends on 0 parameters. That's the only difference. So, thinking of
it like this, we will simply write:

income_tax_rate[] = 0.18

Now the system can know that the income tax rate has changed, and
recalculate what's needed. We will also have to change the previous
function a tiny bit, to:

lambda year: sum(sales_data[year, model, city] for model in models for
city in cities) / (1 + income_tax_rate[])

But it's fine - it just makes it clearer that income_tax_rate[] is a
part of the model that may change its value.


Have a good day,
Noam




More information about the Python-list mailing list