[Tutor] data analysis with python

Oscar Benjamin oscar.j.benjamin at gmail.com
Wed Nov 14 14:59:25 CET 2012

On 14 November 2012 03:17, David Martins <awesome.me.dm at outlook.com> wrote:
> Hi All
> I'm trying to use python for analysing data from building energy simulations
> and was wondering whether there is a way to do this without using anything
> SQL-like.

There are many ways to do this.

> The simulations are typically run for a full year, every hour, i.e. there
> are 8760 rows and about 100+ variables such as external air temperature,
> internal air temperature, humidity, heating load, ... making roughly a
> million data points. I've got the data in a csv file and also managed to
> write it in a sqlite db.

At roughly a million values, this dataset is small enough to load
entirely into memory.

> I would like to make requests like the following:
> Show the number of hours the aircon is running at 10%, 20%, ..., 100%
> Show me the average, min, max air temperature, humidity, solar gains,....
> when the aircon is running at 10%, 20%,...,100%
> Eventually I'd also like to generate an automated html or pdf report with
> graphs. Creating graphs is actually somewhat essential.

Do you mean graphs or plots? I would use matplotlib for plotting. It
can automatically generate image files of plots. There are also ways
to generate output for visualising graphs (in the sense of networks of
nodes and edges) but I guess that's not what you mean. I would probably
create a PDF report using LaTeX and matplotlib, but that's not the only
way.

> I tried sql and find it horrible, error prone, too much to write, the logic
> somehow seems to work differently than my brain and I couldn't find
> particularly good documentation (particularly the documentation of the api
> is terrible, in my humble opinion). I heard about zope db which might be an
> alternative. Would you mind pointing me towards an appropriate way to solve
> my problem? Is there a way for me to avoid having to learn sql or am I
> doomed?

There are many ways to avoid learning SQL. I'll suggest the simplest
one: Can you not just read all the data into memory and then perform
the computations you want?

For example:

$ cat tmp.csv

$ cat tmp.py
#!/usr/bin/env python

import csv

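# Binary mode suits the Python 2 csv module; on Python 3 use
# open('tmp.csv', newline='') instead.
with open('tmp.csv', 'rb') as f:
    reader = csv.DictReader(f)
    data = []
    for row in reader:
        # Convert every field to float and keep the row
        row = dict((k, float(v)) for k, v in row.items())
        data.append(row)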

maxtemp = max(row['Temp'] for row in data)
mintemp = min(row['Temp'] for row in data)
meanhumidity = sum(row['Humidity'] for row in data) / len(data)

print('max temp is: %d' % maxtemp)
print('min temp is: %d' % mintemp)
print('mean humidity is: %f' % meanhumidity)

$ ./tmp.py
max temp is: 26
min temp is: 23
mean humidity is: 85.333333
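
The requests quoted above (hours at each aircon load level, plus min, max
and mean of another variable at each level) can be handled the same way
once the rows are in memory. The following is only a sketch, written for
Python 3: the column names 'Aircon' (load in percent) and 'Temp', the
sample values, and the helper names are all made up for illustration, and
it assumes one row per simulated hour so that a row count equals a number
of hours.

```python
import csv
import io
import math
from collections import defaultdict

# Made-up sample data. 'Aircon' (load in percent) and 'Temp' are
# hypothetical column names, not taken from the real simulation output.
SAMPLE_CSV = """Aircon,Temp
5,21.0
12,23.5
18,24.5
95,30.0
100,31.0
"""

def load_rows(f):
    """Read every CSV row into a dict of floats."""
    return [{k: float(v) for k, v in row.items()} for row in csv.DictReader(f)]

def summarise_by_load(rows, load_col='Aircon', value_col='Temp', width=10):
    """Group rows into load bins and summarise value_col within each bin."""
    groups = defaultdict(list)
    for row in rows:
        # Upper edge of the bin: 12 -> 20, 95 -> 100, 0 -> 0 (aircon off).
        upper = int(math.ceil(row[load_col] / width) * width)
        groups[upper].append(row[value_col])
    return {
        upper: {
            'hours': len(vals),  # one row per hour (assumption)
            'min': min(vals),
            'max': max(vals),
            'mean': sum(vals) / len(vals),
        }
        for upper, vals in groups.items()
    }

rows = load_rows(io.StringIO(SAMPLE_CSV))
summary = summarise_by_load(rows)
for upper in sorted(summary):
    s = summary[upper]
    print('up to %3d%%: %d h, temp min %.1f max %.1f mean %.1f'
          % (upper, s['hours'], s['min'], s['max'], s['mean']))
```

With a real file, the io.StringIO object would be replaced by the open
file, and the column names by whatever the CSV header actually contains.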

This approach can also be extended to the case where you don't read
all the data into memory.
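
One possible way to do that, sketched here for Python 3, is to keep only
running totals while iterating over the reader, so that no more than one
row is held at a time. The function name streaming_stats and the sample
values are invented for this illustration.

```python
import csv
import io

# Made-up sample data, used only to make the sketch self-contained.
SAMPLE_CSV = """Temp,Humidity
20.0,80.0
25.0,90.0
22.5,85.0
"""

def streaming_stats(f, column):
    """Return (min, max, mean) of one column, reading one row at a time."""
    lo = hi = None
    total = 0.0
    count = 0
    for row in csv.DictReader(f):
        value = float(row[column])
        lo = value if lo is None else min(lo, value)
        hi = value if hi is None else max(hi, value)
        total += value
        count += 1
    if count == 0:
        raise ValueError('no data rows')
    return lo, hi, total / count

lo, hi, mean = streaming_stats(io.StringIO(SAMPLE_CSV), 'Temp')
print('min %.1f, max %.1f, mean %.1f' % (lo, hi, mean))
```

The trade-off is that each new statistic needs its own running state,
whereas with everything in memory the data can be re-scanned freely.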

