Creating daily/monthly averages from datafiles

Jonathan Pennington jwpennin at bellsouth.net
Wed Sep 25 23:39:36 EDT 2002


Hello all, I've got a rather complex description/request. I'm building
a program to parse and analyze data for a watershed hydrological
database and am looking for some methods to steal so I don't re-code
what's already been done.

##################
## Description

I have some classes that iterate over text files with formats
similar to this:

date,time,float,float,int,float,int,int # values every 15 minutes for
					# entire year, multiple years.

The data often has arbitrary-length headers and non-standard
delimiters. I've built a wxPython GUI that lets the user delete the
header lines, then a class that splits each line on a user-selected
delimiter and loads the resulting table into a wxGrid interface. All
pretty nice.
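A minimal sketch of the header-skipping and delimiter-splitting step, assuming the date and time are the first two fields and using a hypothetical `%m/%d/%Y %H:%M` layout. `parse_rows` and its parameters are illustrative names, not part of any existing library. Producing (timestamp, values) pairs up front means later averaging code never has to re-parse date strings.

```python
from datetime import datetime
from itertools import islice

# Hypothetical timestamp layout; change it to match the real files.
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M"

def parse_rows(lines, delimiter=",", header_lines=0):
    """Turn delimited text lines into (datetime, [float, ...]) tuples.

    Skips the first `header_lines` lines, splits the rest on `delimiter`
    (which may be longer than one character), joins the first two fields
    into a timestamp, and converts every remaining field to float.
    """
    rows = []
    for line in islice(lines, header_lines, None):  # works on files too
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        fields = [field.strip() for field in line.split(delimiter)]
        stamp = datetime.strptime(fields[0] + " " + fields[1], TIMESTAMP_FORMAT)
        rows.append((stamp, [float(field) for field in fields[2:]]))
    return rows

# Usage with a made-up two-line header and ";" as the delimiter.
sample = [
    "Station export",
    "date;time;flow;stage",
    "09/16/2002;00:00;43;29.5",
    "09/16/2002;00:15;44;29.0",
]
rows = parse_rows(sample, delimiter=";", header_lines=2)
```

Because `islice` accepts any iterable, the same function can be handed an open file object instead of a list.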

Separately from this, I have a class that iterates over a
delimited text file and creates a dictionary keyed by month number;
each value is a dictionary keyed by day, and each of those holds a
list of readings from that day. Pretty basic. I then hacked up methods
(rough, not at all elegant, and not very efficient) that build a
dictionary of values, so that I can parse a datafile like this:

>>> parser = DataParser(filename)
<...long wait as everything is done in __init__() method automagically...>
>>> parser.DailyAverages[9][16]
[43, 29, 84]
>>> parser.MonthlyAverages[4]
[35, 20, 76]
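Building the month/day dictionary and both sets of averages can be done in a single pass over the readings rather than one scan per month. The sketch below assumes each reading is already a (datetime, [values]) pair; `daily_and_monthly_averages` and `column_means` are hypothetical names. Like the dictionary described above, it keys on month and day only, so several years in one file are merged; keying on (year, month) would keep them apart.

```python
from collections import defaultdict
from datetime import datetime

def column_means(readings):
    """Average each column of a list of equal-length value lists."""
    # zip(*readings) transposes the rows into columns.
    return [sum(col) / len(col) for col in zip(*readings)]

def daily_and_monthly_averages(rows):
    """Return (daily, monthly): daily[month][day] and monthly[month]
    are lists holding one average per data column."""
    by_day = defaultdict(lambda: defaultdict(list))
    by_month = defaultdict(list)
    for stamp, values in rows:  # one pass over the data
        by_day[stamp.month][stamp.day].append(values)
        by_month[stamp.month].append(values)
    daily = {month: {day: column_means(r) for day, r in days.items()}
             for month, days in by_day.items()}
    monthly = {month: column_means(r) for month, r in by_month.items()}
    return daily, monthly

# Usage with made-up readings (two data columns).
rows = [
    (datetime(2002, 9, 16, 0, 0), [40.0, 30.0]),
    (datetime(2002, 9, 16, 0, 15), [46.0, 28.0]),
    (datetime(2002, 9, 17, 0, 0), [10.0, 20.0]),
]
daily, monthly = daily_and_monthly_averages(rows)
```

Here `daily[9][16]` gives one average per column for that day, in the same shape as the `DailyAverages[9][16]` lookup shown above.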

These fields are fixed to this file (i.e., the class is not very
flexible, as it was written to fit one specific file layout). The
actual parsing methods are also very rough: they iterate over the
entire file to find all readings in a given month, that sort of
thing. Again, not very elegant.

#################
## Request

Basically, I don't want to re-hack these average/cumtotal methods to
work with my wxGrid-ed data. I'm looking to see whether anyone has
done similar work so that I don't have to reinvent the
wheel. What I need are classes/methods that can take data from a
list/grid/table and operate on sections of that data by date/time. For
instance: take all values in column 2 and compute daily averages, take
all values in column 3 and compute monthly totals, take all values in
column 4 and compute cumulative totals by average.
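The three example operations can be sketched as small functions over (datetime, [values]) pairs. This assumes that representation, that `col` indexes the data columns only (so a grid's "column 2" would need shifting by however many date/time columns precede it), and that "cumulative total" means a plain running sum; if a running total of daily averages is wanted instead, `cumulative_total` can be fed the output of `daily_average`. All function names are hypothetical.

```python
from datetime import datetime

def group_column(rows, col, key):
    """Collect values[col] into lists keyed by key(timestamp)."""
    groups = {}
    for stamp, values in rows:
        groups.setdefault(key(stamp), []).append(values[col])
    return groups

def daily_average(rows, col):
    """Mean of one column per calendar day."""
    groups = group_column(rows, col, lambda t: t.date())
    return {day: sum(v) / len(v) for day, v in groups.items()}

def monthly_total(rows, col):
    """Sum of one column per (year, month)."""
    groups = group_column(rows, col, lambda t: (t.year, t.month))
    return {month: sum(v) for month, v in groups.items()}

def cumulative_total(rows, col):
    """Running total of values[col]: one (timestamp, total) per reading."""
    total = 0.0
    result = []
    for stamp, values in rows:
        total += values[col]
        result.append((stamp, total))
    return result

# Usage with made-up readings; col counts from the first data column.
rows = [
    (datetime(2002, 9, 16, 0, 0), [1.0, 10.0]),
    (datetime(2002, 9, 16, 0, 15), [3.0, 20.0]),
    (datetime(2002, 10, 1, 0, 0), [5.0, 30.0]),
]
averages = daily_average(rows, 0)
totals = monthly_total(rows, 1)
running = cumulative_total(rows, 1)
```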

Now, of course, the individual methods (ave, cumtotal, etc.) are
easy. What I'd like to find is code that someone has already written
to easily seek out all of the data by hour/day/month and hand it to my
manipulation method, etc.
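For the general "seek out the data by hour/day/month and hand it to my method" part, `itertools.groupby` can walk chronologically ordered readings once and pass each bucket to whatever function the caller supplies (`sum`, `max`, `statistics.mean`, ...). This sketch assumes (datetime, [values]) pairs already in time order, which files with a reading every 15 minutes normally are; `groupby` only merges adjacent items, so unsorted data must be sorted by timestamp first. `PERIODS` and `apply_by_period` are illustrative names.

```python
from datetime import datetime
from itertools import groupby

# Map a timestamp to the bucket it falls in; extend with "year", etc.
PERIODS = {
    "hour": lambda t: (t.year, t.month, t.day, t.hour),
    "day": lambda t: (t.year, t.month, t.day),
    "month": lambda t: (t.year, t.month),
}

def apply_by_period(rows, col, period, func):
    """Yield (bucket, func(values)) for each consecutive period bucket."""
    bucket_of = PERIODS[period]
    for bucket, group in groupby(rows, key=lambda row: bucket_of(row[0])):
        yield bucket, func([values[col] for _, values in group])

# Usage: two hours of made-up 15-minute readings.
rows = [(datetime(2002, 9, 16, h, m), [float(h * 60 + m)])
        for h in (0, 1) for m in (0, 15, 30, 45)]
hourly_max = dict(apply_by_period(rows, 0, "hour", max))
daily_sum = dict(apply_by_period(rows, 0, "day", sum))
```

Since it is a generator, nothing beyond one bucket's values is held in memory at a time, and the manipulation method stays completely separate from the date-selection logic.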

Of course, just knowing that the data is in a wxGrid is not
necessarily very helpful; the underlying data structure is what would
probably have to be manipulated. Is the wxGrid built from a
dictionary, etc.? Right now, I've just dumped the values from the
datafile into a subclassed wxPyGridTableBase class, so there's no
underlying dictionary or other data structure. Ideally, I'd manipulate
the wxGrid directly, but I don't know if that's
possible/economical. I'm very open to suggestions.
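One way to get an underlying data structure is to keep the rows in an ordinary Python list and have the grid table read from it, so the display and the averaging code share one copy of the data. The sketch is hypothetical and deliberately does not import wx, so it runs on its own: its `GetNumberRows`, `GetNumberCols`, `GetValue` and `SetValue` methods mirror the ones a wxPyGridTableBase subclass overrides, while `DataTable` and its `column()` accessor are made-up names for the analysis side.

```python
class DataTable:
    """Plain-Python row storage that a grid table subclass could wrap."""

    def __init__(self, header, rows):
        self.header = list(header)
        self.rows = [list(row) for row in rows]

    # Grid-facing methods (same names as the grid table base methods).
    def GetNumberRows(self):
        return len(self.rows)

    def GetNumberCols(self):
        return len(self.header)

    def GetValue(self, row, col):
        return str(self.rows[row][col])  # the grid displays text

    def SetValue(self, row, col, value):
        self.rows[row][col] = value  # stores whatever the grid passes in

    # Analysis-facing accessor: all values in one named column.
    def column(self, name):
        index = self.header.index(name)
        return [row[index] for row in self.rows]

# Usage with made-up data.
table = DataTable(["date", "time", "flow"],
                  [["09/16/2002", "00:00", 43.0],
                   ["09/16/2002", "00:15", 44.0]])
```

Keeping numbers as numbers in `rows` and converting to text only in `GetValue` means the averaging code never has to parse strings back out of the grid.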

Does anyone know of some appropriate code, or has anyone built
similar methods for time-constrained data mining? I've searched the
Vaults and assorted other repositories, but nothing, on initial
perusal, seems too promising. Just a couple of methods would really
do it. Otherwise, I'll probably have to build a date-constrained
mining class and perhaps put it on the Vaults for others to use
for similar tasks.
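A date-constrained mining class could start as little more than a sorted index: sort the readings once, then answer each "all readings in this month/day/hour" query with two binary searches instead of scanning the whole file. A minimal sketch using the standard `bisect` module, assuming (datetime, [values]) pairs; `DateIndex`, `between` and `month` are hypothetical names.

```python
import bisect
from datetime import datetime

class DateIndex:
    """Answer time-range queries over (datetime, values) readings."""

    def __init__(self, rows):
        self.rows = sorted(rows, key=lambda row: row[0])  # sort once
        self.stamps = [stamp for stamp, _ in self.rows]

    def between(self, start, end):
        """Readings with start <= timestamp < end (half-open range)."""
        lo = bisect.bisect_left(self.stamps, start)
        hi = bisect.bisect_left(self.stamps, end)
        return self.rows[lo:hi]

    def month(self, year, month):
        """All readings in one calendar month."""
        start = datetime(year, month, 1)
        end = datetime(year + month // 12, month % 12 + 1, 1)  # Dec -> Jan
        return self.between(start, end)

# Usage: one made-up reading on the 15th of each month.
rows = [(datetime(2002, m, 15), [float(m)]) for m in range(1, 13)]
index = DateIndex(rows)
september = index.month(2002, 9)
december = index.month(2002, 12)
```

Using a half-open range means consecutive queries (one month, then the next) never count a boundary reading twice.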

Thanks all,
-J




More information about the Python-list mailing list