[SciPy-user] Calculating daily averages from a timeseries without using the timeseries package.

John Hunter jdh2358 at gmail.com
Tue Dec 2 17:15:55 EST 2008


On Tue, Dec 2, 2008 at 2:21 PM, Dharhas Pothina
<Dharhas.Pothina at twdb.state.tx.us> wrote:
> Hi All,
>
> I have two arrays, t & sal. t was created from an array of datetimes using the date2num function. The timeseries is at an approximately hourly frequency, but there are days with little or no data, or with data at a non-hourly frequency. How would I calculate the average of all salinity values on a particular day and form a new time series?
>
> t_days, sal_dailyavg
>
> I eventually plan to use the timeseries toolkit for my timeseries analysis, but I'm close to the end of a project and don't have time to install and learn it right now, so I was hoping someone knew how to do this within numpy/scipy. I can think of a fairly laborious way: loop through each day, select the data in that day, calculate the average, and populate a new array.

You can use some of the rec* functions in matplotlib.mlab:

    import matplotlib.mlab as mlab
    import numpy as np

    # create a date column (datetimes is your original sequence of
    # datetime objects; values is the matching data array, e.g. sal)
    dates = np.array([d.date() for d in datetimes])

    # create a record array with the columns you need to analyze
    r = np.rec.fromarrays([dates, values], names='date,value')

    # stats is a list of (input_name, function, output_name);
    # input_name must match a column name in r
    stats = [('value', np.mean, 'means')]

    # you can group by one or more attrs, e.g. ['date'], or
    # ['year', 'month'] if r has those columns
    rsummary = mlab.rec_groupby(r, ['date'], stats)

    # pretty print the output
    print(mlab.rec2txt(rsummary))
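
If you would rather stay entirely within numpy, as the question asks, the same grouping can be sketched with np.unique and np.bincount. This is only a sketch under some assumptions: t holds date2num-style floats whose integer part identifies the calendar day, sal contains no NaNs, and daily_average is a made-up helper name, not part of any library. The sample numbers are invented for illustration.

```python
import numpy as np

def daily_average(t, sal):
    """Average sal per calendar day; t holds date2num-style day numbers.

    Hypothetical helper: assumes floor(t) identifies the day and that
    sal contains no NaNs.
    """
    t = np.asarray(t, dtype=float)
    sal = np.asarray(sal, dtype=float)

    # the integer part of a date2num value is the day number
    days = np.floor(t).astype(int)

    # inverse maps every sample to the index of its day in t_days
    t_days, inverse = np.unique(days, return_inverse=True)

    # per-day sum and per-day sample count, then divide
    sums = np.bincount(inverse, weights=sal)
    counts = np.bincount(inverse)
    return t_days, sums / counts

# invented sample data: two readings, one reading, a gap, two readings
t = np.array([733000.0, 733000.5, 733001.25, 733003.0, 733003.75])
sal = np.array([30.0, 32.0, 28.0, 20.0, 24.0])

t_days, sal_dailyavg = daily_average(t, sal)
```

Days with no samples simply do not appear in t_days, so an irregular sampling frequency needs no special handling. Each day's average is weighted by however many readings fall in it.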


