[SciPy-user] Calculating daily, monthly and seasonal averages of hourly time series data.

Dharhas Pothina Dharhas.Pothina at twdb.state.tx.us
Thu Oct 9 13:02:10 EDT 2008


This sounds great. I'm going to have to see how complicated it is to
get the recent versions of numpy/scipy/matplotlib installed on Fedora 8
so I can try the timeseries scikit out. From what I could tell there are
no repositories or rpms available for matplotlib 0.98 on Fedora 8 or 9.

thanks for your help.

- dharhas

>>> Lionel Roubeyrie <lroubeyrie at limair.asso.fr> 10/9/2008 9:39 AM >>>

> Ok I think I understood your example below. Can you give me an example
> of how to deal with missing data?
If you take my last example, you have three separate arrays:
mes.data : your values
mes.dates : the date array
mes.mask : the mask, as in the maskedarray module (timeseries is based on it).
For example, to mask the first 10 values:
##################################
In [16]: mask=zeros_like(mes.data)

In [17]: mask[0:10]=True

In [18]: mask
Out[18]: array([ 1.,  1.,  1., ...,  0.,  0.,  0.])

In [19]: mes2=ts.time_series(mes, mask=mask)

In [20]: mes2
Out[20]: 
timeseries([-- -- -- ..., 17.9699245692 66.8968405206 24.7117965045],
           dates = [01-jan-2007 00:00 ... 30-déc-2008 23:00],
           freq  = H)
##################################
A timeseries can be constructed from another timeseries, as I do
here with mes2. Note that only the values are masked (missing), not the
dates: every position still has a date, whether its value is masked or not.
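
To make the values-versus-dates distinction concrete, here is a minimal
plain-Python sketch of the same idea. It does not use scikits.timeseries;
the names `dates`, `values`, `mask` and `masked_mean` are made up for
illustration, and the data are arbitrary.

```python
from datetime import datetime, timedelta

# Hypothetical hourly series: 24 readings starting 2007-01-01 00:00.
start = datetime(2007, 1, 1)
dates = [start + timedelta(hours=i) for i in range(24)]
values = [float(i) for i in range(24)]

# Boolean mask: True marks a missing value. Mask the first 10 readings.
mask = [i < 10 for i in range(len(values))]


def masked_mean(values, mask):
    """Average only the values whose mask entry is False."""
    kept = [v for v, m in zip(values, mask) if not m]
    return sum(kept) / len(kept) if kept else None


# Every date is still present; only the values are hidden.
print(len(dates), sum(mask), masked_mean(values, mask))
```

The date list keeps its full length, while the average is taken over the
14 unmasked readings only.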


> Does this general technique work for data that is on a 15 minute frequency
Yes and no :-) The timeseries module has no native quarter-hourly (QH)
frequency, but it does have a minute frequency (freq='T'). Look at this:
#################################
In [28]: fielddates=ts.date_array(['2007-01-01 00:00', '2007-01-01 00:15',
'2007-01-01 00:30', '2007-01-01 00:45'], freq='T')

In [29]: salinity=random(4)*100

In [30]: mes=ts.time_series(data=salinity, dates=fielddates)

In [31]: mes.has_missing_dates()
Out[31]: True
#################################
Because there is no native QH frequency, the minutes between the readings
count as missing dates (you can also check for duplicated dates, very
convenient!). But you can fill these missing dates:
###############################
In [36]: mes2=mes.fill_missing_dates()

In [37]: mes2.has_missing_dates()
Out[37]: False

In [38]: mes2
Out[38]: 
timeseries([2.33824586442 -- -- -- -- -- -- -- -- -- -- -- -- -- --
 36.180901427 -- -- -- -- -- -- -- -- -- -- -- -- -- --
 39.0648471531 -- -- -- -- -- -- -- -- -- -- -- -- -- --
 55.4226606997],
           dates = [01-jan-2007 00:00 ... 01-jan-2007 00:45],
           freq  = T)
###############################
Alternatively, the module handles these missing dates directly when you
convert the timeseries to a lower frequency:
###############################
In [39]: mes.convert(freq='H', func=mean)
Out[39]: 
timeseries([ 33.25166379],
           dates = [01-jan-2007 00:00],
           freq  = H)
###############################
You can try with func=None, it will just fill the missing dates with
missing values :-p
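
What the two steps above amount to can be sketched in plain Python,
without scikits.timeseries: put the 15-minute readings on a 1-minute grid
with `None` for the missing minutes, then average each hour while skipping
the gaps. The helper names `fill_minutes` and `hourly_mean` are
hypothetical; the four values are the ones printed above.

```python
from datetime import datetime, timedelta

readings = {
    datetime(2007, 1, 1, 0, 0): 2.33824586442,
    datetime(2007, 1, 1, 0, 15): 36.180901427,
    datetime(2007, 1, 1, 0, 30): 39.0648471531,
    datetime(2007, 1, 1, 0, 45): 55.4226606997,
}


def fill_minutes(readings):
    """Return (minute, value) pairs on a gap-free 1-minute grid; None = missing."""
    first, last = min(readings), max(readings)
    n = int((last - first).total_seconds() // 60) + 1
    grid = [first + timedelta(minutes=i) for i in range(n)]
    return [(t, readings.get(t)) for t in grid]


def hourly_mean(filled):
    """Average the non-missing values within each clock hour."""
    buckets = {}
    for t, v in filled:
        hour = t.replace(minute=0, second=0, microsecond=0)
        buckets.setdefault(hour, [])
        if v is not None:
            buckets[hour].append(v)
    return {h: (sum(vs) / len(vs) if vs else None) for h, vs in buckets.items()}


filled = fill_minutes(readings)
print(len(filled), sum(v is None for _, v in filled))
print(hourly_mean(filled))
```

The grid has 46 minutes, 42 of them missing, and the hourly mean agrees
with the 33.25166379 shown in Out[39].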

> or datasets where the frequency is
> variable (ie some months we have 10 readings other months we may have
> 30?
As you can see, just pass your data with the correct dates and it works,
but don't mix minute frequency with hour frequency!
Here I take three daily samples in January and one in October:
#################################
In [41]: fielddates=ts.date_array(['2007-01-01', '2007-01-02',
'2007-01-03', '2007-10-15'], freq='D')

In [42]: salinity=random(4)*100

In [43]: mes=ts.time_series(data=salinity, dates=fielddates)

In [44]: mes
Out[44]: 
timeseries([ 59.63468614  38.60721076  64.52554805  66.17637291],
           dates = [01-jan-2007 02-jan-2007 03-jan-2007 15-oct-2007],
           freq  = D)


In [45]: mes.convert(freq='M', func=mean)
Out[45]: 
timeseries([54.2558149823 -- -- -- -- -- -- -- -- 66.1763729106],
           dates = [jan-2007 ... oct-2007],
           freq  = M)
###################################
Computing the monthly average works fine; the module fills the missing
months with masked values.
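
The same behaviour can be imitated in plain Python to see what is going
on: group the samples by (year, month), average each group, and emit
`None` for months with no samples. This is only an illustration of the
idea, not the scikits.timeseries implementation; `monthly_mean` is a
hypothetical helper and the values are the ones printed above.

```python
from datetime import date

samples = [
    (date(2007, 1, 1), 59.63468614),
    (date(2007, 1, 2), 38.60721076),
    (date(2007, 1, 3), 64.52554805),
    (date(2007, 10, 15), 66.17637291),
]


def monthly_mean(samples):
    """Return [((year, month), mean or None)] for every month in the covered span."""
    groups = {}
    for d, v in samples:
        groups.setdefault((d.year, d.month), []).append(v)
    first, last = min(groups), max(groups)
    out = []
    year, month = first
    while (year, month) <= last:
        vals = groups.get((year, month))
        out.append(((year, month), sum(vals) / len(vals) if vals else None))
        # Advance one calendar month.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return out


for key, mean in monthly_mean(samples):
    print(key, mean)
```

January averages its three samples, October keeps its single value, and
the eight months in between come out as `None`, mirroring the masked
entries in Out[45].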

> 
> Also how stable is the scikits.timeseries? Is it reasonably usable?
Yes, we use it intensively on large projects, and Pierre G.M. has made a
very good tool.
Cordially

> 
> thanks,
> 
> - dharhas
> 
> >>> Lionel Roubeyrie <lroubeyrie at limair.asso.fr> 10/9/2008 3:51 AM >>>
> Hi Dharhas,
> scikits.timeseries is perfect for what you want, and very easy to use:
> 
> ###############################
> In [29]: import scikits.timeseries as ts
> 
> In [30]: sdate=ts.Date('H', '2007-01-01 00:00')
> 
> In [31]: fielddates=ts.date_array(start_date=sdate, freq='H',
> length=365*24*2)
> 
> In [32]: salinity=random(365*24*2)*100
> 
> In [33]: mes=ts.time_series(data=salinity, dates=fielddates)
> 
> In [34]: mes
> Out[34]: 
> timeseries([ 23.84116045  49.51437251  89.29221711 ...,  37.00510947
> 41.12589836
>   78.65572656],
>            dates = [01-jan-2007 00:00 ... 30-déc-2008 23:00],
>            freq  = H)
> 
> 
> In [35]: mes_avmonth=mes.convert(freq='M', func=mean)
> 
> In [36]: mes_avmonth
> Out[36]: 
> timeseries([ 49.29718906  50.64688937  49.88193999  48.97144253
> 49.5788259
>   50.41340038  50.15047009  51.70933261  50.5635153   51.15084406
>   51.15362514  51.51443468  49.17556599  49.26877667  50.21416724
>   49.37037657  51.00724033  49.43337134  49.60398056  50.24470761
>   50.62350109  51.15572702  51.37652011  49.24193747],
>            dates = [jan-2007 ... déc-2008],
>            freq  = M)
> 
> 
> In [37]: mes_avyear=mes.convert(freq='Y', func=mean)
> 
> In [38]: mes_avyear
> Out[38]: 
> timeseries([ 50.41903159  50.06468157],
>            dates = [2007 2008],
>            freq  = A-DEC)
> 
> 
> In [39]: mes_avseason=mes[(mes.month>=5) & (mes.month<=9)].mean()
> 
> In [40]: mes_avseason
> Out[40]: 50.33380690600049
> ###############################
> 
> 
> On Wednesday 08 October 2008 at 14:54 -0500, Dharhas Pothina wrote:
> > Hi,
> > 
> > I'm trying to analyze hourly salinity data. I was wondering if there
> > is a simple way of calculating daily, monthly and seasonal averages of
> > hourly time series data.
> > 
> > So assuming I have two arrays that contain several years of hourly
> > (or every 15min) salinity data: a datetime array called 'fielddates' & a
> > data array called 'salinity'
> > 
> > How would I go about getting the various averages. The seasonal
> > averages would be say defined as May through September etc.
> > 
> > I had a look at scikits.timeseries but it looks like it would require
> > upgrading numpy to install and there isn't enough high level
> > documentation on how to use it for me to be confident in picking it up
> > in the time frame I'm looking at. I'm also not completely clear if it
> > can handle stuff that happens on a scale smaller than a day. If anyone
> > can point me to any usage examples for it that would be appreciated.
> > 
> > Thanks,
> > 
> > - dharhas
> > 
> > _______________________________________________
> > SciPy-user mailing list
> > SciPy-user at scipy.org 
> > http://projects.scipy.org/mailman/listinfo/scipy-user 
> > 
-- 
Lionel Roubeyrie
chargé d'études
LIMAIR - La Surveillance de l'Air en Limousin
http://www.limair.asso.fr 




