[SciPy-user] Calculating daily, monthly and seasonal averages of hourly time series data.
Dharhas Pothina
Dharhas.Pothina at twdb.state.tx.us
Thu Oct 9 13:02:10 EDT 2008
This sounds great. I'm going to have to see how complicated it is to
get the recent versions of numpy/scipy/matplotlib installed on Fedora 8
so I can try the timeseries scikit out. From what I could tell there are
no repositories or rpms available for matplotlib 0.98 on Fedora 8 or 9.
thanks for your help.
- dharhas
>>> Lionel Roubeyrie <lroubeyrie at limair.asso.fr> 10/9/2008 9:39 AM >>>
> Ok I think I understood your example below. Can you give me an example
> of how to deal with missing data?
If you take my last example, you have 3 separate arrays:
mes.data : your values
mes.dates : the date array
mes.mask : the mask, as in the maskedarray module (timeseries is based on it).
To mask the first 10 values:
##################################
In [16]: mask=zeros_like(mes.data)
In [17]: mask[0:10]=True
In [18]: mask
Out[18]: array([ 1., 1., 1., ..., 0., 0., 0.])
In [19]: mes2=ts.time_series(mes, mask=mask)
In [20]: mes2
Out[20]:
timeseries([-- -- -- ..., 17.9699245692 66.8968405206 24.7117965045],
dates = [01-jan-2007 00:00 ... 30-déc-2008 23:00],
freq = H)
##################################
A timeseries can be constructed from another timeseries, as I do here
with mes2. Note that only the values are masked (missing), not the
dates, because every date has a value (masked or not).
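
For readers without the scikit at hand, the masking behaviour can be
sketched with plain numpy.ma, which the timeseries class builds on. This
is only an illustration under assumptions: `values` is made-up data
standing in for mes.data, and a boolean mask is used instead of the float
array that zeros_like produces above.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical stand-in for mes.data: 24 hourly readings (0.0 .. 23.0).
values = np.arange(24, dtype=float)

# Boolean mask: True marks a value as missing.
mask = np.zeros(values.shape, dtype=bool)
mask[:10] = True

masked = ma.masked_array(values, mask=mask)

# Reductions skip masked entries, so the mean covers values[10:] only.
print(masked.count())  # number of unmasked values
print(masked.mean())
```

The underlying data is untouched; only the mask decides which entries
take part in a computation.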
> Does this general technique work for data that is on a 15 minute
> frequency
Yes, but no :-) The timeseries module doesn't directly handle a
15-minute (QH) frequency, only a minute frequency (freq='T'). Look at this:
#################################
In [28]: fielddates=ts.date_array(['2007-01-01 00:00', '2007-01-01 00:15',
'2007-01-01 00:30', '2007-01-01 00:45'], freq='T')
In [29]: salinity=random(4)*100
In [30]: mes=ts.time_series(data=salinity, dates=fielddates)
In [31]: mes.has_missing_dates()
Out[31]: True
#################################
There's no native QH frequency, so at minute frequency there are some
missing dates (you can also look for duplicated dates, very convenient!).
But you can fill these missing dates:
###############################
In [36]: mes2=mes.fill_missing_dates()
In [37]: mes2.has_missing_dates()
Out[37]: False
In [38]: mes2
Out[38]:
timeseries([2.33824586442 -- -- -- -- -- -- -- -- -- -- -- -- -- --
 36.180901427 -- -- -- -- -- -- -- -- -- -- -- -- -- --
 39.0648471531 -- -- -- -- -- -- -- -- -- -- -- -- -- --
 55.4226606997],
dates = [01-jan-2007 00:00 ... 01-jan-2007 00:45],
freq = T)
###############################
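
What filling the missing dates amounts to here can be sketched in plain
Python: at minute frequency, four readings taken 15 minutes apart span 46
one-minute slots, 42 of which have no reading. This is not the scikit's
implementation; `fill_minute_gaps` is a hypothetical helper, the values
are made up, and None stands in for a masked value.

```python
from datetime import datetime, timedelta


def fill_minute_gaps(readings):
    """Return (timestamp, value) pairs on a regular one-minute grid.

    `readings` maps datetime -> value; slots without a reading get None.
    """
    start, end = min(readings), max(readings)
    n_slots = int((end - start).total_seconds() // 60) + 1
    grid = [start + timedelta(minutes=i) for i in range(n_slots)]
    return [(t, readings.get(t)) for t in grid]


# Made-up values at 00:00, 00:15, 00:30 and 00:45.
t0 = datetime(2007, 1, 1)
readings = {t0 + timedelta(minutes=15 * i): v
            for i, v in enumerate([2.3, 36.2, 39.1, 55.4])}

filled = fill_minute_gaps(readings)
missing = sum(1 for _, v in filled if v is None)
print(len(filled), missing)
```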
Or the module can handle these missing dates directly when you convert
the timeseries to a lower frequency:
###############################
In [39]: mes.convert(freq='H', func=mean)
Out[39]:
timeseries([ 33.25166379],
dates = [01-jan-2007 00:00],
freq = H)
###############################
You can also try func=None; it will just fill the missing dates with
missing values :-p
> or datasets where the frequency is
> variable (ie some months we have 10 readings other months we may have
> 30?
As you can see, just pass your data with the correct dates and it works,
but don't mix minute frequency with hour frequency!
Here I take 3 daily samples in January and one in October:
#################################
In [41]: fielddates=ts.date_array(['2007-01-01', '2007-01-02',
'2007-01-03', '2007-10-15'], freq='D')
In [42]: salinity=random(4)*100
In [43]: mes=ts.time_series(data=salinity, dates=fielddates)
In [44]: mes
Out[44]:
timeseries([ 59.63468614 38.60721076 64.52554805 66.17637291],
dates = [01-jan-2007 02-jan-2007 03-jan-2007 15-oct-2007],
freq = D)
In [45]: mes.convert(freq='M', func=mean)
Out[45]:
timeseries([54.2558149823 -- -- -- -- -- -- -- -- 66.1763729106],
dates = [jan-2007 ... oct-2007],
freq = M)
###################################
Computing the monthly average works fine; the module fills the missing
months with masked values.
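
The same idea, averaging irregular samples per month and keeping the
empty months as gaps, can be sketched without the scikit. This is a
rough stand-in, not how convert() is implemented: `monthly_means` is a
hypothetical helper, the sample values are made up, and None plays the
role of a masked value.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_means(samples):
    """Average (date, value) samples per month, keeping empty months as None."""
    buckets = defaultdict(list)
    for day, value in samples:
        buckets[(day.year, day.month)].append(value)

    first, last = min(buckets), max(buckets)
    result = []
    year, month = first
    while (year, month) <= last:
        values = buckets.get((year, month))
        result.append(((year, month), mean(values) if values else None))
        # Advance to the next calendar month.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return result


# Three daily samples in January and one in October (made-up values).
samples = [
    (date(2007, 1, 1), 60.0),
    (date(2007, 1, 2), 30.0),
    (date(2007, 1, 3), 90.0),
    (date(2007, 10, 15), 66.0),
]

for (year, month), avg in monthly_means(samples):
    print(year, month, avg)
```

January and October get a mean; February through September come out as
None, matching the eight masked months in the output above.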
>
> Also how stable is the scikits.timeseries? Is it reasonably usable?
Yes, we use it intensively on large projects, and Pierre G.M. has made a
very good tool.
Cordially
>
> thanks,
>
> - dharhas
>
> >>> Lionel Roubeyrie <lroubeyrie at limair.asso.fr> 10/9/2008 3:51 AM >>>
> Hi Dharhas,
> scikits.timeseries is perfect for what you want, in a very usable way:
>
> ###############################
> In [29]: import scikits.timeseries as ts
>
> In [30]: sdate=ts.Date('H', '2007-01-01 00:00')
>
> In [31]: fielddates=ts.date_array(start_date=sdate, freq='H',
> length=365*24*2)
>
> In [32]: salinity=random(365*24*2)*100
>
> In [33]: mes=ts.time_series(data=salinity, dates=fielddates)
>
> In [34]: mes
> Out[34]:
> timeseries([ 23.84116045 49.51437251 89.29221711 ..., 37.00510947
> 41.12589836 78.65572656],
> dates = [01-jan-2007 00:00 ... 30-déc-2008 23:00],
> freq = H)
>
>
> In [35]: mes_avmonth=mes.convert(freq='M', func=mean)
>
> In [36]: mes_avmonth
> Out[36]:
> timeseries([ 49.29718906 50.64688937 49.88193999 48.97144253
> 49.5788259 50.41340038 50.15047009 51.70933261
> 50.5635153 51.15084406 51.15362514 51.51443468
> 49.17556599 49.26877667 50.21416724 49.37037657
> 51.00724033 49.43337134 49.60398056 50.24470761
> 50.62350109 51.15572702 51.37652011 49.24193747],
> dates = [jan-2007 ... déc-2008],
> freq = M)
>
>
> In [37]: mes_avyear=mes.convert(freq='Y', func=mean)
>
> In [38]: mes_avyear
> Out[38]:
> timeseries([ 50.41903159 50.06468157],
> dates = [2007 2008],
> freq = A-DEC)
>
>
> In [39]: mes_avseason=mes[(mes.month>=5) & (mes.month<=9)].mean()
>
> In [40]: mes_avseason
> Out[40]: 50.33380690600049
> ###############################
>
>
> On Wednesday 08 October 2008 at 14:54 -0500, Dharhas Pothina wrote:
> > Hi,
> >
> > I'm trying to analyze hourly salinity data. I was wondering if there
> > is a simple way of calculating daily, monthly and seasonal averages of
> > hourly time series data.
> >
> > So assuming I have two arrays that contain several years of hourly
> > (or every 15min) salinity data: a datetime array called 'fielddates' &
> > a data array called 'salinity'
> >
> > How would I go about getting the various averages. The seasonal
> > averages would be say defined as May through September etc.
> >
> > I had a look at scikits.timeseries but it looks like it would require
> > upgrading numpy to install and there isn't enough high level
> > documentation on how to use it for me to be confident in picking it up
> > in the time frame I'm looking at. I'm also not completely clear if it
> > can handle stuff that happens on a scale smaller than a day. If anyone
> > can point me to any usage examples for it that would be appreciated.
> >
> > Thanks,
> >
> > - dharhas
> >
> > _______________________________________________
> > SciPy-user mailing list
> > SciPy-user at scipy.org
> > http://projects.scipy.org/mailman/listinfo/scipy-user
> >
--
Lionel Roubeyrie
chargé d'études
LIMAIR - La Surveillance de l'Air en Limousin
http://www.limair.asso.fr