[SciPy-User] Sum duplicate dates in a series

josef.pktd at gmail.com josef.pktd at gmail.com
Fri Jan 29 14:36:42 EST 2010


On Fri, Jan 29, 2010 at 2:13 PM, Pierre GM <pgmdevlist at gmail.com> wrote:
> On Jan 29, 2010, at 2:00 PM, John Hunter wrote:
>> On Fri, Jan 29, 2010 at 12:42 PM, Pierre GM <pgmdevlist at gmail.com> wrote:
>>> On Jan 29, 2010, at 1:16 PM, Robert Ferrell wrote:
>>>> How can I sum data for duplicate dates in a time series?  I can do it
>>>> with a loop, but I wonder if there is some tricky magic I might use.
>>
>> If you can put your data in a record array, you can use
>> matplotlib.mlab.rec_groupby
>>
>> http://matplotlib.sourceforge.net/api/mlab_api.html#matplotlib.mlab.rec_groupby
>>
>> http://matplotlib.sourceforge.net/examples/misc/rec_groupby_demo.html
>
> John,
> Could you have a look into numpy.lib.recfunctions ? That's an attempt to homogenize what you did for matplotlib, and it'd be great if you could help.

I just wanted to show that there will be some advantages once it is
possible to move easily between packages:

>>> import scikits.timeseries as ts
>>> import la
>>> s = ts.time_series([1,2,3,4,5],dates=ts.date_array(["2001-01","2001-01","2001-02","2001-03","2001-03"],freq="M"))
>>> dta = la.larry(s.data, label=[range(len(s.data))])
>>> dat = la.larry(s.dates.tolist(), label=[range(len(s.data))])
>>> s2 = ts.time_series(dta.group_mean(dat).x,dates=ts.date_array(dat.x,freq="M"))
>>> s
timeseries([1 2 3 4 5],
   dates = [Jan-2001 Jan-2001 Feb-2001 Mar-2001 Mar-2001],
   freq  = M)

>>> s2
timeseries([ 1.5  1.5  3.   4.5  4.5],
   dates = [Jan-2001 Jan-2001 Feb-2001 Mar-2001 Mar-2001],
   freq  = M)

>>> s2u = ts.remove_duplicated_dates(s2)
>>> s2u
timeseries([ 1.5  3.   4.5],
   dates = [Jan-2001 ... Mar-2001],
   freq  = M)

>>> s2u.dates
DateArray([Jan-2001, Feb-2001, Mar-2001],
          freq='M')

It's not so easy yet, but it would be nice if we could use timeseries,
pandas and la for different tasks, depending on which representation
is more convenient.
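
The session above takes the group mean, while the original question asked
for sums. For comparison, here is a minimal NumPy-only sketch of the sum
case. The helper name sum_duplicate_dates is hypothetical, and the dates
are plain strings rather than a DateArray, so this is an illustration
under those assumptions and not part of timeseries or la.

```python
import numpy as np

def sum_duplicate_dates(dates, values):
    """Sum the values that share the same date label (hypothetical helper)."""
    dates = np.asarray(dates)
    values = np.asarray(values, dtype=float)
    # unique_dates comes back sorted; inverse maps each input row
    # to the index of its date in unique_dates
    unique_dates, inverse = np.unique(dates, return_inverse=True)
    # bincount with weights adds up the values falling in each group
    sums = np.bincount(inverse, weights=values, minlength=len(unique_dates))
    return unique_dates, sums

# Same data as the time series in the session above
dates = ["2001-01", "2001-01", "2001-02", "2001-03", "2001-03"]
values = [1, 2, 3, 4, 5]
unique_dates, sums = sum_duplicate_dates(dates, values)
print(unique_dates)
print(sums)
```

With this data the sums per month are 3, 3 and 9. The result is ordered by
the sorted date labels, so it works directly for ISO-style strings like
these; other date formats would need converting first.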

Josef


