pandas read dataframe and sum all value same month and year

Avi Gross avigross at verizon.net
Mon Feb 4 18:19:34 EST 2019


Diego,

If your goal is to sum data by month, there are more general methods than
you making lists of starting and ending dates.

ONE way to consider in your case that is easy to understand is to add a
derived column/feature to your data such as the first 7 characters of the
date as perhaps a routine string. I mean 

2012-01-01 06:00

Becomes:

"2012-01"

Once you have such a column, you can aggregate your data by that column
using functionality in pandas and it will be done for all items with that
field being the same. 

The extra column can be temporary and there are other ways but it is
conceptually simple.

Of course to do the above does involve lots of details and you don't seem to
need it now, so I am leaving this as an academic hint. 

-----Original Message-----
From: Python-list <python-list-bounces+avigross=verizon.net at python.org> On
Behalf Of Diego Avesani
Sent: Monday, February 4, 2019 2:38 PM
To: python-list at python.org
Subject: Re: pandas read dataframe and sum all value same month and year

Deal all,
following Peter's suggestion,

I put the example code:

import pandas as pd
import numpy as np
from datetime import datetime


#input:
start_date = np.array(["2012-01-01 06:00",'2013-01-01 06:00','2014-01-01
06:00'])
end_date   = np.array(["2013-01-01 05:00",'2014-01-01 05:00','2015-01-01
05:00'])
yearfolder = np.array(['2012','2013','2014'])


for ii in range(0, 1):
   df =
pd.read_csv('dati.csv',delimiter=',',header=0,parse_dates=True,na_values=-99
9)
   df['datatime'] = df['datatime'].map(lambda x: datetime.strptime(str(x),
"%Y-%m-%d %H:%M"))
   mask = (df['datatime'] > str(start_date[ii])) & (df['datatime'] <=
str(end_date[ii]))
   df = df.loc[mask]
   df = df.reset_index(drop=True)    
   #
   df.groupby(pd.TimeGrouper('m')).sum()


and the example of file:

datatime,T,RH,PSFC,DIR,VEL10,PREC,RAD,CC,FOG
2012-01-01 06:00,  0.4,100, 911,321,  2.5,  0.0,   0,  0,0
2012-01-01 07:00,  0.8,100, 911,198,  0.8,  0.0,   0, 22,0
2012-01-01 08:00,  0.6,100, 912, 44,  1.2,  0.0,  30, 22,0
2012-01-01 09:00,  3.1, 76, 912, 22,  0.8,  0.0, 134, 44,0
2012-01-01 10:00,  3.4, 77, 912, 37,  0.5,  0.0, 191, 67,0
2012-01-01 11:00,  3.5,100, 912,349,  0.4,  0.0, 277, 44,0
2012-01-01 12:00,  3.6,100, 912, 17,  0.9,  0.0, 292, 22,0
2012-01-01 13:00,  3.5,100, 912, 28,  0.3,  0.0, 219, 44,0
2012-01-01 14:00,  3.3, 68, 912, 42,  0.5,  0.0, 151, 22,0



Hope this could help in finding a way to sum value belonging to the same
month.

Thanks again, a lot

Diego





On Monday, 4 February 2019 15:50:52 UTC+1, Diego Avesani  wrote:
> Dear all,
> 
> I am reading the following data-frame:
> 
> datatime,T,RH,PSFC,DIR,VEL10,PREC,RAD,CC,FOG
> 2012-01-01 06:00, -0.1,100, 815,313,  2.6,  0.0,   0,  0,0
> 2012-01-01 07:00, -1.2, 93, 814,314,  4.8,  0.0,   0,  0,0
> 2012-01-01 08:00,  1.7, 68, 815,308,  7.5,  0.0,  41, 11,0
> 2012-01-01 09:00,  2.4, 65, 815,308,  7.4,  0.0, 150, 33,0
> 2012-01-01 10:00,  3.0, 64, 816,305,  8.4,  0.0, 170, 44,0
> 2012-01-01 11:00,  2.6, 65, 816,303,  6.3,  0.0, 321, 22,0
> 2012-01-01 12:00,  2.0, 72, 816,278,  1.3,  0.0, 227, 22,0
> 2012-01-01 13:00, -0.0, 72, 816,124,  0.1,  0.0, 169, 22,0
> 2012-01-01 14:00, -0.1, 68, 816,331,  1.4,  0.0, 139, 33,0
> 2012-01-01 15:00, -4.0, 85, 816,170,  0.6,  0.0,  49,  0,0 ....
> ....
> 
> I read the data frame as:
> 
>  df =
pd.read_csv('dati.csv',delimiter=',',header=0,parse_dates=True,na_values=-99
9)
>    df['datatime'] = df['datatime'].map(lambda x: datetime.strptime(str(x),
"%Y-%m-%d %H:%M"))
>    #
>    mask = (df['datatime'] > str(start_date[ii])) & (df['datatime'] <=
str(end_date[ii]))
>    df = df.loc[mask]
>    df = df.reset_index(drop=True)
> 
> I would to create an array with the sum of all the PREC value in the same
month.
> 
> I have tried with:
> 
> df.groupby(pd.TimeGrouper('M')).sum()
> 
> But as always, it seems that I have same problems with the indexes.
Indeed, I get:
>       'an instance of %r' % type(ax).__name__)
> TypeError: axis must be a DatetimeIndex, but got an instance of
'Int64Index'
> 
> thanks for any kind of help,
> Really Really thanks
> 
> Diego

--
https://mail.python.org/mailman/listinfo/python-list




More information about the Python-list mailing list