pandas read dataframe and sum all value same month and year

Mon Feb 4 14:38:12 EST 2019

Deal all,
following Peter's suggestion,

I put the example code:

import pandas as pd
import numpy as np
from datetime import datetime

#input:
start_date = np.array(["2012-01-01 06:00",'2013-01-01 06:00','2014-01-01 06:00'])
end_date   = np.array(["2013-01-01 05:00",'2014-01-01 05:00','2015-01-01 05:00'])
yearfolder = np.array(['2012','2013','2014'])

for ii in range(0, 1):
   df = pd.read_csv('dati.csv',delimiter=',',header=0,parse_dates=True,na_values=-999)
   df['datatime'] = df['datatime'].map(lambda x: datetime.strptime(str(x), "%Y-%m-%d %H:%M"))
   mask = (df['datatime'] > str(start_date[ii])) & (df['datatime'] <=  str(end_date[ii]))
   df = df.loc[mask]
   df = df.reset_index(drop=True)    
   #
   df.groupby(pd.TimeGrouper('m')).sum()

and the example of file:

datatime,T,RH,PSFC,DIR,VEL10,PREC,RAD,CC,FOG
2012-01-01 06:00,  0.4,100, 911,321,  2.5,  0.0,   0,  0,0
2012-01-01 07:00,  0.8,100, 911,198,  0.8,  0.0,   0, 22,0
2012-01-01 08:00,  0.6,100, 912, 44,  1.2,  0.0,  30, 22,0
2012-01-01 09:00,  3.1, 76, 912, 22,  0.8,  0.0, 134, 44,0
2012-01-01 10:00,  3.4, 77, 912, 37,  0.5,  0.0, 191, 67,0
2012-01-01 11:00,  3.5,100, 912,349,  0.4,  0.0, 277, 44,0
2012-01-01 12:00,  3.6,100, 912, 17,  0.9,  0.0, 292, 22,0
2012-01-01 13:00,  3.5,100, 912, 28,  0.3,  0.0, 219, 44,0
2012-01-01 14:00,  3.3, 68, 912, 42,  0.5,  0.0, 151, 22,0

Hope this could help in finding a way to sum value belonging to the same month.

Thanks again, a lot

Diego

On Monday, 4 February 2019 15:50:52 UTC+1, Diego Avesani  wrote:
> Dear all,
> 
> I am reading the following data-frame:
> 
> datatime,T,RH,PSFC,DIR,VEL10,PREC,RAD,CC,FOG
> 2012-01-01 06:00, -0.1,100, 815,313,  2.6,  0.0,   0,  0,0
> 2012-01-01 07:00, -1.2, 93, 814,314,  4.8,  0.0,   0,  0,0
> 2012-01-01 08:00,  1.7, 68, 815,308,  7.5,  0.0,  41, 11,0
> 2012-01-01 09:00,  2.4, 65, 815,308,  7.4,  0.0, 150, 33,0
> 2012-01-01 10:00,  3.0, 64, 816,305,  8.4,  0.0, 170, 44,0
> 2012-01-01 11:00,  2.6, 65, 816,303,  6.3,  0.0, 321, 22,0
> 2012-01-01 12:00,  2.0, 72, 816,278,  1.3,  0.0, 227, 22,0
> 2012-01-01 13:00, -0.0, 72, 816,124,  0.1,  0.0, 169, 22,0
> 2012-01-01 14:00, -0.1, 68, 816,331,  1.4,  0.0, 139, 33,0
> 2012-01-01 15:00, -4.0, 85, 816,170,  0.6,  0.0,  49,  0,0
> ....
> ....
> 
> I read the data frame as:
> 
>  df = pd.read_csv('dati.csv',delimiter=',',header=0,parse_dates=True,na_values=-999)
>    df['datatime'] = df['datatime'].map(lambda x: datetime.strptime(str(x), "%Y-%m-%d %H:%M"))
>    #
>    mask = (df['datatime'] > str(start_date[ii])) & (df['datatime'] <=  str(end_date[ii]))
>    df = df.loc[mask]
>    df = df.reset_index(drop=True)
> 
> I would to create an array with the sum of all the PREC value in the same month.
> 
> I have tried with:
> 
> df.groupby(pd.TimeGrouper('M')).sum()
> 
> But as always, it seems that I have same problems with the indexes. Indeed, I get:
>       'an instance of %r' % type(ax).__name__)
> TypeError: axis must be a DatetimeIndex, but got an instance of 'Int64Index'
> 
> thanks for any kind of help,
> Really Really thanks
> 
> Diego