generating list of files matching condition

Seb spluque at gmail.com
Wed Nov 23 22:58:16 EST 2016


Hello,

Given a list of files:

In [81]: ec_files[0:10]
Out[81]: 

[u'EC_20160604002000.csv',
 u'EC_20160604010000.csv',
 u'EC_20160604012000.csv',
 u'EC_20160604014000.csv',
 u'EC_20160604020000.csv']

where the numbers are a timestamp with format %Y%m%d%H%M%S, I'd like
to generate, for each period of a 2-h frequency time series, the list of
files whose timestamps fall within that period.  Ultimately I'm using
Pandas to read and handle the data in each group of files.  For the task
of finding the files for each 2-h period, I've done the following:

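import pandas as pd

# Parse the %Y%m%d%H%M%S stamp embedded in the first and last filenames
beg_tstamp = pd.to_datetime(ec_files[0][-18:-4],
                            format="%Y%m%d%H%M%S")
end_tstamp = pd.to_datetime(ec_files[-1][-18:-4],
                            format="%Y%m%d%H%M%S")
# 2-h window starts, from the first stamp up to the last one
tstamp_win = pd.date_range(beg_tstamp, end_tstamp, freq="2h")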

So tstamp_win is the 2-h frequency time series spanning the timestamps
of the files in ec_files.
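For reference, here is a minimal standard-library sketch of the same two steps, assuming every name has the fixed `EC_<14 digits>.csv` shape, so that the `[-18:-4]` slice picks out exactly the timestamp. `parse_stamp` and `window_starts` are hypothetical helper names. Like `pd.date_range`, the window starts are anchored at the first file's timestamp, and the end point is included only if it falls exactly on the 2-h grid.

```python
from datetime import datetime, timedelta

STAMP_FMT = "%Y%m%d%H%M%S"


def parse_stamp(fname):
    """Parse the timestamp embedded in a name like 'EC_20160604002000.csv'.

    Assumes the 14-digit stamp sits right before the 4-character '.csv'
    extension, which is what the [-18:-4] slice relies on.
    """
    return datetime.strptime(fname[-18:-4], STAMP_FMT)


def window_starts(beg, end, step=timedelta(hours=2)):
    """Return beg, beg + step, ... up to and including end."""
    starts = []
    t = beg
    while t <= end:
        starts.append(t)
        t += step
    return starts


ec_files = ["EC_20160604002000.csv", "EC_20160604010000.csv",
            "EC_20160604012000.csv", "EC_20160604014000.csv",
            "EC_20160604020000.csv"]
beg = parse_stamp(ec_files[0])
end = parse_stamp(ec_files[-1])
print(window_starts(beg, end))
```

With these five files the span is only 1 h 40 min, so there is a single window, starting at 00:20.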

I've then generated the list of matching files for each window in
tstamp_win using a list comprehension inside a loop:

win_files = []
for w in tstamp_win:
    nextw = w + pd.Timedelta(2, "h")  # end of the current 2-h window
    # Note: every filename is re-parsed for every window
    ifiles = [x for x in ec_files if
              w <= pd.to_datetime(x[-18:-4], format="%Y%m%d%H%M%S") < nextw]
    win_files.append(ifiles)
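One way to avoid the repeated work, sketched here with the standard library only: parse each filename once, sort the (timestamp, name) pairs, and use `bisect` to locate the files inside each half-open window `[w, w + 2h)`. `files_per_window` is a hypothetical name, and the sketch assumes the same filename layout as above. The parsing cost drops from (windows x files) to one parse per file, plus two binary searches per window. Since `pd.Timestamp` is a subclass of `datetime`, the window starts in `tstamp_win` should also work as `starts`.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

STAMP_FMT = "%Y%m%d%H%M%S"


def files_per_window(files, starts, width=timedelta(hours=2)):
    """Group filenames into half-open windows [start, start + width).

    Each filename is parsed exactly once; the windows are then located
    with binary search over the sorted timestamps.
    """
    pairs = sorted((datetime.strptime(f[-18:-4], STAMP_FMT), f) for f in files)
    stamps = [t for t, _ in pairs]
    names = [f for _, f in pairs]
    result = []
    for w in starts:
        lo = bisect_left(stamps, w)          # first file with stamp >= w
        hi = bisect_left(stamps, w + width)  # first file with stamp >= w + width
        result.append(names[lo:hi])
    return result
```

The result has the same shape as win_files in the loop above: one list per window start, in chronological order, with empty lists for windows that contain no files.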

However, this is proving very slow, and I was wondering whether there's
a better/faster way to do this.  Any tips would be appreciated.
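Another option, sketched below, skips both the per-window scan and the sort. Each file gets an integer window index computed directly as `(stamp - beg) // width`, and is dropped into a dict bucket in a single pass. `bin_files` is a hypothetical name, and windows are assumed to be anchored at `beg`, as with `pd.date_range(beg, ..., freq="2h")`. In pandas the same idea should map onto a `groupby` on that integer index.

```python
from collections import defaultdict
from datetime import datetime, timedelta

STAMP_FMT = "%Y%m%d%H%M%S"


def bin_files(files, beg, width=timedelta(hours=2)):
    """Assign each file to window k = (stamp - beg) // width in one pass.

    Window k covers [beg + k*width, beg + (k+1)*width).  Returns a dict
    mapping each window start to its filenames, in input order; windows
    with no files are simply absent from the dict.
    """
    bins = defaultdict(list)
    for f in files:
        t = datetime.strptime(f[-18:-4], STAMP_FMT)
        k = (t - beg) // width  # timedelta // timedelta gives an int
        bins[beg + k * width].append(f)
    return dict(bins)
```

To get one list per window, as in win_files, a lookup such as `[bins.get(w, []) for w in starts]` over a list of window starts fills empty windows with `[]`.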


-- 
Seb



