[Numpy-discussion] Problem with importing csv into datetime64

Grové grove.steyn at gmail.com
Wed Sep 28 12:15:44 EDT 2011


Hi,

I am trying out the latest version of numpy 2.0 dev:

np.__version__
Out[44]: '2.0.0.dev-aded70c'

I am trying to import CSV data that looks like this:

date,system,pumping,rgt,agt,sps,eskom_import,temperature,wind,pressure,weather
2007-01-01 00:30,481.9,,,,,481.9,15,SW,1040,Fine
2007-01-01 01:00,471.9,,,,,471.9,15,SW,1040,Fine
2007-01-01 01:30,455.9,,,,,455.9,,,,
etc.

by using the following code:

convertdict = {0: lambda s: np.datetime64(s, 'm'),
               1: lambda s: float(s or 0),
               2: lambda s: float(s or 0),
               3: lambda s: float(s or 0),
               4: lambda s: float(s or 0),
               5: lambda s: float(s or 0),
               6: lambda s: float(s or 0),
               7: lambda s: float(s or 0),
               8: str, 9: str, 10: str}
dt = [('date', np.datetime64), ('system', float), ('pumping', float),
      ('rgt', float), ('agt', float), ('sps', float),
      ('eskom_import', float), ('temperature', float), ('wind', str),
      ('pressure', float), ('weather', str)]
# range(0-11) evaluates to range(-11), which is empty; all 11 columns are meant
a = np.recfromcsv(fp, dtype=dt, converters=convertdict,
                  usecols=range(11), names=True)
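
For reference, the float converters rely on an empty field being falsy, so that "s or 0" substitutes 0 before float() is called. A minimal sketch in plain Python, using the third sample row from above (the helper name to_float is only for illustration and is not part of the import code):

```python
def to_float(s):
    # An empty string is falsy, so "s or 0" yields 0 and float(0) gives 0.0.
    return float(s or 0)

# Third data row of the sample CSV, split on commas (11 fields in total).
row = "2007-01-01 01:30,455.9,,,,,455.9,,,,".split(",")

# Columns 1..7 are the ones handled by the float converters.
values = [to_float(field) for field in row[1:8]]
print(values)
```

So missing readings come through as 0.0 rather than raising a ValueError.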

The dtype it generates for a.date is 'object':

array([2007-01-01T00:30+0200, 2007-01-01T01:00+0200, 2007-01-01T01:30+0200,
       ..., 2007-12-31T23:00+0200, 2007-12-31T23:30+0200,
       2008-01-01T00:00+0200], dtype=object)

But I need it to be datetime64, like in this example (but including hours and
minutes):

array(['2011-07-11', '2011-07-12', '2011-07-13', '2011-07-14',
       '2011-07-15', '2011-07-16', '2011-07-17'], dtype='datetime64[D]')
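
To make the target concrete, here is a minimal sketch of the kind of array I am after, built by hand from a few of the timestamps above rather than read from the CSV. The names stamps and wanted are only for illustration, and this assumes a NumPy version that parses ISO strings at minute resolution:

```python
import numpy as np

# A few timestamps from the sample data, written in ISO form.
stamps = ['2007-01-01T00:30', '2007-01-01T01:00', '2007-01-01T01:30']

# Requesting minute resolution gives a native datetime64 array, not object.
wanted = np.array(stamps, dtype='datetime64[m]')
print(wanted.dtype)

# With a real datetime64 dtype, differences come out as timedelta64 values.
step = wanted[1] - wanted[0]
print(step)
```

That is the dtype I would like the 'date' field of the record array to have.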

It seems that the CSV import creates an object dtype for the 'date' field
rather than a datetime64 dtype.  Any ideas on how to fix this?

Grové





