[Numpy-discussion] Dates and times and Datetime64 (again)

Wed Mar 19 10:01:08 EDT 2014

Jeff Reback <jeffreback <at> gmail.com> writes:

> 
> Dave,
> 
> your example is not a problem with numpy per se, rather that the default
> generation is in local timezone (same as what python datetime does).
> If you localize to UTC you get the results that you expect. 
> 

The problem is that the default datetime generation in *numpy* is in local 
time.

Note that this *is not* the case in Python: it doesn't try to guess the 
timezone info based on where in the world you run the code. If tzinfo is 
not provided, it is set to None.

In [7]: pd.datetime?
Type:       type
String Form:<type 'datetime.datetime'>
Docstring:
datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])

The year, month and day arguments are required. tzinfo may be None, or an
instance of a tzinfo subclass. The remaining arguments may be ints or longs.

In [8]: pd.datetime(2000,1,1).tzinfo is None
Out[8]: True

Python's approach (no timezone unless one is given) may be the best 
solution, but as others have pointed out, it is more difficult to implement 
and may have other issues.

I don't want to wait for the best solution. Assuming UTC on input/output 
when no timezone is specified will solve the problem, and this desperately 
needs to be fixed because it's completely broken as is, IMHO.

> If you localize to UTC you get the results that you expect. 

That's the whole point - *numpy* needs to localize to UTC, not to whatever 
timezone you happen to be in when running the code. 
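
To make "assume UTC if not specified" concrete, here is a minimal sketch 
using only the standard library (Python 3). parse_as_utc is a hypothetical 
helper name used for illustration; it is not a numpy or pandas function and 
says nothing about how numpy's parser is implemented.

```python
from datetime import datetime, timezone

def parse_as_utc(text, fmt="%Y-%m-%d %H:%M"):
    """Parse a timestamp string that carries no offset and label it as UTC.

    Hypothetical helper for illustration only.
    """
    naive = datetime.strptime(text, fmt)       # tzinfo is None at this point
    return naive.replace(tzinfo=timezone.utc)  # attach UTC; the clock time is not shifted

stamps = [parse_as_utc(s) for s in ("2014-03-30 00:00",
                                    "2014-03-30 01:00",
                                    "2014-03-30 02:00")]
hours = [s.hour for s in stamps]  # the hours as written in the input strings
```

The point is that the result depends only on the input strings, not on the 
timezone of the machine running the code.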

In a real-world data analysis problem you don't start with the data in a 
DataFrame or a numpy array. It comes from the web, a csv, Excel, or a 
database, and you want to convert it to a DataFrame or numpy array. So what 
you have from whatever source is a list of tuples of strings, and you want 
to convert them into a typed array.

Obviously you can't localize a string - you have to convert it to a date 
first, and if you do that with numpy the date you get is wrong. 

In [108]: dst = np.array(['2014-03-30 00:00', '2014-03-30 01:00',
     ...:                 '2014-03-30 02:00'], dtype='M8[h]')
     ...: dst
     ...: 
Out[108]: array(['2014-03-30T00+0000', '2014-03-30T00+0000',
       '2014-03-30T02+0100'], dtype='datetime64[h]')

In [109]: dst.tolist()
Out[109]: 
[datetime.datetime(2014, 3, 30, 0, 0),
 datetime.datetime(2014, 3, 30, 0, 0),
 datetime.datetime(2014, 3, 30, 1, 0)]

AFAICS there's no way to get the original dates back once they've passed 
through numpy's parser!?
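
For comparison, the round trip I'd expect can be sketched with the standard 
library alone: parse the same three strings into naive datetimes and format 
them back. This is only an illustration of the expected behaviour under the 
assumption that the inputs carry no offset; the names FMT, raw, parsed and 
round_tripped are mine.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"
raw = ["2014-03-30 00:00", "2014-03-30 01:00", "2014-03-30 02:00"]

# strptime does not consult the local timezone, so tzinfo stays None
parsed = [datetime.strptime(s, FMT) for s in raw]

# formatting the naive datetimes gives back the original strings
round_tripped = [d.strftime(FMT) for d in parsed]
```

Here the three input hours stay distinct and round_tripped equals raw, 
wherever the code is run.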

-Dave