[Numpy-discussion] Dates and times and Datetime64 (again)
Dave Hirschfeld
novin01 at gmail.com
Wed Mar 19 10:01:08 EDT 2014
Jeff Reback <jeffreback <at> gmail.com> writes:
>
> Dave,
>
> your example is not a problem with numpy per se, rather that the default
> generation is in local timezone (same as what python datetime does).
> If you localize to UTC you get the results that you expect.
>
The problem is that the default datetime generation in *numpy* is in local
time.
Note that this *is not* the case in Python - it doesn't try to guess the
timezone info based on where in the world you run the code; if it's not
provided, tzinfo is set to None.
In [7]: pd.datetime?
Type: type
String Form:<type 'datetime.datetime'>
Docstring:
datetime(year, month, day[, hour[, minute[, second[,
microsecond[,tzinfo]]]]])
The year, month and day arguments are required. tzinfo may be None, or an
instance of a tzinfo subclass. The remaining arguments may be ints or longs.
In [8]: pd.datetime(2000,1,1).tzinfo is None
Out[8]: True
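The same point as a minimal standalone sketch, using only the standard
library. It assumes Python 3.2+ for `timezone.utc` (not what the session
above ran on), and the names `naive` and `aware` are illustrative: a
datetime built without tzinfo stays naive, and only becomes UTC when that
is requested explicitly.

```python
from datetime import datetime, timezone

# No tzinfo supplied: Python does not guess one from the machine's locale.
naive = datetime(2000, 1, 1)
print(naive.tzinfo is None)   # True

# UTC is attached only when asked for explicitly.
aware = datetime(2000, 1, 1, tzinfo=timezone.utc)
print(aware.isoformat())      # 2000-01-01T00:00:00+00:00
```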
Doing the same in numpy may be the best solution, but as others have pointed
out it is more difficult to implement and may have other issues.
I don't want to wait for the best solution - assuming UTC on input/output
when no timezone is specified will solve the problem, and this desperately
needs to be fixed because it's completely broken as is, IMHO.
> If you localize to UTC you get the results that you expect.
That's the whole point - *numpy* needs to localize to UTC, not to whatever
timezone you happen to be in when running the code.
In a real-world data analysis problem you don't start with the data in a
DataFrame or a numpy array; it comes from the web, a csv, Excel or a
database, and you want to convert it to a DataFrame or numpy array. So what
you have from whatever source is a list of tuples of strings, and you want
to convert them into a typed array.
Obviously you can't localize a string - you have to convert it to a date
first, and if you do that with numpy the date you have is wrong.
In [108]: dst = np.array(['2014-03-30 00:00', '2014-03-30 01:00',
     ...:                 '2014-03-30 02:00'], dtype='M8[h]')
     ...: dst
     ...:
Out[108]: array(['2014-03-30T00+0000', '2014-03-30T00+0000',
       '2014-03-30T02+0100'], dtype='datetime64[h]')
In [109]: dst.tolist()
Out[109]:
[datetime.datetime(2014, 3, 30, 0, 0),
datetime.datetime(2014, 3, 30, 0, 0),
datetime.datetime(2014, 3, 30, 1, 0)]
AFAICS there's no way to get the original dates back once they've passed
through numpy's parser!?
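For contrast, a sketch of the round trip I'd expect from a parser that
never consults the local timezone, using only the standard library's
`strptime`/`strftime`. The names `raw`, `fmt`, `parsed` and `round_tripped`
are illustrative and not part of the session above.

```python
from datetime import datetime

# The same three strings as in the numpy example above.
raw = ['2014-03-30 00:00', '2014-03-30 01:00', '2014-03-30 02:00']
fmt = '%Y-%m-%d %H:%M'

# strptime yields naive datetimes; no local-timezone or DST rules are applied.
parsed = [datetime.strptime(s, fmt) for s in raw]

# Formatting them again gives back exactly the input strings.
round_tripped = [d.strftime(fmt) for d in parsed]
print(round_tripped == raw)   # True
```

All three hours survive, including 01:00, which the numpy session above
collapsed onto 00:00.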
-Dave