[Numpy-discussion] Dates and times and Datetime64 (again)

Jeff Reback jeffreback at gmail.com
Wed Mar 19 08:25:39 EDT 2014


Dave,

Your example is not a problem with numpy per se; rather, the dates are
generated in the local timezone by default (the same as what Python's
datetime does). If you localize to UTC, you get the results that you expect.

In [49]: dates = pd.date_range('01-Apr-2014', '04-Apr-2014', freq='H')[:-1]

In [50]: pd.TimeSeries(values, dates.tz_localize('UTC')).groupby(lambda d: d.date()).mean()
Out[50]:
2014-04-01    1
2014-04-02    2
2014-04-03    3
dtype: int64

In [51]: records = zip(map(str, dates.tz_localize('UTC')), values)

In [52]: df = pd.DataFrame(np.array(records, dtype=[('dates', 'M8[h]'), ('values', float)]))

In [53]: df.set_index('dates').groupby(lambda x: x.date()).mean()
Out[53]:
            values
2014-04-01       1
2014-04-02       2
2014-04-03       3

[3 rows x 1 columns]
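
For reference, here is a stdlib-only sketch of why the two groupings differ.
It is not what numpy or pandas do internally. It assumes a fixed UTC+1 offset
as a stand-in for the local zone, and the helper name daily_means is made up
for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def daily_means(stamps, values):
    """Average the values that fall on each calendar date (hypothetical helper)."""
    buckets = defaultdict(list)
    for stamp, value in zip(stamps, values):
        buckets[stamp.date()].append(value)
    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}


# 72 hourly wall-clock timestamps covering 1-3 April 2014,
# with the values 1, 2, 3 each repeated 24 times.
start = datetime(2014, 4, 1)
naive = [start + timedelta(hours=h) for h in range(72)]
values = [1] * 24 + [2] * 24 + [3] * 24

# Assumption: a fixed UTC+1 zone plays the role of "local time".
local_zone = timezone(timedelta(hours=1))

# Case 1: label the wall-clock times as UTC; the calendar dates are unchanged.
as_utc = [t.replace(tzinfo=timezone.utc) for t in naive]

# Case 2: treat the same wall-clock times as local, then convert to UTC.
# Every timestamp moves back one hour, so the first hour of each day
# lands on the previous date.
local_to_utc = [t.replace(tzinfo=local_zone).astimezone(timezone.utc) for t in naive]

utc_means = daily_means(as_utc, values)
shifted_means = daily_means(local_to_utc, values)

for day, mean in shifted_means.items():
    print(day, round(mean, 6))
```

With the one-hour shift, the second case produces four groups instead of
three, because one hour of each day is counted against the neighbouring date.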



On Wed, Mar 19, 2014 at 5:21 AM, Dave Hirschfeld <novin01 at gmail.com> wrote:

> Sankarshan Mudkavi <smudkavi <at> uwaterloo.ca> writes:
>
> >
> > Hey all,
> > It's been a while since the last datetime and timezones discussion thread
> > was visited (linked below):
> >
> > http://thread.gmane.org/gmane.comp.python.numeric.general/53805
> >
> > It looks like the best approach to follow is the UTC only approach in the
> > linked thread with an optional flag to indicate the timezone (to avoid
> > confusing applications where they don't expect any timezone info). Since
> > this is slightly more useful than having just a naive datetime64 package
> > and would be open to extension if required, it's probably the best way to
> > start improving the datetime64 library.
> >
> <snip>
> > I would like to start writing a NEP for this followed by implementation,
> > however I'm not sure what the format etc. is, could someone direct me to a
> > page where this information is provided?
> >
> > Please let me know if there are any ideas, comments etc.
> >
> > Cheers,
> > Sankarshan
> >
>
> See: http://article.gmane.org/gmane.comp.python.numeric.general/55191
>
>
> You could use a current NEP as a template:
> https://github.com/numpy/numpy/tree/master/doc/neps
>
>
> I'm a huge +100 on the simplest UTC fix.
>
> As is, using numpy datetimes is likely to silently give incorrect results -
> something I've already seen several times in end-user data analysis code.
>
> Concrete Example:
>
> In [16]: dates = pd.date_range('01-Apr-2014', '04-Apr-2014', freq='H')[:-1]
>     ...: values = np.array([1,2,3]).repeat(24)
>     ...: records = zip(map(str, dates), values)
>     ...: pd.TimeSeries(values, dates).groupby(lambda d: d.date()).mean()
>     ...:
> Out[16]:
> 2014-04-01    1
> 2014-04-02    2
> 2014-04-03    3
> dtype: int32
>
> In [17]: df = pd.DataFrame(np.array(records, dtype=[('dates', 'M8[h]'),
> ('values', float)]))
>     ...: df.set_index('dates', inplace=True)
>     ...: df.groupby(lambda d: d.date()).mean()
>     ...:
> Out[17]:
>               values
> 2014-03-31  1.000000
> 2014-04-01  1.041667
> 2014-04-02  2.041667
> 2014-04-03  3.000000
>
> [4 rows x 1 columns]
>
> Try it in your timezone and see what you get!
>
> -Dave