[Datetime-SIG] how does PEP-495 help improve dateutil, pytz timezone packages?

Akira Li 4kir4.1i at gmail.com
Wed Aug 26 01:19:30 CEST 2015


Alexander Belopolsky <alexander.belopolsky at gmail.com> writes:

> I am changing the subject line because neither of the PEPs mentioned in the
> original subject propose any changes to the time module.

This list is about improving the datetime module, in particular
PEP-495. I've changed the subject accordingly.

As long as the datetime module uses the time module, the corresponding
time module issues that cannot be worked around *are* datetime module
issues. It seems blatantly obvious.

If you don't consider the tz_ database to be essential for writing
code that works with timezones, you may stop reading now.

.. _tz: https://www.iana.org/time-zones/repository/tz-link.html


What is discussed here?
-----------------------

It seems there is a communication problem. Let's overcommunicate then :)

The time module issues are mentioned to explain why it is not reasonable
to expect datetime.now(timezone.utc).astimezone() to work in the general
case.

stdlib's astimezone() is mentioned to point out that it may fail while
pytz works in the exact same case.

I want to demonstrate that pytz works in cases where stdlib and dateutil
currently fail, in order to point out that *PEP-495 should either
provide more support for the way pytz works or demonstrate how PEP-495
fixes design issues in stdlib and dateutil that make it difficult to
enable better timezone support.*


Why should PEP-495 -- Local Time Disambiguation care about zoneinfo?
--------------------------------------------------------------------

History shows that the current datetime API is at least partially
responsible for the fact that the only working solution (pytz) has a
more complicated API. *It works, but it might have been simpler and less
error-prone.*

The same could be said about stdlib, dateutil, and the timezone packages
that are built on top of them, such as arrow and delorean. The difference
is that they work in fewer cases (they fail more often).

Therefore, even if the explicit goal of PEP-495 is different from that of
PEP-431, PEP-495 should avoid making life more difficult for
zoneinfo packages. Even more, it should consider *how it can help
pytz, dateutil, or some other timezone package to provide a good
tzdata-related API.*

The last part is the reason I've mentioned cases where stdlib and
dateutil fail in this thread.


What are possible good timezone API examples?
---------------------------------------------

From the _minimalistic_ category: times_ Python package -- utc/posix time
internally, local time is used only for input or display (similar to
the Unicode sandwich approach: Unicode internally, bytes only if
necessary to communicate with the outside world). No implicit timezone
conversions.
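
The sandwich idea can be sketched with stdlib alone. This is only an illustration under stated assumptions, not how times_ is implemented: DISPLAY_ZONE is a hypothetical fixed offset standing in for a real zoneinfo timezone, and parse_input() and format_output() are made-up helper names.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+3 offset stands in for a real tz database zone.
DISPLAY_ZONE = timezone(timedelta(hours=3), "MSK")


def parse_input(text, zone):
    """Input boundary: local wall-clock text -> aware UTC datetime."""
    naive = datetime.strptime(text, "%Y-%m-%d %H:%M")
    # replace() is safe only for fixed offsets; pytz zones need localize().
    return naive.replace(tzinfo=zone).astimezone(timezone.utc)


def format_output(utc_time, zone):
    """Output boundary: aware UTC datetime -> local text for display."""
    return utc_time.astimezone(zone).strftime("%Y-%m-%d %H:%M %Z")


start = parse_input("2015-08-26 02:19", DISPLAY_ZONE)
deadline = start + timedelta(hours=36)  # all arithmetic happens in UTC
print(format_output(deadline, DISPLAY_ZONE))  # 2015-08-27 14:19 MSK
```

Local time appears only in the two boundary helpers. Everything in between is plain UTC arithmetic, so nothing in the middle can trigger an implicit timezone conversion.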

Unfortunately, it is no longer supported. It is implemented on top of
arrow_ which (last time I checked) has the same issues as dateutil.

From the _kitchen-sink_ category: Time4J_ Java package -- a few composable
primitives provide a powerful API. Notable feature: no temporal
arithmetic or manipulations for ZonalDateTime_.

.. _times: https://github.com/nvie/times/
.. _arrow: http://arrow.readthedocs.org
.. _Time4J: https://github.com/MenoData/Time4J
.. _ZonalDateTime: http://www.time4j.net/tutorial/zdt.html


What are examples of timezone-related issues that PEP-495 could solve? 
----------------------------------------------------------------------

- utc -> local timezone conversions in dateutil. I haven't looked at the
  source, but Stuart Bishop_ says that the new flag may fix this and
  perhaps other issues caused by ambiguous times.

- the datetime constructor might start working with pytz timezones.
  The general goal is to leave the pytz localize() method only for those
  people who need an exception for ambiguous or non-existent times.
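
To make the ambiguity concrete, here is a minimal sketch of a tzinfo that consults the disambiguation flag. It assumes Python 3.6 or later, where the PEP-495 flag is available as the fold attribute. ToyZone is a hypothetical zone with one hard-coded transition. It is not pytz or dateutil code.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class ToyZone(tzinfo):
    """Hypothetical zone: UTC+2 until 2015-10-25 01:00 UTC, UTC+1 afterwards.

    Local times in [02:00, 03:00) on 2015-10-25 therefore occur twice.
    """

    SUMMER = timedelta(hours=2)
    WINTER = timedelta(hours=1)
    # Naive local bounds of the repeated hour.
    AMBIGUOUS_START = datetime(2015, 10, 25, 2, 0)
    AMBIGUOUS_END = datetime(2015, 10, 25, 3, 0)

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < self.AMBIGUOUS_START:
            return self.SUMMER
        if wall >= self.AMBIGUOUS_END:
            return self.WINTER
        # Inside the repeated hour only the fold flag tells the two apart.
        return self.WINTER if dt.fold else self.SUMMER

    def dst(self, dt):
        return self.utcoffset(dt) - self.WINTER

    def tzname(self, dt):
        return "TOY+2" if self.utcoffset(dt) == self.SUMMER else "TOY+1"


zone = ToyZone()
first = datetime(2015, 10, 25, 2, 30, tzinfo=zone)           # fold=0
second = datetime(2015, 10, 25, 2, 30, tzinfo=zone, fold=1)  # repeated hour
print(first.astimezone(timezone.utc).time())   # 00:30:00
print(second.astimezone(timezone.utc).time())  # 01:30:00
```

With such a flag the plain constructor is enough to name either occurrence of 02:30. That is the case where pytz currently requires localize().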

The important part is that PEP-495 should not make it even more
difficult to use the packages correctly.

Ideally, PEP-495 should evolve together with the corresponding
experimental implementations that adopt the new flag.

.. _Bishop: https://mail.python.org/pipermail/datetime-sig/2015-August/000466.html


> On Tue, Aug 25, 2015 at 11:47 AM, Akira Li <4kir4.1i at gmail.com> wrote:
>>
>> Alexander Belopolsky <alexander.belopolsky at gmail.com> writes:
>>
>> >> On Aug 25, 2015, at 7:44 AM, Akira Li <4kir4.1i at gmail.com> wrote:
>> >>
>> >> note: stdlib variant datetime.now(timezone.utc).astimezone() may fail
>> >> if it uses time.timezone, time.tzname internally [3,4,5] when tm_gmtoff
>> >> tm_zone are not available on a given platform.
>> >
>> > If this actually happens on any supported platform - please file a bug
>> > report.  What we do in this case is not as simplistic as you describe.
>>
>> Bug-driven development is probably not the best strategy for a datetime
>> library ;) Tests can't catch all bugs. I've found out that astimezone()
>> may fail by *reading* its source and trying to *understand* what it does.
>
>
> I agree, but once you've read the code and see any logical errors, you
> should be able to construct a test case demonstrating wrong behavior.

I did.

>>
>> Here's the part from datetime.py [1] that computes the local timezone if
>> tm_gmtoff or tm_zone are not available:
>>
>>   # Compute UTC offset and compare with the value implied
>>   # by tm_isdst.  If the values match, use the zone name
>>   # implied by tm_isdst.
>>   delta = local - datetime(*_time.gmtime(ts)[:6])
>>   dst = _time.daylight and localtm.tm_isdst > 0
>>   gmtoff = -(_time.altzone if dst else _time.timezone)
>>   if delta == timedelta(seconds=gmtoff):
>>       tz = timezone(delta, _time.tzname[dst])
>>   else:
>>       tz = timezone(delta)
>>
>> Here's its C equivalent [2].
>>
>> Python issues that I've linked in the previous message [3,4,5] demonstrate
>> that time.timezone and time.tzname may have wrong values and therefore
>> the result *tz* may have a wrong tzname.
>
>
> To summarize for those who  will not follow the links: [3] Is a closed "No
> obvious and correct way to get the time zone offset" issue.  It was
> superseded by <http://bugs.python.org/issue9527> which in turn was closed
> by implementing the argument-less .astimezone() method.  [4] and [5] are
> time module issues.

Look at the code example immediately above the text you are commenting
on. Look at _time.tzname and _time.timezone there. It is the code from
the datetime.astimezone() method. If timezone and tzname may be wrong,
then astimezone() may also fail. The example below demonstrates the
failure.

The issues that I've linked demonstrate specific cases when timezone,
tzname are wrong. The status of the issues is irrelevant (timezone,
tzname behavior hasn't changed).
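
For anyone who wants to poke at that fallback path in isolation, here is a rough standalone sketch of the quoted datetime.py excerpt. The function name fallback_local_timezone() is made up for illustration, and the sketch deliberately never touches tm_gmtoff or tm_zone. What it prints depends on the platform and on the TZ setting.

```python
import time
from datetime import datetime, timedelta, timezone


def fallback_local_timezone(ts):
    """Mimic the quoted stdlib fallback: derive the local timezone from
    time.timezone, time.altzone and time.tzname only."""
    localtm = time.localtime(ts)
    local = datetime(*localtm[:6])
    # Actual UTC offset at ts, computed from the broken-down times.
    delta = local - datetime(*time.gmtime(ts)[:6])
    dst = time.daylight and localtm.tm_isdst > 0
    gmtoff = -(time.altzone if dst else time.timezone)
    if delta == timedelta(seconds=gmtoff):
        # The process-wide globals agree with the real offset: trust tzname.
        return timezone(delta, time.tzname[dst])
    # They disagree (e.g. the zone rules changed): no name can be trusted.
    return timezone(delta)


ts = 1382970474  # 2013-10-28 14:27:54 UTC
tz = fallback_local_timezone(ts)
print(datetime.fromtimestamp(ts, tz).strftime('%Z%z'))
```

The else branch is the interesting one: whenever time.timezone does not match the offset that was in effect at ts, the result is a nameless fixed-offset timezone.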

>>
>> Here's an example inspired by
>> "incorrect time.timezone value" Python issue [4]:
>>
>>   >>> from datetime import datetime, timezone
>>   >>> from email.utils import parsedate_to_datetime
>>   >>> import tzlocal # to get local timezone as pytz timezone
>>   >>> d = parsedate_to_datetime("Tue, 28 Oct 2013 14:27:54 +0000")
>>   >>> # expected (TZ=Europe/Moscow)
>>   ...
>>   >>> d.astimezone(tzlocal.get_localzone()).strftime('%Z%z')
>>   'MSK+0400'
>>   >>> # got
>>   ...
>>   >>> d.astimezone().strftime('%Z%z')
>>   'UTC+04:00+0400'
>>
>
> I don't understand why you keep presenting a mix of pytz, email.utils and
> something called "tzlocal" and then claim that the unexpected behavior
> indicates a problem in the datetime module?  It could as well be in any of
> the three other modules that you use or in the way you combine them.

*"something called "tzlocal""*

  >>> import tzlocal # to get local timezone as pytz timezone
  
 Really, did neither the comment ^^^ nor the code example
 d.astimezone(tzlocal.get_localzone()) itself tell you anything? :)

The purpose is to demonstrate that pytz works without relying on
tm_gmtoff, tm_zone attributes while at the same time astimezone() fails
here.

Your own code below produces MSK+0400, which implies that you do know
that it is the correct answer even if it weren't obvious just by looking
at the result strings. I don't understand how you could even suggest that
MSK+0400 is wrong and UTC+04:00+0400 is the correct behavior here.

Here's a distilled example:

  >>> from datetime import datetime, timezone
  >>> datetime(2013, 10, 28, tzinfo=timezone.utc).astimezone().strftime('%Z%z')

If you *disable the tm_gmtoff attribute*, it produces UTC+04:00+0400.
That differs from the expected output MSK+0400, which the same code
produces if you enable the attribute. Notice (direct quote): "if
tm_gmtoff or tm_zone are not available" above.
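
Where the odd-looking string comes from can be shown without touching the platform at all. This is a small sketch that builds the two timezone objects by hand, one without a name and one named "MSK", instead of going through the local timezone machinery.

```python
from datetime import datetime, timedelta, timezone

utc_time = datetime(2013, 10, 28, tzinfo=timezone.utc)
offset = timedelta(hours=4)

# timezone(offset) without a name: %Z falls back to the generated 'UTC+04:00'.
unnamed = utc_time.astimezone(timezone(offset))
# timezone(offset, name): %Z uses the supplied abbreviation.
named = utc_time.astimezone(timezone(offset, "MSK"))

print(unnamed.strftime('%Z%z'))  # UTC+04:00+0400
print(named.strftime('%Z%z'))    # MSK+0400
```

Both objects carry the same offset. Only the abbreviation is lost when the name is not available.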

> If you want to parse the string "Tue, 28 Oct 2013 14:27:54 +0000" and
> convert it to Moscow time, here is how you do it using the datetime module:
>
> >>> import os; os.environ['TZ'] = 'Europe/Moscow'
> >>> from datetime import datetime
> >>> d = datetime.strptime("Tue, 28 Oct 2013 14:27:54 +0000", "%a, %d %b %Y %H:%M:%S %z")
> >>> d.astimezone().strftime("%F %T %Z%z")
> '2013-10-28 18:27:54 MSK+0400'
>
> Does this code behave differently on your system?  If it does - please file
> a bug report.

My mistake, I should have made it even clearer that the example
illustrates the results of the stdlib code immediately above it and
therefore the tm_gmtoff, tm_zone access is disabled. Try your code
while making sure that tm_zone is not used.

>>
>> 'UTC+04:00' instead of 'MSK' is not a major issue. I don't consider it a
>> bug because without access to the tz database stdlib can't do much
>> better, there will always be cases when it breaks.
>
>
> It is quite possible that such cases exist, but you have not
> demonstrated one.
>
>>
>> I just use pytz instead which does provide access to the tz database.
>
>
> This will always be your option as it is your option to use just the
> datetime module.  In both cases you can write correct code if you follow
> the reference manual or buggy code if you don't.  An almost sure way to
> write buggy code is to use one library manual to write code using another.

No. You can't write correct code that works with timezones using only
stdlib, e.g., see %Z support: http://bugs.python.org/issue22377

  >>> from datetime import datetime
  >>> datetime.strptime('2016-12-04 08:00:00 EST', '%Y-%m-%d %H:%M:%S %Z')
  Traceback (most recent call last):
  ...
  ValueError: ...

dateutil lets you disambiguate the timezone abbreviation and returns an
aware datetime in this case:

  >>> import dateutil.parser
  >>> dateutil.parser.parse('2016-12-04 08:00:00 EST', tzinfos={'EST':-18000})
  datetime.datetime(2016, 12, 4, 8, 0, tzinfo=tzoffset('EST', -18000))
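
For comparison, here is roughly what one has to hand-roll to get a similar result with stdlib only. It is a sketch under stated assumptions: parse_with_abbreviation() and the ABBREVIATIONS table are hypothetical names, the table plays the role of dateutil's tzinfos argument, and the input is assumed to end with a single space-separated abbreviation.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the caller decides what each abbreviation means.
ABBREVIATIONS = {"EST": timezone(timedelta(hours=-5), "EST")}


def parse_with_abbreviation(text, abbreviations=ABBREVIATIONS):
    """Split off the trailing abbreviation and attach the mapped tzinfo."""
    stamp, _, abbr = text.rpartition(" ")
    naive = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    # A KeyError for unknown abbreviations is intentional: no guessing.
    return naive.replace(tzinfo=abbreviations[abbr])


parsed = parse_with_abbreviation("2016-12-04 08:00:00 EST")
print(parsed.isoformat())  # 2016-12-04T08:00:00-05:00
```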

---

'UTC+04:00+0400' is not a bug, just as it is not a bug that an 8-bit
Windows codepage can't encode all Unicode characters -- it can't and you
don't expect it to -- you just use an encoding such as utf-8 that does
support the whole Unicode range.  I don't expect datetime code that uses
time.timezone, time.tzname internally (as the excerpt from datetime.py
above demonstrates) to do timezone conversions without issues.

Again, the purpose of the example is to demonstrate the *fundamental*
deficiency in the datetime module that can't be fixed without access to
the tz database (tm_gmtoff is a way to get such access for a local
timezone).

>>
>>
>> [1] https://github.com/python/cpython/blob/fced0e12fc510e4a6158628695774ccfd02395d3/Lib/datetime.py#L1513-L1522
>> [2] https://github.com/python/cpython/blob/fced0e12fc510e4a6158628695774ccfd02395d3/Modules/_datetimemodule.c#L4721-L4735
>> [3] http://bugs.python.org/issue1647654
>> [4] http://bugs.python.org/issue22752
>> [5] http://bugs.python.org/issue22798

