[Python-Dev] Status on PEP-431 Timezones

Lennart Regebro regebro at gmail.com
Wed Apr 8 17:18:15 CEST 2015


Hi!

I wrote PEP-431 two years ago, and never got around to implementing it.
This year I got some renewed motivation after Berker Peksağ made an
effort to implement it.
I'm planning to work more on this during the PyCon sprints, and also
have a BoF session or similar during the conference.

Anyone interested in a session on this, mail me and we'll set up a
time and place!


//Lennart

------------------


If anyone is interested in the details of the problem, this is it.

The big problem is the ambiguous times, like 02:30 on a day when you
move the clock back one hour, as there are two different 02:30's that
day. I wrote down my experiences with looking into and trying to
implement several different solutions. The core problem there is
how to tell the datetime whether it is before or after the
changeover.
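To make the ambiguity concrete, here is a small sketch using only the stdlib's fixed-offset zones. It assumes a Central European zone whose clocks went back from 03:00 CEST (UTC+2) to 02:00 CET (UTC+1) on 2015-10-25, so two UTC instants one hour apart both read 02:30 on the local clock.

```python
from datetime import datetime, timedelta, timezone

# Assumed offsets for one zone before and after the autumn changeover.
CEST = timezone(timedelta(hours=2), "CEST")
CET = timezone(timedelta(hours=1), "CET")

# Two UTC instants one hour apart...
first = datetime(2015, 10, 25, 0, 30, tzinfo=timezone.utc).astimezone(CEST)
second = datetime(2015, 10, 25, 1, 30, tzinfo=timezone.utc).astimezone(CET)

# ...show the same wall-clock time; the naive value alone cannot tell them apart.
print(first.replace(tzinfo=None) == second.replace(tzinfo=None))  # True
print(second - first)  # 1:00:00
```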


== How others have solved it ==

=== dateutil.tz: Ignore the problem ===

dateutil.tz simply ignores the problems with ambiguous datetimes, keeping them
ambiguous.
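A minimal sketch of what ignoring the problem looks like in practice: a tzinfo that decides DST from the wall-clock reading alone, so the repeated hour always resolves to the same side. `WallClockZone` and its single-changeover rule are hypothetical; this is not dateutil's code, and which side it picks is an assumption.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class WallClockZone(tzinfo):
    """Hypothetical zone that decides DST from the wall-clock reading alone.
    Toy rule: UTC+2 (DST) before 2015-10-25 03:00 local time, UTC+1 after."""

    _dst_end_wall = datetime(2015, 10, 25, 3, 0)

    def _is_dst(self, dt):
        # Only the naive wall time is consulted, so both 02:30s look identical
        # and always land on the same side of the changeover.
        return dt.replace(tzinfo=None) < self._dst_end_wall

    def utcoffset(self, dt):
        return timedelta(hours=2) if self._is_dst(dt) else timedelta(hours=1)

    def dst(self, dt):
        return timedelta(hours=1) if self._is_dst(dt) else timedelta(0)

    def tzname(self, dt):
        return "CEST" if self._is_dst(dt) else "CET"


zone = WallClockZone()
local = datetime(2015, 10, 25, 2, 30, tzinfo=zone)
print(local.tzname(), local.astimezone(timezone.utc))
# CEST 2015-10-25 00:30:00+00:00; the CET 02:30 (01:30 UTC) cannot be expressed
```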


=== pytz: One timezone instance per changeover ===

pytz handles ambiguous datetimes by having one class per timezone. Each
change in the UTC offset, whether caused by a DST changeover or by the
timezone itself changing, is represented as one instance of the class.

All instances are held in a list which is a class attribute of the timezone
class. You indicate which side of a DST changeover you are on by using
different instances as the datetime's tzinfo. Since the tzinfo instance this
way knows whether it is DST or not, the datetime as a whole knows it too.

Benefits:
- Only known possible implementation without modifying stdlib, which of course
  was a requirement, as pytz is a third-party library.
- DST offset can be quickly returned, as it does not need to be calculated.
Drawbacks:
- A complex and highly magical implementation of timezones that is hard to
  understand.
- Requires new normalize()/localize() methods on the timezone, so the API
  differs from the stdlib's API.
- Hundreds of instances per timezone means slightly more memory usage.
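A rough sketch of the one-instance-per-period idea with a hypothetical single-changeover zone. `ToyZone`, `_PeriodInstance` and the 2015-10-25 rule are assumptions for illustration; this is not pytz's actual code, although pytz's real `localize()` does also accept an `is_dst` argument.

```python
from datetime import datetime, timedelta, tzinfo


class _PeriodInstance(tzinfo):
    """One tzinfo instance per offset period, in the spirit of pytz."""

    def __init__(self, offset, dst, name):
        self._offset, self._dst, self._name = offset, dst, name

    # The values are fixed per instance, so nothing needs to be calculated.
    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return self._dst

    def tzname(self, dt):
        return self._name


class ToyZone:
    """Hypothetical zone with one changeover: CEST -> CET at 2015-10-25 01:00 UTC."""

    CEST = _PeriodInstance(timedelta(hours=2), timedelta(hours=1), "CEST")
    CET = _PeriodInstance(timedelta(hours=1), timedelta(0), "CET")
    _changeover_utc = datetime(2015, 10, 25, 1, 0)  # naive UTC

    def _instance_at_utc(self, utc):
        return self.CEST if utc < self._changeover_utc else self.CET

    def localize(self, naive, is_dst=False):
        # Keep each instance whose offset agrees with the zone's rule at the
        # UTC instant that this wall time would imply under that offset.
        candidates = [inst for inst in (self.CEST, self.CET)
                      if self._instance_at_utc(naive - inst.utcoffset(naive)) is inst]
        if len(candidates) == 2:  # ambiguous wall time: let is_dst decide
            chosen = self.CEST if is_dst else self.CET
        else:
            # This toy has no spring-forward gap, so one candidate always exists.
            chosen = candidates[0]
        return naive.replace(tzinfo=chosen)


zone = ToyZone()
first = zone.localize(datetime(2015, 10, 25, 2, 30), is_dst=True)
second = zone.localize(datetime(2015, 10, 25, 2, 30), is_dst=False)
print(first.tzname(), second.tzname())  # CEST CET
print(second - first)  # 1:00:00
```

Because each instance carries its own fixed offset, `dst()` is just an attribute lookup; the cost moves into `localize()`, which every caller has to remember to use.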


== Options for PEP 431 ==

=== Stdlib option 0: Ignore it ===

I don't think this is an option, really. Listed for completeness.


=== Stdlib option 1: One timezone instance per changeover ===

Option 1 is to do it like pytz and have one timezone instance per changeover.
However, this is likely not possible to do without fundamentally changing the
datetime API, or making it very hard to use.

For example, today when you create a datetime instance and pass in a tzinfo,
that tzinfo is simply attached to the datetime. But with multiple tzinfo
instances per timezone, you have to select the correct one to pass in.
pytz solves this with the .localize() method, which lets the timezone
class choose which instance to use.
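A short sketch of why the choice matters: the datetime constructor attaches whatever tzinfo it is given and never checks whether it fits the date. The two fixed-offset zones below are assumed stand-ins for two per-period instances of one timezone.

```python
from datetime import datetime, timedelta, timezone

# Stand-ins for two per-period instances of one zone (assumed CEST/CET).
CEST = timezone(timedelta(hours=2), "CEST")
CET = timezone(timedelta(hours=1), "CET")

# datetime() just attaches the instance it is given; nothing checks whether
# that instance is the one actually in effect on this date.
winter_noon = datetime(2015, 12, 1, 12, 0, tzinfo=CEST)
print(winter_noon.tzname(), winter_noon.utcoffset())  # CEST 2:00:00 (wrong for December)
```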

We can't pass the timezone class into datetime(), because that would
require datetime.__new__ to create new datetimes as part of the timezone
arithmetic. Those, in turn, would create new datetimes in __new__ as part of
the timezone arithmetic, which in turn... yeah, you get it.

I haven't been able to solve that issue without either changing the API/usage,
or getting infinite recursions.

Benefits:
- Proven solution through pytz.
- Fast dst() call.
Drawbacks:
- Trying to use this technique with the current API tends to create
  infinite recursions. It seems to require big API changes.
- Slow datetime() instance creation.


=== Stdlib option 2: A datetime _is_dst flag ===

By having a flag on the datetime instance that says "this is in DST or not"
the timezone implementation can be kept simpler.

You also have to calculate whether the datetime is in DST or not, either
when creating it, which requires creating datetime objects and causes infinite
recursion, or when it is needed, which means you can get "Ambiguous date
time" errors at unexpected times later.

Also, when trying to implement this, I get bogged down in the complexities
of how tzinfo and datetime are calling each other back and forth, and when
to pass in the current is_dst and when to pass in the desired is_dst, etc.
The API and current implementation are not designed with this case in mind,
and it gets very tricky.
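A minimal sketch of the flag approach, computing the offset lazily when it is needed. The stdlib datetime has no is_dst attribute, so this adds one through a hypothetical `FlaggedDatetime` subclass; `ToyZone`, its single-changeover rule and `AmbiguousTimeError` are likewise assumptions. Note how an unflagged ambiguous datetime is created without complaint and only fails later, when its offset is first requested.

```python
from datetime import datetime, timedelta, tzinfo


class AmbiguousTimeError(ValueError):
    """Hypothetical error for a wall time that occurs twice."""


class FlaggedDatetime(datetime):
    """Hypothetical datetime subclass carrying an is_dst flag (None = unknown)."""

    def __new__(cls, *args, is_dst=None, **kwargs):
        self = super().__new__(cls, *args, **kwargs)
        self.is_dst = is_dst
        return self


class ToyZone(tzinfo):
    """Hypothetical zone: CEST (UTC+2) until 2015-10-25 03:00 CEST, then CET (UTC+1).
    Wall times from 02:00 up to 03:00 on that day occur twice."""

    _ambiguous_start = datetime(2015, 10, 25, 2, 0)
    _ambiguous_end = datetime(2015, 10, 25, 3, 0)

    def _in_dst(self, dt):
        wall = datetime(dt.year, dt.month, dt.day, dt.hour, dt.minute,
                        dt.second, dt.microsecond)
        if wall < self._ambiguous_start:
            return True
        if wall >= self._ambiguous_end:
            return False
        # Ambiguous: only the flag on the datetime can resolve it, and the
        # error surfaces here, when an offset is first needed.
        flag = getattr(dt, "is_dst", None)
        if flag is None:
            raise AmbiguousTimeError(f"{wall} is ambiguous")
        return flag

    def utcoffset(self, dt):
        return timedelta(hours=2) if self._in_dst(dt) else timedelta(hours=1)

    def dst(self, dt):
        return timedelta(hours=1) if self._in_dst(dt) else timedelta(0)

    def tzname(self, dt):
        return "CEST" if self._in_dst(dt) else "CET"


zone = ToyZone()
early = FlaggedDatetime(2015, 10, 25, 2, 30, tzinfo=zone, is_dst=True)
late = FlaggedDatetime(2015, 10, 25, 2, 30, tzinfo=zone, is_dst=False)
print(early.utcoffset(), late.utcoffset())  # 2:00:00 1:00:00

unflagged = datetime(2015, 10, 25, 2, 30, tzinfo=zone)  # creation succeeds...
# ...but unflagged.utcoffset() raises AmbiguousTimeError later.
```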

Benefits:
- Simpler tzinfo() implementations.
Drawbacks:
- It seems likely that we must change some APIs.
- This in turn may affect the pytz implementation. Or not, hard to say.
- The DST offset must use slow timezone calculations. However, since datetimes
  are immutable, it can be a cached, lazy, one-time operation.


=== Stdlib option 3: UTC internal representation ===

Having UTC as the internal representation makes the whole issue go away.
Datetimes are no longer ambiguous except at creation, so the checks must be
done then. In this case that should be possible without creating other
datetimes, which resolves the infinite recursion problem.

Benefits:
- Problem solved.
- Minimal API changes.
Drawbacks:
- Backwards compatibility with pickles.
- Possible other backwards incompatibility problems.
- Both the DST offset and the displayed local time must use slow timezone
  calculations. However, since datetimes are immutable, this can be a cached,
  lazy, one-time operation.
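A rough sketch of the UTC-internal idea, using a hypothetical `UTCDateTime` class and a hypothetical single-changeover `ToyZone`. Ambiguity is resolved once, in `from_local()`; after that every UTC instant maps to exactly one offset, and the local wall time is derived lazily and cached with `functools.cached_property` (Python 3.8+).

```python
from datetime import datetime, timedelta
from functools import cached_property


class ToyZone:
    """Hypothetical zone: UTC+2 (CEST) before 2015-10-25 01:00 UTC, UTC+1 (CET) after."""

    offsets = (timedelta(hours=2), timedelta(hours=1))  # DST offset listed first
    changeover_utc = datetime(2015, 10, 25, 1, 0)

    def offset_at_utc(self, utc):
        # Every UTC instant maps to exactly one offset: no ambiguity here.
        return self.offsets[0] if utc < self.changeover_utc else self.offsets[1]


class UTCDateTime:
    """Hypothetical: stores the UTC instant; the local wall time is derived lazily."""

    def __init__(self, utc, zone):
        self.utc = utc
        self.zone = zone

    @classmethod
    def from_local(cls, wall, zone, is_dst=None):
        # Ambiguity can only arise here: keep each candidate UTC instant whose
        # offset agrees with the zone's rule at that instant.
        matches = [wall - off for off in zone.offsets
                   if zone.offset_at_utc(wall - off) == off]
        if len(matches) == 2:
            if is_dst is None:
                raise ValueError(f"{wall} is ambiguous")
            matches = [matches[0] if is_dst else matches[1]]
        if not matches:  # a spring-forward gap; this toy zone has none
            raise ValueError(f"{wall} does not exist")
        return cls(matches[0], zone)

    @cached_property
    def local(self):
        # Computed on first access, then cached (instances are never mutated).
        return self.utc + self.zone.offset_at_utc(self.utc)

    def __sub__(self, other):
        return self.utc - other.utc


zone = ToyZone()
a = UTCDateTime.from_local(datetime(2015, 10, 25, 2, 30), zone, is_dst=True)
b = UTCDateTime.from_local(datetime(2015, 10, 25, 2, 30), zone, is_dst=False)
print(a.local == b.local, b - a)  # True 1:00:00
```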



I'm currently trying to implement solution #2 above. Feedback is welcome.

