From paul at ganssle.io Fri Jan 19 12:53:07 2018
From: paul at ganssle.io (Paul G)
Date: Fri, 19 Jan 2018 12:53:07 -0500
Subject: [Datetime-SIG] Zoneinfo parser in Python 3.8
Message-ID:

So I've been thinking over the idea of shipping the IANA zones as part of a Python "batteries included" type approach. This is something I've been negative about in the past, because I think that it's really not a good idea to tie the zoneinfo data releases to any sort of binary or library release (in fact, I'm planning on moving dateutil's zoneinfo tarball into a separate package as of python-dateutil>=2.8.0, so that they can be updated out of cycle from one another). That said, it's still somewhat glaring in its absence.

Thinking about it, though, I think it's not at all unreasonable to include a zoneinfo *parser* in Python's `datetime` library, something similar to `dateutil.tz.tzfile`, where you can give it a zic-compiled binary and it will create a `tzinfo` object from that. Additionally, `datetime` can do what `pytz` and `dateutil` *already* do, which is to use the system `zoneinfo` files *if they exist*. This would allow third party libraries shipping *only* the `tzdata` to supply the tzdata on platforms that don't ship with their own copies of the database.

Here's a rough sketch of the interface I'm thinking about (specific names don't matter, just the general concept):

1. datetime.TZPATH - tzdata search path (configurable at build time, defaults to `['/usr/share/zoneinfo']`), similar to `sys.path`
2. PYTHONTZPATH - environment variable similar to PYTHONPATH; this is prepended to the search path when importing `datetime`
3. datetime.tzfile(tzname, tzpath=None) - Searches the `tzpath` for `tzname`; if `tzpath` is not None, it should be a list of locations to search for `tzname` in.
4. datetime.tzfile.from_stream(tzstream, name=None) - Create a tzfile from an arbitrary stream `tzstream`, with optional zone name `name` (this allows you to read something from a tarball, for example)

For Windows and other platforms that do *not* ship with zoneinfo, third party libraries could provide the zoneinfo data either by manipulating `datetime.TZPATH` or by wrapping either `datetime.tzfile` or `datetime.tzfile.from_stream`.

I'd be happy to put together a PEP on this if people like the idea.

Best,
Paul

From alexander.belopolsky at gmail.com Fri Jan 19 14:16:48 2018
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Fri, 19 Jan 2018 14:16:48 -0500
Subject: [Datetime-SIG] Zoneinfo parser in Python 3.8
In-Reply-To:
References:
Message-ID:

On Fri, Jan 19, 2018 at 12:53 PM, Paul G wrote:

> That said, it's still somewhat glaring in its absence. Thinking about it, though, I think it's not at all unreasonable to include a zoneinfo *parser* into Python's `datetime` library, something similar to `dateutil.tz.tzfile`

A variant of tzfile is in fact included, but buried in the datetime
tester. Unfortunately, binary tz files distributed with many
UNIX-like OSes do not preserve enough information to implement a high
quality Python tzinfo subclass. The missing information is the
breakdown of utcoffset into standard offset and dst. The tz files
only store DST as a boolean, and in the rare cases when DST differs
from 1 hour, it is impossible to accurately figure out the .dst()
timedelta.

If we go to the trouble of implementing a Zoneinfo parser, I suggest
that we also include a parser for the raw IANA files that can extract
the DST timedelta information from the "SAVE" field.

Note that I started toying with some related ideas at
.
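The limitation described above can be seen directly in the binary layout. The following is a minimal sketch, assuming the version 1 data block layout documented in the tzfile(5) manual page: each local time type record holds a total UTC offset, a one-byte DST flag, and an abbreviation index, with no separate field for the DST amount. The names `read_ttinfo`, `TTInfo`, and `build_sample` are hypothetical, and the sample bytes are synthetic (a made-up zone with a 30-minute DST shift), not taken from a real zone file.

```python
import struct
from collections import namedtuple

# Hypothetical record type for illustration only.
TTInfo = namedtuple("TTInfo", "utcoffset isdst abbr")


def read_ttinfo(data):
    """Return the local time type records from the v1 block of TZif bytes."""
    if data[:4] != b"TZif":
        raise ValueError("not a TZif file")
    # Six big-endian 32-bit counts follow the 20-byte magic/version/reserved area.
    (isutcnt, isstdcnt, leapcnt,
     timecnt, typecnt, charcnt) = struct.unpack(">6l", data[20:44])
    # Skip transition times (4 bytes each) and their type indices (1 byte each).
    pos = 44 + timecnt * 4 + timecnt
    raw = [struct.unpack(">lBB", data[pos + 6 * i:pos + 6 * i + 6])
           for i in range(typecnt)]
    pos += 6 * typecnt
    abbrs = data[pos:pos + charcnt]
    records = []
    for utoff, isdst, idx in raw:
        # Abbreviations are NUL-terminated strings inside one shared byte array.
        abbr = abbrs[idx:abbrs.index(b"\0", idx)].decode("ascii")
        # isdst is only a flag: the file never says how much of utoff is DST.
        records.append(TTInfo(utoff, bool(isdst), abbr))
    return records


def build_sample():
    """Build synthetic v1 TZif bytes with two types and no transitions."""
    abbrs = b"STD\0DST\0"
    header = (b"TZif" + b"\0" + b"\0" * 15
              + struct.pack(">6l", 0, 0, 0, 0, 2, len(abbrs)))
    types = struct.pack(">lBB", 37800, 0, 0) + struct.pack(">lBB", 39600, 1, 4)
    return header + types + abbrs


for record in read_ttinfo(build_sample()):
    print(record)
```

In this sample the two offsets differ by 1800 seconds, but a reader can only recover that by comparing records and guessing which standard type a DST type pairs with, which is the kind of heuristic being discussed. The sketch ignores the 64-bit block that version 2 and later files append.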
From paul at ganssle.io Fri Jan 19 14:56:18 2018
From: paul at ganssle.io (Paul G)
Date: Fri, 19 Jan 2018 14:56:18 -0500
Subject: [Datetime-SIG] Zoneinfo parser in Python 3.8
In-Reply-To:
References:
Message-ID: <6f833cb8-6205-d20a-50f7-db08bc030a1f@ganssle.io>

Yeah, I'm aware of this, but I think the reality on the ground is that everyone everywhere deploys compiled zic files and will for the foreseeable future. It is the standard currency of time zone information, and we'll probably want to be able to at least fall back to this. I think we'd be better off lobbying for the addition of `SAVE` information into `zic` outputs than we would going out of our way to support other formats, but I think that at the very least we should be able to handle the compiled zic format so that it will be easy for people to access their system time zones.

Currently the heuristic methods of determining DST offset are sufficient to cover all known cases, though obviously this is potentially fragile. Still, I think that users would *much* prefer a zoneinfo parser that is guaranteed to get the offset right 100% of the time and occasionally will incorrectly identify whether or not a zone is DST in the rare but possible event that a zone makes an STD->DST transition immediately followed by a DST->STD transition without any change in overall offset.

Since the compiled zic format has a magic identifier at the beginning, it may be a fairly simple matter to have the parser support more than one type of zone information as input. Another format that can be compiled from the IANA sources is iCalendar (this can be done with vzic, https://github.com/libical/vzic, for example). We could simply detect what kind of file it is and parse accordingly. That said, the implementation of a properly compliant iCalendar timezone parser is orders of magnitude more complicated than a simple zoneinfo parser.
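The magic-identifier check mentioned above could look roughly like the sketch below. It assumes a seekable binary stream, relies on compiled zic output starting with the four bytes `TZif`, and treats a leading `BEGIN:VCALENDAR` line as the marker for iCalendar data. The function name `detect_format` and the returned labels are hypothetical, chosen only for illustration.

```python
import io


def detect_format(stream):
    """Guess the time zone data format of a seekable binary stream."""
    start = stream.tell()
    head = stream.read(64)
    # Rewind so the real parser sees the stream from where it started.
    stream.seek(start)
    if head.startswith(b"TZif"):
        return "tzif"
    # iCalendar is text; tolerate leading whitespace and lowercase input.
    if head.lstrip().upper().startswith(b"BEGIN:VCALENDAR"):
        return "icalendar"
    return "unknown"


print(detect_format(io.BytesIO(b"TZif2" + b"\0" * 39)))
print(detect_format(io.BytesIO(b"BEGIN:VCALENDAR\r\nVERSION:2.0\r\n")))
print(detect_format(io.BytesIO(b"something else")))
```

A dispatcher built on this would hand the stream to whichever parser matches the label, so the detection stays cheap even if the iCalendar branch is never implemented.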

I don't love the idea of developing some homegrown Python format just to preserve the SAVE information. That will create a pretty significant, indefinitely ongoing support burden compared to a zoneinfo parser that works with files already deployed by the OS.

Best,
Paul

On 01/19/2018 02:16 PM, Alexander Belopolsky wrote:
> On Fri, Jan 19, 2018 at 12:53 PM, Paul G wrote:
>
>> That said, it's still somewhat glaring in its absence. Thinking about it, though, I think it's not at all unreasonable to include a zoneinfo *parser* into Python's `datetime` library, something similar to `dateutil.tz.tzfile`
>
> A variant of tzfile is in fact included, but buried in the datetime
> tester. Unfortunately, binary tz files distributed with many
> UNIX-like OSes do not preserve enough information to implement a high
> quality Python tzinfo subclass. The missing information is the
> breakdown of utcoffset into standard offset and dst. The tz files
> only store DST as a boolean and in the rare cases when DST differs
> from 1 hour, it is impossible to accurately figure out the .dst()
> timedelta.
>
> If we go to the trouble of implementing a Zoneinfo parser, I suggest
> that we also include a parser for the raw IANA files that can extract
> the DST timedelta information from the "SAVE" field.
>
> Note that I started toying with some related ideas at
> .
>