[Python-Dev] iso8601 parsing

Paul G paul at ganssle.io
Thu Dec 7 17:52:23 EST 2017


> And I'm sorry, I got a bit lost in the PR, but you are attaching an
> "offset" tzinfo, when parsing an iso string that has one, yes?

Yes, a fixed-offset time zone (since the original zone information is lost):

    >>> from dateutil import tz
    >>> from datetime import datetime
    >>> datetime(2014, 12, 11, 9, 30, tzinfo=tz.gettz('US/Eastern'))
    datetime.datetime(2014, 12, 11, 9, 30, tzinfo=tzfile('/usr/share/zoneinfo/US/Eastern'))
    >>> datetime(2014, 12, 11, 9, 30, tzinfo=tz.gettz('US/Eastern')).isoformat()
    '2014-12-11T09:30:00-05:00'
    >>> datetime.fromisoformat('2014-12-11T09:30:00-05:00')
    datetime.datetime(2014, 12, 11, 9, 30, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=68400)))
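A stdlib-only sketch of what "fixed offset" means in practice, assuming Python 3.7+ (where `datetime.fromisoformat` exists). The parsed value carries a `datetime.timezone` offset, not a named zone. Because of that, arithmetic that crosses a DST transition keeps the original -05:00 offset instead of switching to the summer offset:

```python
from datetime import datetime, timedelta

# Parse the string produced by isoformat() in the session above.
dt = datetime.fromisoformat('2014-12-11T09:30:00-05:00')

# The tzinfo is a fixed-offset timezone; the name "US/Eastern" is not recoverable.
print(dt.tzinfo)  # UTC-05:00

# Adding 183 days lands in June, but a fixed offset does not follow DST rules.
later = dt + timedelta(days=183)
print(later.isoformat())  # 2015-06-12T09:30:00-05:00
```

If the original zone matters, the caller has to re-attach it after parsing, for example with `astimezone()` and a real zone object.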

> I fully agree that that's the MVP -- but is it that hard to parse arbitrary
> ISO8601 strings in once you've gotten this far? It's a bit uglier than I'd
> like, but not THAT bad a spec.

No, and in fact this PR is adapted from a *more general* ISO-8601 parser that I wrote (which is now merged into master on python-dateutil). In the CPython PR I deliberately limited it to be the inverse of `isoformat()` for two major reasons:

1. It allows us to get something out there that everyone can agree on. Otherwise, not only would we have to agree on whether to support arcane ISO 8601 formats like YYYY-Www-D, but we would also have to discuss whether to be strict and disallow YYYYMM as ISO 8601 does, and whether we want fractional-minute support. Then there are the non-compliant variations: we already support replacing the T with any character in `.isoformat()` and outputting time zone offsets in the form hh:mm:ss, so what other variations do we want to add... and then maintain? We can have these discussions later if we want, but we might as well start with the part everyone can agree on: if it comes out of `isoformat()`, it should be able to go back in through `fromisoformat()`.

2. It makes it *much* easier to understand what formats are supported. If you can say, "This function is for reading in dates serialized with `.isoformat()`", you *immediately* know how to create compliant dates. Not to mention, the specific formats emitted by `isoformat()` can be written very cleanly as YYYY-MM-DD[*[HH[:MM[:SS[.mmm[mmm]]]]][+HH:MM]] (where * means any character). Full ISO 8601 has many more rules of this kind: for example, it supports YYYY-MM-DD and YYYYMMDD, but not YYYY-MMDD or YYYYMM-DD.
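To illustrate how compact that grammar is, here is a minimal sketch of it as a regular expression, assuming `*` is any single character and `+` stands for either `+` or `-`. The helper `matches_isoformat_grammar` is hypothetical and is not how CPython implements `fromisoformat()`:

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical helper: regex for YYYY-MM-DD[*[HH[:MM[:SS[.mmm[mmm]]]]][+HH:MM]]
ISOFORMAT_RE = re.compile(
    r"""
    \d{4}-\d{2}-\d{2}                  # YYYY-MM-DD
    (?:
        .                              # separator: any single character
        \d{2}                          # HH
        (?::\d{2}                      # :MM
            (?::\d{2}                  # :SS
                (?:\.\d{3}(?:\d{3})?)? # .mmm or .mmmmmm
            )?
        )?
        (?:[+-]\d{2}:\d{2})?           # optional UTC offset +HH:MM or -HH:MM
    )?
    """,
    re.VERBOSE | re.DOTALL,
)


def matches_isoformat_grammar(s: str) -> bool:
    """Return True if s fits the isoformat() grammar sketched above."""
    return ISOFORMAT_RE.fullmatch(s) is not None


dt = datetime(2014, 12, 11, 9, 30, 0, 123456, tzinfo=timezone(timedelta(hours=-5)))
for spec in ("auto", "hours", "minutes", "seconds", "milliseconds", "microseconds"):
    s = dt.isoformat(sep=" ", timespec=spec)
    print(s, matches_isoformat_grammar(s))  # True for every timespec

# Valid ISO 8601 (basic format), but outside the isoformat() grammar:
print(matches_isoformat_grammar("20141211"))  # False
```

The grammar as written omits the hh:mm:ss offset form mentioned in point 1; adding an optional `(?::\d{2})?` to the offset group would be one way to cover it.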

So, basically, it's not that it's amazingly hard to write a fully-featured ISO 8601 parser; it's more that one doesn't seem like a great match for the problem this is intended to solve at this point.
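The round-trip contract quoted from the PR can be exercised directly. A minimal sketch, assuming Python 3.7+; `round_trip` and the sample values are hypothetical, and only keyword arguments to `isoformat()` are exercised:

```python
from datetime import datetime, timedelta, timezone


def round_trip(dt: datetime, **kwargs) -> datetime:
    """Serialize with isoformat(), parse back, and check both halves of the contract."""
    dtstr = dt.isoformat(**kwargs)
    dt_rt = datetime.fromisoformat(dtstr)
    assert dt_rt == dt                                            # same absolute time
    assert dt_rt.replace(tzinfo=None) == dt.replace(tzinfo=None)  # same wall time
    return dt_rt


eastern_winter = timezone(timedelta(hours=-5))
samples = [
    datetime(2014, 12, 11, 9, 30),                                    # naive
    datetime(2014, 12, 11, 9, 30, 0, 123456, tzinfo=eastern_winter),  # aware, microseconds
    datetime(2014, 12, 11, 9, 30, tzinfo=timezone.utc),               # aware, UTC
]
for sample in samples:
    for sep in ("T", " "):
        round_trip(sample, sep=sep)

# A truncating timespec drops information, so equality only holds
# when the dropped fields are already zero.
truncated = datetime.fromisoformat(samples[1].isoformat(timespec="minutes"))
print(truncated == samples[1])  # False: seconds and microseconds were dropped
```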

Best,
Paul

On 12/07/2017 08:12 PM, Chris Barker wrote:
> 
>> Here is the PR I've submitted:
>>
>> https://github.com/python/cpython/pull/4699
>>
>> The contract that I'm supporting (and, I think it can be argued, the only
>> reasonable contract in the initial implementation) is the following:
>>
>>     dtstr = dt.isoformat(*args, **kwargs)
>>     dt_rt = datetime.fromisoformat(dtstr)
>>     # The two points represent the same absolute time
>>     assert dt_rt == dt
>>     # And the same wall time
>>     assert dt_rt.replace(tzinfo=None) == dt.replace(tzinfo=None)
>>
> 
> 
> that looks good.
> 

> I see this in the comments in the PR:
> 
> 
> """
> This does not support parsing arbitrary ISO 8601 strings - it is only
> intended as the inverse operation of :meth:`datetime.isoformat`
> """
> 

> 
> what ISO8601 compatible features are not supported?
> 
> -CHB
> 
> 


