From paul at ganssle.io Thu Oct 19 10:06:41 2017
From: paul at ganssle.io (Paul G)
Date: Thu, 19 Oct 2017 10:06:41 -0400
Subject: [Datetime-SIG] Matching +-HH:MM in strptime
Message-ID: <334f2c29-9782-2238-95d8-b61a84b1b2d4@ganssle.io>

There is a new issue about the %z directive in strptime on the issue tracker: https://bugs.python.org/issue31800 (linked to a few related issues), and a linked PR expanding the definition of %z to match HH:MM: https://github.com/python/cpython/pull/4015

I think either adding a %:z directive or expanding the definition of %z would be pretty important, and I think there's a good case to be made for either one. To summarize the arguments for people on the mailing list:

The argument for expanding the definition of %z that I find strongest is that according to the linux man pages ( http://man7.org/linux/man-pages/man3/strptime.3.html ), while %z generates +-HHMM in strftime, strptime is supposed to match "An RFC-822/ISO 8601 standard timezone specification", and ISO 8601 uses +-HH:MM, so if we're following those linux pages, we should be accepting the version with the colon.

The arguments that I find most compelling for adding a %:z directive are:

1. maintains the symmetry between strftime and strptime
2. allows users to be stricter about their datetime format
3. has precedent in that GNU's `date` command accepts %z, %:z and %::z formats

Can we establish some consensus on which should be done so that it can be implemented?

Best,
Paul

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL:

From random832 at fastmail.com Thu Oct 19 10:23:04 2017
From: random832 at fastmail.com (Random832)
Date: Thu, 19 Oct 2017 10:23:04 -0400
Subject: [Datetime-SIG] Matching +-HH:MM in strptime
In-Reply-To: <334f2c29-9782-2238-95d8-b61a84b1b2d4@ganssle.io>
References: <334f2c29-9782-2238-95d8-b61a84b1b2d4@ganssle.io>
Message-ID: <1508422984.2404304.1144229264.47671CCD@webmail.messagingengine.com>

On Thu, Oct 19, 2017, at 10:06, Paul G wrote:
> The argument for expanding the definition of %z that I find strongest is
> that according to the linux man pages (
> http://man7.org/linux/man-pages/man3/strptime.3.html ), while %z
> generates +-HHMM in strftime, strptime is supposed to match "An
> RFC-822/ISO 8601 standard timezone specification", and ISO 8601 uses
> +-HH:MM, so if we're following those linux pages, we should be accepting
> the version with the colon.

For whatever it's worth, glibc strptime on linux does *not* in fact accept +HH:MM, and if it is passed, it silently interprets, say, -05:30 as -05 (and :30 remains in the string for subsequent directives to consume). Testing with an offset with zero minutes at the end of the string does not account for this, which may be why some people in the bug comments reported that it did support it.

> The arguments that I find most compelling for adding a %:z directive are:
>
> 1. maintains the symmetry between strftime and strptime
> 2. allows users to be stricter about their datetime format
> 3. has precedent in that GNU's `date` command accepts %z, %:z and
>    %::z formats

Just to be clear, date accepts them on input through date -d (which does not use strptime or posix getdate, but its own internal parse_datetime function)

> Can we establish some consensus on which should be done so that it can be
> implemented?
I do think it should be done, but if so it may be reasonable to talk about implementing a portable version of time.strptime that will also implement this feature.

From paul at ganssle.io Thu Oct 19 10:37:26 2017
From: paul at ganssle.io (Paul G)
Date: Thu, 19 Oct 2017 10:37:26 -0400
Subject: [Datetime-SIG] Matching +-HH:MM in strptime
In-Reply-To: <1508422984.2404304.1144229264.47671CCD@webmail.messagingengine.com>
References: <334f2c29-9782-2238-95d8-b61a84b1b2d4@ganssle.io> <1508422984.2404304.1144229264.47671CCD@webmail.messagingengine.com>
Message-ID: <29c615db-714e-ed70-70c5-2f14a6d3592d@ganssle.io>

> For whatever it's worth glibc strptime on linux does *not* in fact
> accept +HH:MM, and if it is passed, it silently interprets, say, -05:30
> as -05 (and :30 remains in the string for subsequent directives to
> consume). Testing with an offset with zero minutes at the end of the
> string does not account for this, which may be why some people in the
> bug comments reported that it did support it.

Interesting. I was mostly suggesting it would be supported based on the man page, which specifies an RFC-822/ISO 8601 standard timezone specification. Since ISO-8601 includes both HHMM and HH:MM (and indeed HH:MM is what is generated by .isoformat()), based on their man page it would seem they intend to support this. This is either a bug in glibc, a bug in their documentation, or I'm misinterpreting the "slash" to mean "the union of timezone offset specifiers laid out in RFC-822 and ISO-8601" rather than "the intersection of timezone offset specifiers laid out in RFC-822 and ISO-8601". Might be worth opening a bug on glibc to clarify.
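
To make the partial-match behaviour quoted above concrete, here is a minimal sketch in Python of an offset matcher that must consume the whole of -05:30, so that :30 is never silently left behind. It is neither glibc's code nor CPython's strptime implementation; the helper name `parse_offset` and its regular expression are hypothetical, chosen only for illustration.

```python
import re
from datetime import timedelta, timezone

# Hypothetical pattern: "Z", or a sign and two-digit hours, optionally followed
# by two-digit minutes with or without a ":" separator.
_OFFSET_RE = re.compile(r"Z|([+-])([01]\d|2[0-3])(?::?([0-5]\d))?")


def parse_offset(text):
    """Return a fixed-offset tzinfo for Z, +-HH, +-HHMM or +-HH:MM."""
    # fullmatch() rejects leftovers, so "-05:3" is an error instead of being
    # read as "-05" with unparsed text remaining.
    match = _OFFSET_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a recognised UTC offset: {text!r}")
    if text == "Z":
        return timezone.utc
    sign, hours, minutes = match.groups()
    delta = timedelta(hours=int(hours), minutes=int(minutes or 0))
    return timezone(-delta if sign == "-" else delta)


# Print each accepted spelling with its offset in whole minutes.
for sample in ("Z", "-05:30", "-0530", "-05"):
    offset = parse_offset(sample).utcoffset(None)
    print(sample, "->", offset // timedelta(minutes=1), "minutes")
```

Using a full match is the design choice that matters here: a prefix match would reproduce the "-05 plus leftover :30" reading described in the quoted paragraph.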

> Just to be clear, date accepts them on input through date -d (which does
> not use strptime or posix getdate, but its own internal parse_datetime
> function)

Yes, no matter how it's implemented, I was just suggesting that %:z is not an extension plucked out of thin air (though I did not know about the `date` behavior and %:z *was* my immediate suggestion for an extension, so at the very least it doesn't violate principle of least surprise), but rather has some precedent in widely used datetime software.

From orent at hishome.net Thu Oct 19 16:07:36 2017
From: orent at hishome.net (Oren Tirosh)
Date: Thu, 19 Oct 2017 20:07:36 +0000
Subject: [Datetime-SIG] Matching +-HH:MM in strptime
In-Reply-To: <334f2c29-9782-2238-95d8-b61a84b1b2d4@ganssle.io>
References: <334f2c29-9782-2238-95d8-b61a84b1b2d4@ganssle.io>
Message-ID:

https://github.com/orent/cpython/tree/strptime_extensions

%:z - matches +HH:MM
%?:z - optional %:z
%.f - equivalent to .%f
%?.f - optional %.f
%?t - matches ' ' or 'T'

What they all have in common is that together they make it possible to write a strptime format that matches all possible output variations of datetime.__str__/ datetime.isoformat.

The time zone not only supports the : separator but also allows making the entire component optional, as isoformat() will add it only for aware datetime objects. The seconds fraction is dropped from the default string representation if the datetime represents a whole second. Since it is dropped along with the decimal dot, I first made "%.f" that includes the dot and then created the optional variant. Finally, "%?t" can be used to accept a timestamp with either of the separators defined in iso8601.

It is quite absurd that datetime cannot parse its own string representation.
Using these extensions an .isoparse() method may be added that calls strptime('%Y-%m-%d%?t%H:%M:%S%?.f%?:z') and supports full round-tripping of all possible datetime values that do not use a custom tzinfo.

Oren

On Thu, 19 Oct 2017 at 17:06, Paul G wrote:
>
> There is a new issue about the %z directive in strptime on the issue tracker: https://bugs.python.org/issue31800 (linked to a few related issues), and a linked PR expanding the definition of %z to match HH:MM: https://github.com/python/cpython/pull/4015
>
> I think either adding a %:z directive or expanding the definition of %z would be pretty important, and I think there's a good case to be made for either one. To summarize the arguments for people on the mailing list:
>
> The argument for expanding the definition of %z that I find strongest is that according to the linux man pages ( http://man7.org/linux/man-pages/man3/strptime.3.html ), while %z generates +-HHMM in strftime, strptime is supposed to match "An RFC-822/ISO 8601 standard timezone specification", and ISO 8601 uses +-HH:MM, so if we're following those linux pages, we should be accepting the version with the colon.
>
> The arguments that I find most compelling for adding a %:z directive are:
>
> 1. maintains the symmetry between strftime and strptime
> 2. allows users to be stricter about their datetime format
> 3. has precedent in that GNU's `date` command accepts %z, %:z and %::z formats
>
> Can we establish some consensus on which should be done so that it can be implemented?
>
> Best,
>
> Paul
>
> _______________________________________________
> Datetime-SIG mailing list
> Datetime-SIG at python.org
> https://mail.python.org/mailman/listinfo/datetime-sig
> The PSF Code of Conduct applies to this mailing list: https://www.python.org/psf/codeofconduct/
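
The four shapes that isoformat() can produce, and the reason a single strptime format built from the existing directives cannot cover them, can be sketched as follows. This is an illustration only: the helper `accepted_by` and the sample values are hypothetical, and whether %z accepts a colon in the offset depends on the Python version (it did not when this thread was written), so the sketch counts matches instead of assuming a particular result for the %z formats.

```python
from datetime import datetime, timedelta, timezone

tz = timezone(timedelta(hours=-4))
samples = [
    datetime(2017, 10, 19, 10, 6, 41),                     # naive, whole second
    datetime(2017, 10, 19, 10, 6, 41, 250000),             # naive, with fraction
    datetime(2017, 10, 19, 10, 6, 41, tzinfo=tz),          # aware, whole second
    datetime(2017, 10, 19, 10, 6, 41, 250000, tzinfo=tz),  # aware, with fraction
]
shapes = [dt.isoformat() for dt in samples]

# One candidate format per shape, using only directives that exist today.
formats = [
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%dT%H:%M:%S.%f",
    "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%dT%H:%M:%S.%f%z",
]


def accepted_by(fmt):
    """Return the isoformat() strings that strptime parses with fmt."""
    matched = []
    for text in shapes:
        try:
            datetime.strptime(text, fmt)
        except ValueError:
            continue
        matched.append(text)
    return matched


for fmt in formats:
    print(fmt, "->", accepted_by(fmt))
```

Because strptime has no notion of an optional component, each format covers only part of the list, which is the gap the proposed %?.f and %?:z directives are meant to close.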

From mariocj89 at gmail.com Thu Oct 19 18:12:45 2017
From: mariocj89 at gmail.com (Mario Corchero)
Date: Thu, 19 Oct 2017 23:12:45 +0100
Subject: [Datetime-SIG] Matching +-HH:MM in strptime
In-Reply-To:
References: <334f2c29-9782-2238-95d8-b61a84b1b2d4@ganssle.io>
Message-ID:

I think glibc does accept +HH:MM since the patch in 2015 by Vincent Bernat: e952e1df

It basically added the following lines:

+      if (*rp == ':' && n == 2 && isdigit (*(rp + 1)))
+        ++rp;

Which effectively just skips the ":".

See http://code.metager.de/source/xref/gnu/glibc/time/strptime_l.c#765 (you can also download the source from http://ftp.gnu.org/gnu/glibc to see the commits)

The same applies to accepting 'Z', added in 900f33e2 by the same person.

From paul at ganssle.io Fri Oct 20 10:16:26 2017
From: paul at ganssle.io (Paul G)
Date: Fri, 20 Oct 2017 10:16:26 -0400
Subject: [Datetime-SIG] Matching +-HH:MM in strptime
In-Reply-To:
References: <334f2c29-9782-2238-95d8-b61a84b1b2d4@ganssle.io>
Message-ID: <55a69e8f-325a-71a2-778a-c4a0d050fb74@ganssle.io>

I think this would be a much bigger change to the strptime interface than is actually warranted, and probably would add in additional, unnecessary complexity by introducing the concept of optional matches. Adding the capability to match HH:MM offsets is a reasonable extension partially because that is a standard representation that is currently *not* covered by strptime, and the fact that that's how isoformat() represents the offset just makes this lack all the more acute.

I think it should be uncontroversial to add *one* of these two %z extensions to Python 3 without getting bogged down in allowing a single strptime string to match any output from `.isoformat`.

That said, I'm also very much in favor of a `.isoparse` or `.fromisoformat` constructor that *is* the inverse of `isoformat`, which should solve the issue without sweeping changes to how `strptime` works.
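
For reference, a minimal sketch of the round trip that such a constructor is meant to provide. It assumes a Python version that offers `datetime.fromisoformat` (3.7 or later), which is documented as the inverse of `isoformat()` and not as a general ISO 8601 parser; the helper `round_trips` and the sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone

tz = timezone(timedelta(hours=5, minutes=30))
samples = [
    datetime(2017, 10, 20, 10, 16, 26),                               # naive
    datetime(2017, 10, 20, 10, 16, 26, 123456),                       # naive, fraction
    datetime(2017, 10, 20, 10, 16, 26, tzinfo=tz),                    # aware
    datetime(2017, 10, 20, 10, 16, 26, 123456, tzinfo=timezone.utc),  # aware, fraction
]


def round_trips(dt, sep="T"):
    """Check that isoformat() followed by fromisoformat() gives dt back."""
    parsed = datetime.fromisoformat(dt.isoformat(sep=sep))
    # Compare the offset as well, so an aware value must stay aware.
    return parsed == dt and parsed.utcoffset() == dt.utcoffset()


for dt in samples:
    print(dt.isoformat(), round_trips(dt), round_trips(dt, sep=" "))
```

A dedicated constructor handles the optional fraction, the optional offset and either separator in one call, without adding optional-match directives to strptime.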

From orent at hishome.net Sat Oct 21 02:34:53 2017
From: orent at hishome.net (Oren Tirosh)
Date: Sat, 21 Oct 2017 06:34:53 +0000
Subject: [Datetime-SIG] Matching +-HH:MM in strptime
In-Reply-To: <55a69e8f-325a-71a2-778a-c4a0d050fb74@ganssle.io>
References: <334f2c29-9782-2238-95d8-b61a84b1b2d4@ganssle.io> <55a69e8f-325a-71a2-778a-c4a0d050fb74@ganssle.io>
Message-ID:

ok, let's try to separate the issues and choices on each one:

1. Extending strptime to support time zone offset with : separator:
Should a single directive accept either hhmm or hh:mm, or should there be two separate directives?

2. Round tripping of isoformat() back to datetime value:
a. Implement custom isoparse() function or extend strptime so isoparse simply calls strptime with a default format?
b. Support all variations produced by isoformat or just a subset? (Variations include with/without fraction, with/without tz and separator choice)

I suggest: 1. separate directives, 2a. extend strptime, and 2b. support all variations. Do you have different preferences on any of these questions?

I understand that the number of extensions to support this seems excessive to you.

Technically, my proposed "%.f" is not really necessary. I added it for completeness. We can keep using ".%f" for non-optional fraction and define "%?f" to implicitly include the dot.

The distinction between "%z", "%:z" and "%?:z" can also be narrowed down. This can be done, for example, by making "%z" and "%?z" always accept hhmm with or without the : separator.

From mariocj89 at gmail.com Sat Oct 21 06:23:43 2017
From: mariocj89 at gmail.com (Mario Corchero)
Date: Sat, 21 Oct 2017 11:23:43 +0100
Subject: [Datetime-SIG] Matching +-HH:MM in strptime
In-Reply-To:
References: <334f2c29-9782-2238-95d8-b61a84b1b2d4@ganssle.io> <55a69e8f-325a-71a2-778a-c4a0d050fb74@ganssle.io>
Message-ID:

My opinion (as a user, I have no authority here whatsoever)

*1) About parsing colons in offsets with strptime*

I think having %z support both +-HH:MM and +-HHMM would be the best choice, as it seems the simplest for me as a user. I'd go even further, making %z support ':' and 'Z', *a la glibc*. This effectively means that %z can now parse: Z, ±hh:mm, ±hhmm, or ±hh

I think this gives the best experience to the strptime user. It basically makes the time-offset rfc3339 compatible.

time-numoffset = ("+" / "-") time-hour ":" time-minute
time-offset = "Z" / time-numoffset

*2) Adding a handy function to build a datetime from a string serialized with isoformat*

Absolutely agree on having an isoparse.
That would be amazing, we can even build it on top of 1). *Side note:* I am not totally in favour with "%?:z" (probably because I am leaning on %z doing the parsing for both and ?z will have no place on strftime). I think this starts to add way too much complexity to just say "parse a time-offset". *Implementation:* I am happy to work with PaulG in the isoparse implementation if we decide to go with it and if he wants to get involved :) *Thanks:* Thanks for dedicating time to this, I think that even if minor this would be a killer addition to 3.7 if we manage to get it through. On 21 October 2017 at 07:34, Oren Tirosh wrote: > ok, let's try to separate the issues and choices on each one: > > 1. Extending strptime to support time zone offset with : separator: > Should a single directive accepts either hhmm or by:mm or use two separate > directives? > > 2. Round tripping of isoformat() back to datetime value: > Implement custom isoparse() function or extend strptime so isoparse simply > calls strptime with a default format? > Support all variations produced by isoformat or just a subset? (Variations > include with/without fraction, with/without tz and separator choice) > > I suggest 1 separate directives 2a extend strptime and 2b support all > variations. Do you have different preferences on any of these questions? > > I understand that the number of extensions to support this seems excessive > to you. > > Technically, my proposed "%.f" is not really necessary. I added it for > completeness. We can keep using ".%f" for non-optional fraction and define > "%?f" to implicitly include the dot. > > The distinction between "%z", "%:z" and "%?:z"" can also be narrowed > down. This can be done, for example, by making "%z" and "%?s" always accept > hhmm with or without the : separator. 
> > On Fri, 20 Oct 2017 at 17:16, Paul G wrote: > >> I think this would be a much bigger change to the strptime interface than >> is actually warranted, and probably would add in additional, unnecessary >> complexity by introducing the concept of optional matches. Adding the >> capability to match HH:MM offsets is a reasonable extension partially >> because that is a standard representation that is currently *not* covered >> by strptime, and the fact that that's how isoformat() represents the offset >> just makes this lack all the more acute. >> >> I think it should be uncontroversial to add *one* of these two %z >> extensions to Python 3 without getting bogged down in allowing a single >> strptime string to match any output from `.isoformat`. >> >> That said, I'm also very much in favor of a `.isoparse` or >> `.fromisoformat` constructor that *is* the inverse of `isoformat`, which >> should solve the issue without sweeping changes to how `strptime` works. >> >> On 10/19/2017 04:07 PM, Oren Tirosh wrote: >> > https://github.com/orent/cpython/tree/strptime_extensions >> > >> > %:z - matches +HH:MM >> > %?:z - optional %:z >> > %.f - equivalent to .%f >> > %?.f - optional %.f >> > %?t - matches ' ' or 'T' >> > >> > What they all have in common is that together they make it possible to >> > write a strptime format that matches all possible output variations of >> > datetime.__str__/ datetime.isoformat. >> > >> > The time zone not only supports the : separator but also allows making >> the >> > entire component optional, as isoformat() will add it only for aware >> > datetime objects. The seconds fraction is dropped from the default >> string >> > representation if the datetime represents a whole second. Since it is >> > dropped along with the decimal dot, I first made "%.f" that includes the >> > dot and then created the optional variant. Finally, "%?t" can be used to >> > accept a timestamp with either of the separators defined in iso8601. 
>> > >> > It is quite absurd that datetime cannot parse its own string >> > representation. Using these extensions an .isoparse() method may be >> added >> > that calls strptime('%Y-%m-%d%?t%H:%M:%S%?.f%?:z') and supports full >> > round-tripping of all possible datetime values that do not not use a >> custom >> > tzinfo. >> > >> > Oren >> > >> > >> > >> > On Thu, 19 Oct 2017 at 17:06, Paul G wrote: >> >> >> >> There is a new issue about the %z directive in strptime on the issue >> > tracker: https://bugs.python.org/issue31800 (linked to a few related >> > issues), and a linked PR expanding the definition of %z to match HH:MM: >> > https://github.com/python/cpython/pull/4015 >> >> >> >> I think either adding a %:z directive or expanding the definition of %z >> > would be pretty important, and I think there's a good case to be made >> for >> > either one. To summarize the arguments for people on the mailing list: >> >> >> >> The argument for expanding the definition of %z that I find strongest >> is >> > that according to the linux man pages ( >> > http://man7.org/linux/man-pages/man3/strptime.3.html ), while %z >> generates >> > +-HHMM in strftime, strptime is supposed to match "An RFC-822/ISO 8601 >> > standard timezone specification",and ISO 8601 uses +-HH:MM, so if we're >> > following those linux pages, we should be accepting the version with the >> > colon. >> >> >> >> The argument that I find most compelling for adding a %:z directive >> are: >> >> >> >> 1. maintains the symmetry between strftime and strptime >> >> 2. allows users to be stricter about their datetime format >> >> 3. has precedent in that GNU's `date` command accepts %z, %:z and >> > %::z formats >> >> >> >> Can we establish some consensus on which should be done so that it can >> be >> > implemented? 
>> >> >> >> Best, >> >> >> >> Paul >> >> >> >> _______________________________________________ >> >> Datetime-SIG mailing list >> >> Datetime-SIG at python.org >> >> https://mail.python.org/mailman/listinfo/datetime-sig >> >> The PSF Code of Conduct applies to this mailing list: >> > https://www.python.org/psf/codeofconduct/ >> > >> > >> > >> > _______________________________________________ >> > Datetime-SIG mailing list >> > Datetime-SIG at python.org >> > https://mail.python.org/mailman/listinfo/datetime-sig >> > The PSF Code of Conduct applies to this mailing list: >> https://www.python.org/psf/codeofconduct/ >> > >> >> _______________________________________________ >> Datetime-SIG mailing list >> Datetime-SIG at python.org >> https://mail.python.org/mailman/listinfo/datetime-sig >> The PSF Code of Conduct applies to this mailing list: >> https://www.python.org/psf/codeofconduct/ >> > > _______________________________________________ > Datetime-SIG mailing list > Datetime-SIG at python.org > https://mail.python.org/mailman/listinfo/datetime-sig > The PSF Code of Conduct applies to this mailing list: > https://www.python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From orent at hishome.net Sat Oct 21 08:18:10 2017 From: orent at hishome.net (Oren Tirosh) Date: Sat, 21 Oct 2017 12:18:10 +0000 Subject: [Datetime-SIG] Matching +-HH:MM in strptime In-Reply-To: References: <334f2c29-9782-2238-95d8-b61a84b1b2d4@ganssle.io> <55a69e8f-325a-71a2-778a-c4a0d050fb74@ganssle.io> Message-ID: On Sat, 21 Oct 2017 at 13:24, Mario Corchero wrote: > My opinion (as a user, I have no authority here whatsoever) > > *1) About parsing colons in offsets with strptime* > > I think having %z support both +-HH:MM and +-HHMM would be the best > choice, as it seems the simplest for me as a user. > I'd go even further, making %z support ':' and 'Z', *a la glibc*. 
> This effectively means that %z can now parse: Z, ±hh:mm, ±hhmm, or ±hh
>

That is fine for parsing, but my issue with this is symmetry with strftime.
If the same extensions are also implemented for formatting (I have a
prototype) then you need some way to specify whether you want a : separator
or not. The %z will have to remain without colon on formatting for backward
compatibility.

So I agree that the parser can be safely made more liberal in what it
accepts, but the formatter must be strict and specific in what it produces.

> I think this gives the best experience to the strptime user. It basically
> makes the time-offset rfc3339 compatible.
>

Yes, that's the goal.

> *2) Adding a handy function to build a datetime from a string serialized
> with isoformat*
> Absolutely agree on having an isoparse. That would be amazing, we can even
> build it on top of 1).
>

...and building it on top of 1 requires several extensions and variants.
People here seem to be a bit taken aback by the scope of these extensions.
I understand this reaction, but I maintain that most or all of this
complexity is necessary if you want to implement this on top of strptime
rather than a custom isoparse().

> *Side note:*
> I am not totally in favour with "%?:z" (probably because I am leaning on
> %z doing the parsing for both and ?z will have no place on strftime).
> I think this starts to add way too much complexity to just say "parse a
> time-offset".
>

Again, what is the alternative? If you want a parser that accepts the output
of isoformat() for all possible datetime values (except custom tzinfo) then
it needs to support a missing tz offset as indicating a naive timestamp.

You can say that the real source of the asymmetry here is not with my
proposal but rather in the underlying strftime/strptime: on formatting, %z
yields an empty string for a naive timestamp rather than producing an error.
But on parsing, it refuses to parse a timestamp with no offset. A truly
symmetric implementation would have accepted it as a naive timestamp.

Too late for %z because it must remain backward compatible, but perhaps %:z
can be made to accept a missing offset as a naive timestamp. The user can
then check for naive timestamps and reject them if they are unacceptable in
that context, rather than specifying whether a missing offset is acceptable
or not in the format string. I have no problem with either solution.

> *Implementation:*
> I am happy to work with PaulG in the isoparse implementation if we decide
> to go with it and if he wants to get involved :)
>

I have a working strptime:
https://github.com/orent/cpython/tree/strptime_extensions

isoparse() on top of this strptime is a trivial one-liner.

Oren

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mariocj89 at gmail.com Sat Oct 21 08:55:36 2017
From: mariocj89 at gmail.com (Mario Corchero)
Date: Sat, 21 Oct 2017 13:55:36 +0100
Subject: [Datetime-SIG] Matching +-HH:MM in strptime
In-Reply-To:
References: <334f2c29-9782-2238-95d8-b61a84b1b2d4@ganssle.io> <55a69e8f-325a-71a2-778a-c4a0d050fb74@ganssle.io>
Message-ID:

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mariocj89 at gmail.com Sat Oct 21 09:07:19 2017
From: mariocj89 at gmail.com (Mario Corchero)
Date: Sat, 21 Oct 2017 14:07:19 +0100
Subject: [Datetime-SIG] Matching +-HH:MM in strptime
In-Reply-To:
References: <334f2c29-9782-2238-95d8-b61a84b1b2d4@ganssle.io> <55a69e8f-325a-71a2-778a-c4a0d050fb74@ganssle.io>
Message-ID:

Sorry, hit send by mistake on the previous message.

> That is fine for parsing, but my issue with this is symmetry with strftime.

I can agree with having a %:z for support in strftime but I think that is a
separate change. The issue I opened with the attached PR focused only in
strptime to facilitate the discussion.
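
To make the strptime-only change concrete, here is a minimal check. It is
only a sketch and assumes an interpreter whose strptime already accepts a
colon and 'Z' for %z (CPython 3.7 and later document this behaviour; on
older versions the second and third calls raise ValueError). The timestamps
are made up for the example.

```python
from datetime import datetime

fmt = "%Y-%m-%dT%H:%M:%S%z"

# The same instant written three ways; only the offset spelling differs.
plain = datetime.strptime("2017-10-21T14:07:19+0100", fmt)
colon = datetime.strptime("2017-10-21T14:07:19+01:00", fmt)
zulu = datetime.strptime("2017-10-21T13:07:19Z", fmt)

# Parsing is liberal, but formatting with %z still omits the colon.
formatted = colon.strftime("%z")
```

Note that nothing changes on the strftime side here, which is why the
formatting question can be treated as a separate change.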
> Again, what is the alternative?

Making %z accept time-offset rfc3339 compatible.

> I have a working strptime:

Ouch, except for the fractional seconds (which was not part of the issue
raised) I also had a patch for the colon and another for supporting 'Z' as
reported in the bug tracker.
I was mentioning working with Paul in the implementation of isoparse, as
even if it might look simple it has caused many long-standing discussions in
the past.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From orent at hishome.net Sat Oct 21 10:20:46 2017
From: orent at hishome.net (Oren Tirosh)
Date: Sat, 21 Oct 2017 14:20:46 +0000
Subject: [Datetime-SIG] Matching +-HH:MM in strptime
In-Reply-To:
References: <334f2c29-9782-2238-95d8-b61a84b1b2d4@ganssle.io> <55a69e8f-325a-71a2-778a-c4a0d050fb74@ganssle.io>
Message-ID:

On Sat, 21 Oct 2017 at 16:08, Mario Corchero wrote:

> Sorry, hit send by mistake on the previous message.
>
> > That is fine for parsing, but my issue with this is symmetry with
> > strftime.
>
> I can agree with having a %:z for support in strftime but I think that is
> a separate change. The issue I opened with the attached PR focused only in
> strptime to facilitate the discussion.
>

Yes, strftime is a separate issue, but still relevant as a design concern
for any new changes to strptime.

My revised proposal is this:

Add "%:z" with the following semantics:
1. Requires ":" separator
2. Officially matches the empty string, producing a naive datetime
   (tzinfo=None)
3. [maybe] officially matches "Z", equivalent to "+00:00"

For "%z", retain the existing semantics, with one extension:
1. Does not require ":" (but silently accepts it)
2. Does not match the empty string

Here's why:

[snip]

Oren:
>>> You can say that the real source of the asymmetry here is not with my
>>> proposal but rather in the underlying strftime/strptime: on formatting, %z
>>> yields an empty string for a naive timestamp rather than producing an
>>> error. But on parsing, it refuses to parse a timestamp with no offset. A
>>> truly symmetric implementation would have accepted it as a naive timestamp.
>>>
>>> Too late for %z because it must remain backward compatible, but perhaps
>>> %:z can be made to accept a missing offset as a naive timestamp. The user
>>> can then check for naive timestamp and reject them if they are unacceptable
>>> in that context, rather than specifying whether a missing timestamp is
>>> acceptable or not in the format string. I have no problem with either
>>> solution
>>>
[snip]

A separate proposal:

Add "%.f" with the following semantics:
1. Officially matches empty string, producing a timestamp with 0 fraction.
2. Otherwise equivalent to ".%f"

Retracting proposal for "%?t" for now.

With these two extensions, an strptime format can be written that can parse
and losslessly round-trip the output of datetime.__str__ (that is,
isoformat() with a space separator) for all possible datetime values, naive
or aware, except those using custom tzinfo.
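
The semantics proposed above can be sketched without touching strptime
itself. The block below is only an illustration under stated assumptions:
parse_str_output and _STR_OUTPUT are hypothetical names, not part of the
datetime module or of the linked branch, and the regular expression stands
in for a format such as "%Y-%m-%d %H:%M:%S%.f%:z". It handles whole-minute
offsets only.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical helper (not a datetime API): emulates the proposed "%.f" and
# "%:z" semantics for the format "%Y-%m-%d %H:%M:%S%.f%:z".
_STR_OUTPUT = re.compile(
    r"(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
    r"(?:\.(?P<frac>\d{1,6}))?"        # "%.f": dot plus digits, or nothing
    r"(?P<tz>Z|[+-]\d{2}:\d{2})?",     # "%:z": colon required; "Z" or nothing allowed
    re.ASCII,
)


def parse_str_output(text):
    """Parse str(datetime) output; whole-minute UTC offsets only."""
    match = _STR_OUTPUT.fullmatch(text)
    if match is None:
        raise ValueError(f"does not match the sketched format: {text!r}")
    value = datetime.strptime(match.group("stamp"), "%Y-%m-%d %H:%M:%S")
    frac = match.group("frac")
    if frac is not None:
        # Right-pad so that ".25" means 250000 microseconds, as %f does.
        value = value.replace(microsecond=int(frac.ljust(6, "0")))
    tz = match.group("tz")
    if tz is None:
        return value                      # missing offset: naive datetime
    if tz == "Z":
        return value.replace(tzinfo=timezone.utc)
    sign = -1 if tz[0] == "-" else 1
    offset = sign * timedelta(hours=int(tz[1:3]), minutes=int(tz[4:6]))
    return value.replace(tzinfo=timezone(offset))
```

With this helper, str() of a naive or aware datetime (with or without a
fraction) parses back to an equal value, while an offset written as "+0530"
is rejected because the colon is required.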
While not part of the proposal, these two extensions may also be naturally applied to strftime so that the same format string used for parsing will also produce an output identical to isoformat(), including naive timestamps and whole second timestamps.

From paul at ganssle.io Sat Oct 21 12:01:51 2017
From: paul at ganssle.io (Paul G)
Date: Sat, 21 Oct 2017 12:01:51 -0400
Subject: [Datetime-SIG] Matching +-HH:MM in strptime
In-Reply-To:
References: <334f2c29-9782-2238-95d8-b61a84b1b2d4@ganssle.io> <55a69e8f-325a-71a2-778a-c4a0d050fb74@ganssle.io>
Message-ID: <28a4cadd-6126-a2e7-31b9-828d6956601d@ganssle.io>

I think that this is a case of the perfect being the enemy of the good. Just because we're trying to touch strptime does not mean we need to make it perfect in one go. I think it's a separate discussion if we want to add features to strptime to make it closer to a domain specific language for parsing dates, but I think we should start by focusing on parsing HH:MM, and we can have a separate discussion later about other extensions.

With regard to "once we have these extensions, isoparse becomes a one-liner", I don't think this needs to be a goal at all. strptime does not need to be designed such that implementing isoparse is trivial; it just needs to be designed such that isoparse is *possible*.
Consider this implementation of isoparse:

    import datetime

    def isoparse(dt_str, sep='T'):
        # Relies on the proposed '%:z' directive for +-HH:MM offsets.
        base_fmt = "%Y-%m-%d"
        len_str = len(dt_str)

        if len_str > 10:
            base_fmt += sep

        if len_str == 10:
            tail = ''
        elif len_str == 13:
            tail = '%H'                   # hours, no tzinfo
        elif len_str == 16:
            tail = '%H:%M'                # minutes, no tzinfo
        elif len_str == 19:
            if dt_str[-6] in '-+':
                tail = '%H%:z'            # hours, with tzinfo
            else:
                tail = '%H:%M:%S'         # seconds, no tzinfo
        elif len_str == 22:
            tail = '%H:%M%:z'             # minutes, with tzinfo
        elif len_str in {23, 26}:
            tail = '%H:%M:%S.%f'          # milliseconds/microseconds, no tzinfo
        elif len_str == 25:
            tail = '%H:%M:%S%:z'          # seconds, with tzinfo
        elif len_str in {29, 32}:
            tail = '%H:%M:%S.%f%:z'       # milliseconds/microseconds, with tzinfo
        else:
            raise ValueError('Invalid isoformat string')

        return datetime.datetime.strptime(dt_str, base_fmt + tail)

In C this could be implemented pretty efficiently as a switch statement, and it covers all possible outputs of isoformat (there's also a way to do it such that `sep` is automatically detected, but this is stricter). The only thing actually missing is an `strptime` that can accept a '%:z' (or equivalent of the GNU version of '%z') string. The fact that it's not a one-liner is immaterial, since it's going into the standard library, so then parsing the results of `isoformat` becomes the one-liner `datetime.isoparse(dt_str)`.

Here is a working proof-of-concept with some basic tests: https://gist.github.com/pganssle/930756cc93f7d888ab63363eb33d5fe5
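The isoparse above depends on the proposed '%:z' directive. The following sketch is a variant of the same length-dispatch idea that assumes only the existing '%z' (+-HHMM): it strips the colon from a trailing offset before calling strptime. The name `isoparse_sketch` and the colon-stripping step are assumptions for illustration; this is not the proof-of-concept from the gist.

```python
import datetime

# Time-part formats keyed by the length of the string once any
# trailing UTC offset has been removed.
_TAILS = {
    10: "",               # date only
    13: "%H",             # hours
    16: "%H:%M",          # minutes
    19: "%H:%M:%S",       # seconds
    23: "%H:%M:%S.%f",    # milliseconds
    26: "%H:%M:%S.%f",    # microseconds
}

def isoparse_sketch(dt_str, sep="T"):
    """Parse isoformat() output using only the existing %z directive."""
    offset = ""
    # A trailing "+HH:MM" or "-HH:MM" is rewritten to "+HHMM" / "-HHMM".
    if len(dt_str) >= 19 and dt_str[-6] in "+-" and dt_str[-3] == ":":
        dt_str, offset = dt_str[:-6], dt_str[-6:].replace(":", "")
    if len(dt_str) not in _TAILS:
        raise ValueError("Invalid isoformat string")
    fmt = "%Y-%m-%d"
    if len(dt_str) > 10:
        fmt += sep + _TAILS[len(dt_str)]
    if offset:
        fmt += "%z"
    return datetime.datetime.strptime(dt_str + offset, fmt)
```

Like the original, this is strict about `sep` and rejects any string whose length does not correspond to one of the isoformat() layouts.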
From paul at ganssle.io Sat Oct 21 12:12:34 2017
From: paul at ganssle.io (Paul G)
Date: Sat, 21 Oct 2017 12:12:34 -0400
Subject: [Datetime-SIG] Matching +-HH:MM in strptime
In-Reply-To:
References: <334f2c29-9782-2238-95d8-b61a84b1b2d4@ganssle.io> <55a69e8f-325a-71a2-778a-c4a0d050fb74@ganssle.io>
Message-ID:

Back to the subject of how to handle +-HH:MM, I think the only really viable candidates are %z and %:z, so I think the question boils down to whether, with strptime, we care more about consistency with GNU / glibc's strptime (which apparently does implement %z to cover both HHMM and HH:MM) or whether we care more about users being able to specify *exactly* the string they want to match (e.g. allowing users to specify that a colon found in a time zone offset is an error condition).

I'm slightly leaning towards %:z because changing the semantics of %z could be construed as a backwards-incompatible change (albeit a minor one). I know some people have been asking for a "strict" version of the dateutil parser, and people do tend to use parsers for string validation. Adding the %:z option has the advantage that it's unambiguously backwards compatible, and it can be added to strftime if that is deemed desirable.
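The trade-off described above can be made concrete with regular expressions. This is a sketch only: the patterns and the helper `match_offset` are assumptions for illustration and are not how strptime is implemented.

```python
import re

# One pattern per directive under the "two directives" option.
STRICT = {
    "%z": re.compile(r"[+-]\d{2}[0-5]\d"),    # +HHMM only
    "%:z": re.compile(r"[+-]\d{2}:[0-5]\d"),  # +HH:MM only
}
# The "expanded %z" option: the colon is optional.
EXPANDED_Z = re.compile(r"[+-]\d{2}:?[0-5]\d")

def match_offset(text, directive, expanded=False):
    """Return True if text is a valid offset for the given directive."""
    if expanded and directive == "%z":
        pattern = EXPANDED_Z
    else:
        pattern = STRICT[directive]
    return pattern.fullmatch(text) is not None
```

With separate directives a caller can treat a colon as an error condition; with the expanded %z, that check would have to happen outside the format string.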
Best,

Paul
From mariocj89 at gmail.com Sat Oct 21 18:16:49 2017
From: mariocj89 at gmail.com (Mario Corchero)
Date: Sat, 21 Oct 2017 23:16:49 +0100
Subject: [Datetime-SIG] Matching +-HH:MM in strptime
In-Reply-To:
References: <334f2c29-9782-2238-95d8-b61a84b1b2d4@ganssle.io> <55a69e8f-325a-71a2-778a-c4a0d050fb74@ganssle.io>
Message-ID:

> I'm slightly leaning towards %:z because changing the semantics of %z
> could be construed as a backwards-incompatible change (albeit a minor one).
> I know some people have been asking for a "strict" version of the dateutil
> parser, and people do tend to use parsers for string validation. Adding the
> %:z option has the advantage that it's unambiguously backwards compatible,
> and it can be added to strftime if that is deemed desirable.

I think the issue in dateutil is a different one, as the parser is fully flexible. Here, even if it can be claimed as a backwards-incompatible change (the same could have been done in glibc), it seems quite fragile if you are using isoparse with %z to check that your offset does not have a ':'. In dateutil, by contrast, it is true that the parser will sometimes happily parse things that "don't seem to be a date" (but that can actually be interpreted as one).

Moreover, (ideally) this will land in a new Python version (3.7), not in a random patch release.

Last but not least, as a user who doesn't even read the docs, would you not expect %z to be able to parse ISO standard offsets? I actually found it surprising that it could not.

I'd just keep it simple. I strongly prefer:

    "%z parses RFC-822/ISO 8601 standard UTC offsets" (what you usually work with)

over:

    if your offsets have a colon, use "%:z"
    if they don't, use "%z"
    if they can use Zulu, remember to check for "Z" as well

BUT!
As said, no authority here :)
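A sketch of the single, lenient "%z" preferred in this message, accepting Z, +-hh:mm, +-hhmm, or +-hh. The function name `parse_utc_offset` and the regular expression are assumptions for illustration; this is not the code from the linked PR.

```python
import re
from datetime import timedelta, timezone

# Sign, two-digit hours, then optional minutes with an optional colon.
_OFFSET = re.compile(r"(?P<sign>[+-])(?P<hh>\d{2})(?::?(?P<mm>[0-5]\d))?")

def parse_utc_offset(text):
    """Return a timezone for 'Z', '+hh:mm', '+hhmm' or '+hh' (either sign)."""
    if text == "Z":
        return timezone.utc
    m = _OFFSET.fullmatch(text)
    if m is None:
        raise ValueError("not a UTC offset: %r" % text)
    delta = timedelta(hours=int(m.group("hh")), minutes=int(m.group("mm") or 0))
    return timezone(-delta if m.group("sign") == "-" else delta)
```

Under this approach one directive covers every spelling, at the cost that a format string can no longer reject a particular one.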
has precedent in that GNU's `date` command accepts %z, %:z > >>>>>> and > >>>>>>> %::z formats > >>>>>>>> > >>>>>>>> Can we establish some consensus on which should be done so that it > >>>>>> can be > >>>>>>> implemented? > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> > >>>>>>>> Paul > >>>>>>>> > >>>>>>>> _______________________________________________ > >>>>>>>> Datetime-SIG mailing list > >>>>>>>> Datetime-SIG at python.org > >>>>>>>> https://mail.python.org/mailman/listinfo/datetime-sig > >>>>>>>> The PSF Code of Conduct applies to this mailing list: > >>>>>>> https://www.python.org/psf/codeofconduct/ > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> Datetime-SIG mailing list > >>>>>>> Datetime-SIG at python.org > >>>>>>> https://mail.python.org/mailman/listinfo/datetime-sig > >>>>>>> The PSF Code of Conduct applies to this mailing list: > >>>>>> https://www.python.org/psf/codeofconduct/ > >>>>>>> > >>>>>> > >>>>>> _______________________________________________ > >>>>>> Datetime-SIG mailing list > >>>>>> Datetime-SIG at python.org > >>>>>> https://mail.python.org/mailman/listinfo/datetime-sig > >>>>>> The PSF Code of Conduct applies to this mailing list: > >>>>>> https://www.python.org/psf/codeofconduct/ > >>>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> Datetime-SIG mailing list > >>>>> Datetime-SIG at python.org > >>>>> https://mail.python.org/mailman/listinfo/datetime-sig > >>>>> The PSF Code of Conduct applies to this mailing list: > >>>>> https://www.python.org/psf/codeofconduct/ > >>>>> > >>>>> > >>>> > >> > > > > > > > > _______________________________________________ > > Datetime-SIG mailing list > > Datetime-SIG at python.org > > https://mail.python.org/mailman/listinfo/datetime-sig > > The PSF Code of Conduct applies to this mailing list: > https://www.python.org/psf/codeofconduct/ > > > > > _______________________________________________ > Datetime-SIG mailing list > Datetime-SIG 
at python.org > https://mail.python.org/mailman/listinfo/datetime-sig > The PSF Code of Conduct applies to this mailing list: > https://www.python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL:
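The round trip discussed in the quoted messages can be approximated without any new directives. What follows is a minimal sketch under stated assumptions, not the patch from the linked branch: `parse_isoformat` and `_OFFSET_RE` are hypothetical names, and the input is assumed to come from `datetime.isoformat()` or `str(datetime)` with a whole-minute UTC offset. It rewrites a trailing +HH:MM as +HHMM so that the plain %z directive matches, and builds the format string from whichever optional parts are present.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical helper pattern: a trailing UTC offset written as +HH:MM or -HH:MM.
_OFFSET_RE = re.compile(r'([+-]\d{2}):(\d{2})$')


def parse_isoformat(text):
    """Parse a string produced by datetime.isoformat() or str(datetime).

    Assumes a whole-minute offset; offsets with a seconds part are not handled.
    """
    # Accept either date/time separator, in the spirit of the proposed %?t.
    text = text.replace('T', ' ', 1)
    # Drop the colon from the offset so that plain %z (+HHMM) can match it.
    text, has_offset = _OFFSET_RE.subn(r'\1\2', text)

    fmt = '%Y-%m-%d %H:%M:%S'
    if '.' in text:
        # The fraction is only present for non-whole seconds (cf. %?.f).
        fmt += '.%f'
    if has_offset:
        # The offset is only present for aware datetimes (cf. %?:z).
        fmt += '%z'
    return datetime.strptime(text, fmt)


aware = datetime(2017, 10, 19, 10, 6, 41, 123456,
                 tzinfo=timezone(timedelta(hours=-4)))
naive = datetime(2017, 10, 19, 10, 6, 41)

print(parse_isoformat(aware.isoformat()) == aware)
print(parse_isoformat(str(naive)) == naive)
```

Choosing the format string per input keeps each individual parse strict, which is the property argued for in favour of a separate %:z directive; the cost is that the optional parts are handled outside strptime rather than inside it.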