From jorisvandenbossche at gmail.com Wed Jul 22 16:59:28 2015
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Wed, 22 Jul 2015 16:59:28 +0200
Subject: [Pandas-dev] Proposal to change the default of to_datetime in case of errors from 'ignore' to 'raise'
Message-ID:

Hi all,

On GitHub there is a proposal to change the default behaviour of to_datetime in case of a parsing error from 'ignore' (leaving the values untouched) to 'raise' (raising an error).

As a small example, the current behaviour:

    In [5]: pd.to_datetime('2014-30-30', errors='ignore')  # the default now
    Out[5]: '2014-30-30'

    In [6]: pd.to_datetime('2014-30-30', errors='raise')
    ...
    ValueError: month must be in 1..12

So the proposal is to change the default to the second case, raising an error.

Note that raising is already the default when you provide your own format (which in fact ignores the value of the errors keyword):

    In [7]: pd.to_datetime('2014-30-30', format='%Y-%m-%d')
    ...
    ValueError: time data '2014-30-30' does not match format '%Y-%m-%d'

*Are there any objections to this change?*
*Are there people relying on the fact that, by default, to_datetime returns the exact original value if parsing does not succeed?*

Best regards,
Joris
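The difference between the error policies under discussion can be illustrated without pandas itself. The sketch below is a hypothetical stand-in (`parse_date` is not a pandas function) built on the standard library's `datetime.strptime`. It assumes three policies: 'raise' propagates the error, 'ignore' hands back the original value, and 'coerce' returns `None` as a stand-in for the missing value that current pandas returns as `NaT` with `errors='coerce'`.

```python
from datetime import datetime


def parse_date(value, fmt="%Y-%m-%d", errors="raise"):
    """Parse one date string under a pandas-like errors policy (hypothetical helper).

    errors="raise":  propagate the ValueError (the proposed default)
    errors="coerce": return None for unparseable input (pandas would use NaT)
    errors="ignore": return the input unchanged (the default being replaced)
    """
    if errors not in ("raise", "coerce", "ignore"):
        raise ValueError(f"invalid errors policy: {errors!r}")
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        if errors == "raise":
            raise
        if errors == "coerce":
            return None
        return value  # "ignore": silently hand back the original string


print(parse_date("2014-30-30", errors="ignore"))  # 2014-30-30
print(parse_date("2014-30-30", errors="coerce"))  # None
try:
    parse_date("2014-30-30")  # default errors="raise"
except ValueError as exc:
    print("ValueError:", exc)
```

With 'ignore', the caller gets back a string where a datetime was expected, and nothing signals that parsing failed; that silent pass-through is what the proposal wants to stop being the default.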
From jorisvandenbossche at gmail.com Mon Jul 27 12:01:12 2015
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Mon, 27 Jul 2015 12:01:12 +0200
Subject: [Pandas-dev] Proposal to change the default of to_datetime in case of errors from 'ignore' to 'raise'
In-Reply-To: <4703f7db-477f-4dbf-8f9c-d462c65e9765@googlegroups.com>
References: <67eb09c4-4eb4-4391-a2f7-3f032db6ee27@googlegroups.com> <4703f7db-477f-4dbf-8f9c-d462c65e9765@googlegroups.com>
Message-ID:

I forgot to mention the GitHub issue: https://github.com/pydata/pandas/issues/10636, and there is now also a PR to make the change: https://github.com/pydata/pandas/pull/10674

John, whether we want more verbose output when coercing the errors is a separate issue, I think (as this is not the default). You can always open an issue on GitHub for this.

Joris

From eiler13 at gmail.com Thu Jul 23 11:22:28 2015
From: eiler13 at gmail.com (John E)
Date: Thu, 23 Jul 2015 15:22:28 -0000
Subject: [Pandas-dev] Proposal to change the default of to_datetime in case of errors from 'ignore' to 'raise'
In-Reply-To: <67eb09c4-4eb4-4391-a2f7-3f032db6ee27@googlegroups.com>
References: <67eb09c4-4eb4-4391-a2f7-3f032db6ee27@googlegroups.com>
Message-ID: <4703f7db-477f-4dbf-8f9c-d462c65e9765@googlegroups.com>

Well, I was only yesterday complaining on GitHub about the silent default of read_csv converting 'NA' to NaN. ;-) So I have to agree with Lorenzo that this is a good change. It also seems more consistent with pandas' overall behavior.
FWIW, Stata's default with these sorts of operations is always to tell you how many values were changed, which is often very helpful. E.g. if Stata tells you zero values were changed, this is a big clue you screwed up. Often this is more verbose than desired, but it's also easy to change that.

So, I'm definitely fine with just making it an error, but a possible middle ground would be a short report like: "20 values changed, 5 values not changed".
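The Stata-style middle ground suggested above can be sketched in a few lines. This is not a pandas feature; `convert_with_report` is a hypothetical helper, and the wording of its summary line is an assumption. It coerces unparseable strings to `None` (in the spirit of a 'coerce' policy) and returns a one-line count of what happened alongside the converted values.

```python
from datetime import datetime


def convert_with_report(values, fmt="%Y-%m-%d"):
    """Coerce strings to datetimes and summarise the outcome (hypothetical helper).

    Unparseable entries become None; the returned report counts how many
    values were converted and how many could not be parsed.
    """
    values = list(values)
    converted = []
    failed = 0
    for value in values:
        try:
            converted.append(datetime.strptime(value, fmt))
        except ValueError:
            converted.append(None)  # coerce the failure to a missing value
            failed += 1
    ok = len(values) - failed
    report = f"{ok} converted, {failed} could not be parsed"
    return converted, report


result, report = convert_with_report(["2015-07-22", "2014-30-30", "2015-07-23"])
print(report)  # 2 converted, 1 could not be parsed
```

As John notes, a report of "0 converted" would be an immediate hint that the format or input is wrong, which a silent default cannot give you.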
From lorenzo.deleo at gmail.com Thu Jul 23 10:01:24 2015
From: lorenzo.deleo at gmail.com (Lorenzo De Leo)
Date: Thu, 23 Jul 2015 14:01:24 -0000
Subject: [Pandas-dev] Proposal to change the default of to_datetime in case of errors from 'ignore' to 'raise'
In-Reply-To:
References:
Message-ID: <67eb09c4-4eb4-4391-a2f7-3f032db6ee27@googlegroups.com>

Personally I'm very much in favor of this change. I don't like silent defaults ;)

L