[Pandas-dev] Proposal to change the default of to_datetime in case of errors from 'ignore' to 'raise'

Joris Van den Bossche jorisvandenbossche at gmail.com
Mon Jul 27 06:01:12 EDT 2015


I forgot to mention the github issue:
https://github.com/pydata/pandas/issues/10636, and there is now also a PR
to do the change: https://github.com/pydata/pandas/pull/10674

John, whether we want more verbose output when coercing the errors is a
separate issue, I think (as coercing is not the default). You can always open
an issue on GitHub for this.

Joris
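For the reporting idea raised in the thread ("20 values changed, 5 values not changed"), a Stata-style summary can be built on top of explicit coercion without changing any default. This is a minimal sketch, assuming a pandas version where `pd.to_datetime(..., errors='coerce')` turns unparseable values into `NaT`. The helper name `to_datetime_with_report` and its message are hypothetical, not part of pandas.

```python
import pandas as pd


def to_datetime_with_report(values, **kwargs):
    """Hypothetical helper: parse with errors='coerce' and report failures.

    Inputs that were already missing before parsing are not counted as
    failures; only non-missing inputs that ended up as NaT are.
    """
    values = pd.Series(values)
    parsed = pd.to_datetime(values, errors="coerce", **kwargs)
    # A failure is a value that was present in the input but became NaT.
    failed = parsed.isna() & values.notna()
    n_failed = int(failed.sum())
    n_ok = int(parsed.notna().sum())
    print(f"{n_ok} converted, {n_failed} could not be parsed")
    return parsed, failed


parsed, failed = to_datetime_with_report(["2015-07-22", "2014-30-30", "not a date"])
```

Returning the boolean `failed` mask alongside the result lets the caller inspect exactly which inputs were dropped, which is the kind of clue John describes getting from Stata.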

2015-07-23 17:10 GMT+02:00 John E <eiler13 at gmail.com>:

> Well, only yesterday I was complaining on GitHub about the silent default
> of read_csv converting 'NA' to NaN. ;-)  So I have to agree with Lorenzo
> that this is a good change.  It also seems more consistent with pandas'
> overall behavior.
>
> FWIW, Stata's default with these sorts of operations is always to tell you
> how many values were changed, which is often very helpful.  E.g. if Stata
> tells you zero values were changed, this is a big clue you screwed up.
> Often this is more verbose than desired, but it's also easy to change that.
>
> So, I'm definitely fine with just making it an error, but a possible
> middle ground would be a short report like:  "20 values changed, 5 values
> not changed".
>
>
>
> On Thursday, July 23, 2015 at 9:51:11 AM UTC-4, Lorenzo De Leo wrote:
>>
>> Personally I'm very much in favor of this change. I don't like silent
>> defaults ;)
>>
>> L
>>
>>
>> On Wednesday, July 22, 2015 at 4:59:31 PM UTC+2, Joris Van den Bossche
>> wrote:
>>>
>>> Hi all,
>>>
>>> On github there is a proposal to change the default behaviour of
>>> to_datetime in case of a parsing error from 'ignore' (leaving the
>>> values untouched) to 'raise' (raise an error).
>>>
>>> As a small example, the current behaviour:
>>>
>>> In [5]: pd.to_datetime('2014-30-30', errors='ignore')   # the default now
>>> Out[5]: '2014-30-30'
>>>
>>> In [6]: pd.to_datetime('2014-30-30', errors='raise')
>>> ...
>>> ValueError: month must be in 1..12
>>>
>>>
>>> So the proposal would be to change the default to the second case,
>>> raising an error.
>>>
>>> Note that this is already the behaviour when you provide your own format
>>> (in which case the value of the errors keyword is in fact ignored):
>>>
>>> In [7]: pd.to_datetime('2014-30-30', format='%Y-%m-%d')
>>> ...
>>> ValueError: time data '2014-30-30' does not match format '%Y-%m-%d'
>>>
>>>
>>> *Are there any objections to this change?*
>>> *Are there people relying on the fact that, by default, to_datetime
>>> returns the exact original value if parsing does not succeed?*
>>>
>>> Best regards,
>>> Joris
>>>
>>
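Whatever default pandas ends up with, code that must not silently keep unparsed strings can pass `errors='raise'` explicitly and handle the failure itself. This is a minimal sketch, assuming the parse error surfaces as a `ValueError` (or a subclass of it), as in both examples above; the helper name `try_parse` is hypothetical. Because the exact error message differs between pandas versions, the code keeps it only for logging and never branches on it.

```python
import pandas as pd


def try_parse(value, **kwargs):
    """Hypothetical helper: return (Timestamp, None) or (None, error message)."""
    try:
        # errors='raise' is passed explicitly so the result does not
        # depend on whatever the library default happens to be.
        return pd.to_datetime(value, errors="raise", **kwargs), None
    except ValueError as exc:
        # Message text varies between versions; keep it only for reporting.
        return None, str(exc)


ok, err = try_parse("2015-07-22")
bad, bad_err = try_parse("2014-30-30")
bad_fmt, bad_fmt_err = try_parse("2014-30-30", format="%Y-%m-%d")
```

Both the dateutil-style parse of `'2014-30-30'` and the explicit `format='%Y-%m-%d'` case end up in the `except` branch, so callers get one uniform failure signal instead of an unchanged string.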