[Pandas-dev] API: Make silent casting behavior consistent by deprecating silent _object_-dtype casting

Joris Van den Bossche jorisvandenbossche at gmail.com
Wed Nov 10 12:53:10 EST 2021


Thanks for bringing this up.

Limiting the discussion to setitem for a moment (I think other methods like
fillna could deviate if we really want, or could have keywords for it), I
am personally in favor of option 2: making everything strict (I opened the
referenced issue about it:
https://github.com/pandas-dev/pandas/issues/39584).

Now, in the short term, starting to deprecate silent casting to object
(the first aspect of option 3) doesn't prevent us from becoming even
stricter later. It just wouldn't fully resolve the existing
inconsistencies. So from that point of view, I am personally fine with it.
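
To make the difference between the two options concrete, here is a rough
sketch (not pandas code). The names classify_cast and allowed are made up
for illustration, and it assumes a numeric numpy dtype and a scalar value.
It is also coarse: it looks at the value's default numpy dtype rather than
at the value itself.

```python
import numpy as np


def classify_cast(dtype, value):
    """Classify what storing `value` in a numeric numpy `dtype` would need.

    Hypothetical helper, not pandas API.  Returns "keep" (dtype unchanged),
    "upcast" (another numeric dtype, e.g. int64 -> float64) or "object".
    """
    dtype = np.dtype(dtype)
    # Only plain numeric scalars are considered; bools, strings, Periods,
    # etc. are all treated as forcing object dtype in this sketch.
    if isinstance(value, (bool, np.bool_)) or not isinstance(
        value, (int, float, np.integer, np.floating)
    ):
        return "object"
    common = np.promote_types(dtype, np.asarray(value).dtype)
    if common == np.dtype(object):
        # e.g. a Python int too large for any fixed-width integer dtype
        return "object"
    return "keep" if common == dtype else "upcast"


def allowed(dtype, value, policy):
    """Would `policy` ("option2" or "option3") accept this assignment?"""
    kind = classify_cast(dtype, value)
    if policy == "option2":
        # strict: the dtype must never change
        return kind == "keep"
    if policy == "option3":
        # only casting to object is rejected
        return kind != "object"
    raise ValueError(f"unknown policy: {policy!r}")


for value in (2, 1.5, "foo"):
    print(value, classify_cast("i8", value),
          allowed("i8", value, "option2"), allowed("i8", value, "option3"))
```

Anything option 2 accepts is also accepted by option 3 here, which is why
starting with the object-casting deprecation doesn't close off the stricter
route.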

Joris

On Wed, 27 Oct 2021 at 06:38, Brock Mendel <jbrockmendel at gmail.com> wrote:

> TLDR
> ----
> We have inconsistent silent-casting vs raising logic for numpy vs EA dtypes
> (and inconsistencies within EA dtypes).  By deprecating silently casting
> to *object* dtype, we can *mostly* make the behaviors match.
>
>
> Background
> ----------
> A number of Series/DataFrame methods will silently cast when dealing with
> mismatched values.  With a numpy dtype, each of the following silently
> casts to float64:
>
>     ser = pd.Series([1, 2, 3], dtype="i8")
>
>     ser.shift(1, fill_value=1.5)
>     ser.mask([True, False, False], 1.5)
>     ser.where([False, True, True], 1.5)
>     ser.replace(1, 1.5)
>     ser[0] = 1.5
>     ser.fillna(1.5)  # <- this one doesn't cast as it is a no-op
>
> If we were to pass "foo" or a pd.Period, these would coerce to object
> instead of float.
>
> By contrast, similar mixed-type operations with an ExtensionDtype Series
> _mostly_ raise:
>
>     ser2 = pd.Series(pd.period_range("2016-01-01", periods=3, freq="D"))
>
>     ser2.shift(1, fill_value=1.5)         # <- ValueError
>     ser2.mask([True, False, False], 1.5)  # <- ValueError
>     ser2.where([False, True, True], 1.5)  # <- ValueError
>     ser2.fillna(1.5)                      # <- TypeError
>     ser2.replace(ser2[0], 1.5)            # <- coerces to object
>     ser2[0] = 1.5                         # <- coerces to object
>
>     ser3 = pd.Series([pd.NA, 2, 3], dtype="Int64")
>
>     ser3.shift(1, fill_value=1.5)         # <- TypeError
>     ser3.mask([True, False, False], 1.5)  # <- TypeError
>     ser3.where([False, True, True], 1.5)  # <- TypeError
>     ser3.fillna(1.5)                      # <- TypeError
>     ser3.replace(ser3[0], 1.5)            # <- TypeError
>     ser3[0] = 1.5                         # <- TypeError
>
> timedelta64, datetime64, and datetime64tz mostly behave like the numpy
> dtypes, with a few exceptions:
>
>     - shift raises on mismatch
>     - fillna raises on mismatch for timedelta64, casts for the others
>
> Categorical mostly behaves like other ExtensionDtypes, except for replace,
> which has special logic.
>
> Goals
> -----
> - Have matching behavior across dtypes.
> - Share code.
>
> Options
> -------
> 1) Change EA (and dt64/td64) behavior to match non-EA behavior
> 2) Change non-EA behavior to match EA behavior (or stricter, xref
> https://github.com/pandas-dev/pandas/issues/39584)
> 3) Deprecate (and eventually raise on) silent casting to _object_ dtype,
> allowing silent casting otherwise.
>
>
> Here I am advocating for option 3).  The advantages as I see them:
>
> A) For numpy dtypes, we retain the most useful cases (int->float)
> B) Deprecates cases most likely to be unintentional (e.g. typo
> "2016-01-01" -> "2p16-01-01" causing a datetime64 Series to silently cast)
> C) For td64/dt64/dt64tz/period, the *only* silent casting is to object, so
> this completely gets rid of special-casing in that code
> D) For IntegerArray, FloatingArray, IntervalArray, this leaves open the
> option of allowing e.g. Integer->Floating casting (xref
> https://github.com/pandas-dev/pandas/issues/25288#issuecomment-941762174)
> E) Does not preclude later deciding on the stricter options in 2)