[Pandas-dev] Pandas astype() changes the class type

Joris Van den Bossche jorisvandenbossche at gmail.com
Wed Dec 15 04:30:27 EST 2021


Hi Simeon,

This is a somewhat known issue with astype(), and more generally it is
related to how concat behaves when dealing with subclasses.

For example, in GeoPandas, we override astype() for this reason to ensure a
proper return type:
https://github.com/geopandas/geopandas/blob/ee8adfb27659e9f982ba8cdadbf62c6b36dcc053/geopandas/geodataframe.py#L1694-L1718

When using astype with a dictionary of column name -> dtype, the underlying
implementation casts every column separately and then uses concat to
combine the columns (Series objects) back into a dataframe.
However, without doing anything special in astype(), that means it relies
on the logic of concat to determine the output class (which is to use the
_constructor_expanddim of the first object, i.e. of the first column /
Series). See https://github.com/pandas-dev/pandas/issues/35415 for some
discussion about this.

I think that we could add some extra logic to the astype method
implementation to try to preserve the original class (by using its
_constructor) after doing the concat, similar to what was done recently for
the convert_dtypes() method (https://github.com/pandas-dev/pandas/pull/44249).
A contribution (pull request) for that would certainly be welcome!
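
Until something like that lands in pandas itself, a subclass can work around
it by overriding astype(), along the lines of what GeoPandas does. The
sketch below is only an assumption of how such an override could look; it is
neither the GeoPandas code nor the proposed pandas change, and the DF class
is the hypothetical one from the quoted example.

```python
import pandas as pd


class DF(pd.DataFrame):
    """Hypothetical subclass that keeps its own type through astype()."""

    @property
    def _constructor(self):
        return DF

    def astype(self, dtype, **kwargs):
        # Let pandas do the actual casting first.
        result = super().astype(dtype, **kwargs)
        # If the class was lost along the way, wrap the result again using
        # _constructor and carry over metadata from the original object.
        if not isinstance(result, DF):
            result = self._constructor(result).__finalize__(self)
        return result


df = DF({"A": [1, 2, 3], "B": [10, 20, 30], "C": [100, 200, 300]})
x = df.astype({"A": "float64", "B": "float64", "C": "float64"})
print(type(x).__name__)
```

The isinstance check keeps the override harmless on pandas versions where
astype() already returns the subclass.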

Best,
Joris

On Mon, 13 Dec 2021 at 13:17, Simeon Simeonov <simeon.simeonov.s at gmail.com>
wrote:

> Hi all,
>
> I saw this behaviour and I don't know if this is a bug or feature. I don't
> have much experience with directly inheriting from pandas.DataFrame as I've
> always preferred aggregation rather than inheritance there. A working
> sample is pasted below. Notice how *df.astype(dtypes)* changes the type
> to pandas.DataFrame. Any suggestions if this is intended behaviour?
>
>
> import pandas as pd
> class DF(pd.DataFrame):
>     @property
>     def _constructor(self):
>         return self.__class__
>
>
> df = DF({
>     'A': [1,2,3],
>     'B': [10,20,30],
>     'C': [100,200,300],
> })  # Type is DF
>
>
> a = df['A'] # type is Series
> ab = df[['A', 'B']] # type is DF
>
> dtypes = {'A': 'float64', 'B': 'float64', 'C': 'float64'}
> x = df.astype(dtypes)
> type(x)  # type is pd.DataFrame
>
>
> Regards,
>
> Simeon
>
>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>