[Pandas-dev] Pandas astype() changes the class type
Joris Van den Bossche
jorisvandenbossche at gmail.com
Wed Dec 15 04:30:27 EST 2021
Hi Simeon,
This is a somewhat known issue with astype(), and more generally it is
related to how concat behaves when dealing with subclasses.
For example, in GeoPandas, we override astype() for this reason to ensure a
proper return type:
https://github.com/geopandas/geopandas/blob/ee8adfb27659e9f982ba8cdadbf62c6b36dcc053/geopandas/geodataframe.py#L1694-L1718
When using astype with a dictionary of column name -> dtype, the underlying
implementation casts every column separately and then uses concat to
combine the columns (Series objects) back into a dataframe.
However, without doing anything special in astype(), that means it relies
on the logic of concat to determine the output class (which is to use the
_constructor_expanddim of the first object, i.e. of the first column /
Series). See https://github.com/pandas-dev/pandas/issues/35415 for some
discussion about this.
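
To make the mechanism above concrete, here is a minimal sketch that mimics the per-column cast followed by concat. It is not the actual pandas implementation, only an illustration of the same steps; the DF class is copied from the original report, and the variable names (columns, combined) are made up for this example.

```python
import pandas as pd


class DF(pd.DataFrame):
    """Minimal subclass from the original report (only _constructor is set)."""

    @property
    def _constructor(self):
        return self.__class__


df = DF({"A": [1, 2, 3], "B": [10, 20, 30]})
dtypes = {"A": "float64", "B": "float64"}

# Step 1: cast every column separately. Each piece is a plain pd.Series,
# because DF does not define _constructor_sliced.
columns = [df[name].astype(dtype) for name, dtype in dtypes.items()]

# Step 2: glue the columns back together. concat only sees the Series
# objects, so the output class is derived from them, not from DF.
combined = pd.concat(columns, axis=1)

print(type(df).__name__)
print(type(combined).__name__)
```

Since the columns are plain Series, the combined object is expected to come back as a regular pandas.DataFrame, which is the same loss of class that astype() shows in the report.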
I think that we could add some extra logic to the astype method
implementation to try to preserve the original class (by using its
_constructor) after doing the concat, similar to what was done recently for
the convert_dtypes() method (https://github.com/pandas-dev/pandas/pull/44249).
I think a contribution (pull request) for that would certainly be welcome!
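
In the meantime, a subclass can work around this itself, in the same spirit as the GeoPandas override linked above. The following is only a rough sketch under my own assumptions, not the GeoPandas code and not a proposed pandas patch: it calls the parent astype() and re-wraps the result with _constructor if the class was lost. Keyword arguments are passed through untouched so the override does not depend on the exact astype() signature.

```python
import pandas as pd


class DF(pd.DataFrame):
    @property
    def _constructor(self):
        return self.__class__

    def astype(self, dtype, **kwargs):
        # Let pandas do the actual casting first.
        result = super().astype(dtype, **kwargs)
        # If the subclass was lost along the way (e.g. through concat),
        # rebuild it with our own constructor and copy over metadata.
        if not isinstance(result, type(self)):
            result = self._constructor(result).__finalize__(self)
        return result


df = DF({"A": [1, 2, 3], "B": [10, 20, 30], "C": [100, 200, 300]})
x = df.astype({"A": "float64", "B": "float64", "C": "float64"})
print(type(x).__name__)
```

With this override, the result of astype() with a dictionary should be a DF again, regardless of whether the installed pandas version already preserves the subclass on its own.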
Best,
Joris
On Mon, 13 Dec 2021 at 13:17, Simeon Simeonov <simeon.simeonov.s at gmail.com>
wrote:
> Hi all,
>
> I saw this behaviour and I don't know if this is a bug or feature. I don't
> have much experience with directly inheriting from pandas.DataFrame as I've
> always preferred aggregation rather than inheritance there. A working
> sample is pasted below. Notice how *df.astype(dtypes)* changes the type
> to pandas.DataFrame. Any suggestions if this is intended behaviour?
>
>
> import pandas as pd
> class DF(pd.DataFrame):
>     @property
>     def _constructor(self):
>         return self.__class__
>
>
> df = DF({
> 'A': [1,2,3],
> 'B': [10,20,30],
> 'C': [100,200,300],
> }) # Type is DF
>
>
> a = df['A'] # type is Series
> ab = df[['A', 'B']] # type is DF
>
> dtypes = {'A': 'float64', 'B': 'float64', 'C': 'float64'}
> x = df.astype(dtypes)
> type(x) # type is pd.DataFrame
>
>
> Regards,
>
> Simeon