[Pandas-dev] Help replacing workflows that used DataFrame.select

Stephan Hoyer shoyer at gmail.com
Tue Nov 28 13:07:31 EST 2017


The biggest reason for deprecating DataFrame.select() was that it was
confusingly named. On GroupBy objects, it's equivalent to .filter(). Also,
SELECT in SQL does something very different, more like DataFrame.filter().
If only we could simply switch the names without causing more confusion!
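
To make the naming overlap concrete, here is a minimal sketch (the frame and column names are made up for illustration): DataFrame.filter picks labels by name, much like listing columns in a SQL SELECT, while GroupBy.filter keeps or drops whole groups based on a callable.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the two .filter() methods
df = pd.DataFrame(
    {"key": ["a", "a", "b"], "val_x": [1, 2, 3], "val_y": [4, 5, 60]}
)

# DataFrame.filter selects labels by name (columns by default for a DataFrame)
by_label = df.filter(like="val_")

# GroupBy.filter keeps only the groups for which the callable returns True
big_groups = df.groupby("key").filter(lambda g: g["val_y"].sum() > 10)
```

Here by_label keeps only the val_x and val_y columns, and big_groups keeps only the rows of group "b".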

So I think we would potentially be open to resurfacing the functionality
if necessary, though probably under a different name. For discussion, see
https://github.com/pandas-dev/pandas/issues/12401
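
For readers landing here, a minimal sketch of the callable-based .loc replacement suggested in the quoted thread below. The data and the helpers pick_mean_cols and is_mean_col are hypothetical stand-ins for the poster's complex_fxn_that_selects_a_few_cols, not anything from pandas itself.

```python
import pandas as pd


def pick_mean_cols(columns):
    # Hypothetical selector: receives the full column Index, returns labels to keep
    return [c for c in columns if c.endswith("_mean")]


def is_mean_col(label):
    # Hypothetical per-label predicate, the style of criterion .select() accepted
    return label.endswith("_mean")


data = pd.DataFrame(
    {"flow_mean": [1.0, 2.0], "flow_max": [3.0, 4.0], "temp_mean": [5.0, 6.0]}
)

# .loc accepts a callable, which is passed the DataFrame as it exists at that
# point in the chain, so columns added earlier in the chain are visible to it.
res = (
    data
    .assign(temp_max=lambda df: df["temp_mean"] + 1)
    .loc[:, lambda df: pick_mean_cols(df.columns)]
)

# Same result with a per-label predicate, expressed as a boolean mask
res_mask = (
    data
    .assign(temp_max=lambda df: df["temp_mean"] + 1)
    .loc[:, lambda df: [is_mean_col(c) for c in df.columns]]
)
```

Both variants keep only flow_mean and temp_mean; the second is closer in spirit to how .select() applied its criterion to each label.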

On Tue, Nov 28, 2017 at 4:52 PM Paul Hobson <pmhobson at gmail.com> wrote:

> Joris,
>
> Thanks for the nudge. I didn't understand that the callable could be
> passed the entire dataframe. That's what I needed.
>
> I'll miss the .select() method when it's gone, but it appears my use cases
> are covered.
>
> Cheers,
>
> -Paul
>
> On Tue, Nov 28, 2017 at 2:31 AM, Joris Van den Bossche <
> jorisvandenbossche at gmail.com> wrote:
>
>> Hi Paul,
>>
>> That's a good question. I think you can do it with a lambda function,
>> like this:
>>
>> (data
>>      # ... (full pipeline)
>>      .loc[:, lambda df: complex_fxn_that_selects_a_few_cols(df.columns)]
>> )
>>
>> Does that work?
>>
>> But personally I am not sure this is really a usability
>> improvement compared to the select method.
>>
>> Best,
>> Joris
>>
>>
>>
>> 2017-11-28 2:21 GMT+01:00 Paul Hobson <pmhobson at gmail.com>:
>>
>>> Hey folks,
>>>
>>> I noticed that DataFrame.select is now deprecated in favor of
>>> DataFrame.loc[index.map(selector_fxn)]
>>>
>>> PR: https://github.com/pandas-dev/pandas/pull/17633
>>> Issue: https://github.com/pandas-dev/pandas/issues/12401
>>>
>>> I have a lot of workflows that look something like this:
>>>
>>>     res = (
>>>         data.resample(freq)
>>>             .agg(agg_dict)
>>>             .pipe(fxn_that_adds_many_cols)
>>>             .select(complex_fxn_that_selects_a_few_cols, axis='columns')
>>>     )
>>>
>>> It's not immediately clear to me how to access, e.g., all of the columns
>>> in the middle or at the end of a chain of dataframe operations.
>>>
>>> Any tips?
>>>
>>> -Paul
>>>
>>> _______________________________________________
>>> Pandas-dev mailing list
>>> Pandas-dev at python.org
>>> https://mail.python.org/mailman/listinfo/pandas-dev
>>>
>>>
>>