[scikit-learn] Request / Proposal: integrating IEEE paper in scikit-learn as "feature_selection.EFS / EFSCV" and cancer_benchmark datasets

Dalibor Hrg dalibor.hrg at gmail.com
Sat Sep 23 21:29:37 EDT 2023


Dear Gael,

Thanks for the clarification. Yes, I see: such methods or approaches need
broader evidence of use and more citations before inclusion. That is
roughly what I expected.

Looking at the related projects at
https://scikit-learn.org/stable/related_projects.html#related-projects and
especially at the "Boruta" package at
https://github.com/scikit-learn-contrib/boruta_py, a small question: do you
think a separate pip package along the lines of Boruta, implementing the
method, providing the cancer benchmark datasets, and replicating the paper
results, would be the closest fit?
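
To make the question concrete, here is a minimal sketch of how a selector in
a separate pip package could plug into scikit-learn, assuming it follows the
SelectorMixin convention that RFE uses. The class name OnePlusOneEASelector
and its parameters are hypothetical, and the search is a plain (1+1)
evolutionary algorithm with bit-flip mutation used only as a stand-in; it is
not the SFE algorithm from the paper.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectorMixin
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


class OnePlusOneEASelector(SelectorMixin, BaseEstimator):
    """Hypothetical wrapper selector: (1+1) EA over feature masks.

    Stand-in search only; NOT the SFE algorithm from the paper.
    """

    def __init__(self, estimator, n_iter=20, cv=3, random_state=0):
        self.estimator = estimator
        self.n_iter = n_iter
        self.cv = cv
        self.random_state = random_state

    def _score(self, X, y, mask):
        # An empty feature subset cannot be evaluated; rank it last.
        if not mask.any():
            return -np.inf
        est = clone(self.estimator)
        return cross_val_score(est, X[:, mask], y, cv=self.cv).mean()

    def fit(self, X, y):
        X = np.asarray(X)
        rng = np.random.default_rng(self.random_state)
        n_features = X.shape[1]
        # Parent: a random subset containing roughly half of the features.
        mask = rng.random(n_features) < 0.5
        best = self._score(X, y, mask)
        for _ in range(self.n_iter):
            # Bit-flip mutation: each feature toggles with probability 1/n.
            child = mask ^ (rng.random(n_features) < 1.0 / n_features)
            score = self._score(X, y, child)
            if score >= best:  # keep the child if it is at least as good
                mask, best = child, score
        self.support_ = mask
        self.score_ = best
        self.n_features_in_ = n_features
        return self

    def _get_support_mask(self):
        # SelectorMixin builds get_support() and transform() on top of this.
        return self.support_


# Small synthetic problem, only to show the API shape.
X, y = make_classification(
    n_samples=80, n_features=30, n_informative=5, random_state=0
)
selector = OnePlusOneEASelector(KNeighborsClassifier(n_neighbors=3), n_iter=15)
X_reduced = selector.fit(X, y).transform(X)
print(X_reduced.shape, round(selector.score_, 3))
```

Because only fit and _get_support_mask are implemented, such a class could
live in an external package and still be used in a scikit-learn Pipeline.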

Certainly, there is potential to benchmark RFE against EFS and publish how
they compare, and to demonstrate the method on diverse high-dimensional
datasets from other domains used in other publications. Showing the
usefulness of the method/algorithm this way is a long-term effort.
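
As a rough illustration of the RFE side of such a benchmark, the sketch
below scores an RFE baseline with cross-validation. The synthetic data from
make_classification only stands in for a high-dimensional dataset (it is not
one of the 40 cancer datasets), and the choices of 20 features, a linear SVC,
and 5 folds are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Many more features than samples, as in gene-expression style data.
X, y = make_classification(
    n_samples=60, n_features=500, n_informative=10, random_state=0
)

# RFE needs an estimator exposing coef_; step=0.2 drops 20% of the
# features at each iteration until 20 remain.
rfe = RFE(SVC(kernel="linear"), n_features_to_select=20, step=0.2)

# Selection sits inside the pipeline so it is refit on every training
# fold, which keeps the held-out fold out of the selection step.
pipe = make_pipeline(rfe, SVC(kernel="linear"))
scores = cross_val_score(pipe, X, y, cv=5)
print("mean CV accuracy:", scores.mean())

pipe.fit(X, y)
n_selected = int(pipe.named_steps["rfe"].support_.sum())
print("features kept:", n_selected)
```

Any other selector with the same fit/transform interface could replace the
RFE step, so both methods would be scored under the same protocol.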

Best,
Dalibor


On Sun, Sep 24, 2023, 21:37 Gael Varoquaux <gael.varoquaux at normalesup.org>
wrote:

> Dear Dalibor,
>
> As detailed in the FAQ,
>
> https://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for-new-algorithms
> """
> We only consider well-established algorithms for inclusion. A rule of
> thumb is at least 3 years since publication, 200+ citations, and wide use
> and usefulness.
> """
>
> These days, I would say that the bar is even higher, as we find that we
> prioritize things such as high-quality documentation or better dataframe
> support over new algorithms.
>
> Best,
>
> Gaël
>
> On Sun, Sep 24, 2023 at 11:10:23AM +0200, Dalibor Hrg wrote:
> > Dear scikit-learn mailing list
>
> > Similarly to the existing feature_selection.RFE and RFECV, this is a
> > request to openly discuss the PROPOSAL and requirements of
> > feature_selection.EFS and/or EFSCV, which would stand for "Evolutionary
> > Feature Selection", starting with 8 algorithms or methods to be used with
> > scikit-learn estimators, just as published in IEEE
> > https://arxiv.org/abs/2303.10182 by the authors of the paper. They agreed
> > to help integrate it (in cc).
>
> > PROPOSAL
> > Implement/integrate https://arxiv.org/abs/2303.10182 paper into
> > scikit-learn:
>
> > 1) CODE
>
> >   • implementing feature_selection.EFS and/or EFSCV (a space for the
> >     evolutionary computing community interested in feature selection)
>
> > RFE is:
>
> > feature_selection.RFE          Feature ranking with recursive feature
> > (estimator, *[, ...])          elimination.
>
> > feature_selection.RFECV        Recursive feature elimination with
> > (estimator, *[, ...])          cross-validation to select features.
>
> >  The "EFS" could be:
>
> >                         Feature ranking and feature elimination with 8
> > feature_selection.EFS   different algorithms, SFE, SFE-PSO etc. <- new
> > (estimator, *[, ...])   algorithms could be added and benchmarked with
> >                         evolutionary computing, swarm, genetic etc.
>
> > feature_selection.EFSCV Feature elimination with cross-validation to
> > (estimator, *[, ...])   select features
>
>
> > 2) DATASETS & CANCER BENCHMARK
>
> >   • curating and integrating a fetcher for the 40 cancer_benchmark
> >     datasets, either directly in scikit-learn or externally pullable and
> >     maintained (a space for contributing and expanding high-dimensional
> >     datasets on cancer topics).
>
> > fetch_cancer-benchmark Loads 40 individual cancer-related
> > (*[, ...])             high-dimensional datasets for benchmarking
> >                        feature selection methods (classification).
>
>
> > 3) TUTORIAL / WEBSITE
>
> >   • writing a tutorial to replicate the IEEE paper results with
> >     feature_selection.EFS and/or EFSCV on cancer_benchmark (40 datasets)
>
>
> > I have identified the IEEE work https://arxiv.org/abs/2303.10182 as a
> > very interesting novelty for working with high-dimensional datasets, as
> > it reports small subsets of predictive features selected with SVM and KNN
> > across 40 datasets. Replicability under BSD-3 and high quality under
> > scikit-learn could make benchmarking novel feature selection algorithms
> > easier, in my first opinion. Since this is my very first contact with
> > both the IEEE paper authors and the scikit-learn list, we would welcome
> > some help/guidance on how integration could work out, and on whether
> > there is any interest along that line at all.
>
> > Kind regards
> > Dalibor Hrg
> > https://www.linkedin.com/in/daliborhrg/
> >
>
> > On Sat, Sep 23, 2023 at 9:08 AM Alexandre Gramfort
> > <alexandre.gramfort at inria.fr> wrote:
>
> >     Dear Dalibor
>
> >     you should discuss this on the main scikit-learn mailing list.
>
> >     https://mail.python.org/mailman/listinfo/scikit-learn
>
> >     Alex
>
> >     On Fri, Sep 22, 2023 at 12:19 PM Dalibor Hrg <dalibor.hrg at gmail.com>
> >     wrote:
>
> >         Dear sklearn feature_selection.RFE Team and IEEE Authors (in-cc),
>
> >         This is a request to openly discuss the potential for
> >         feature_selection.EFS, which would stand for "Evolutionary Feature
> >         Selection", or EFS for short, starting with the 8 algorithms
> >         published in IEEE https://arxiv.org/abs/2303.10182 by the authors
> >         on high-dimensional datasets. I have identified this work as a
> >         very interesting novelty for working with high-dimensional
> >         datasets, especially in health fields, and it could mean a lot to
> >         the ML community and the scikit-learn project, in my first opinion.
>
> >         A Jupyter Notebook and scikit-learn tutorial replicating this IEEE
> >         paper/work as feature_selection.EFS with its 8 algorithms could be
> >         a near-term goal. Eventually, scikit-learn EFSCV and diverse
> >         classification algorithms could be benchmarked for a "joint paper"
> >         in JOSS or a health journal.
>
> >         My initial idea (it doesn't need to be this way and is open to
> >         discussion) looks something like this:
> >
> >         RFE has:
>
> >         feature_selection.RFE       Feature ranking with recursive feature
> >         (estimator, *[, ...])       elimination.
>
> >         feature_selection.RFECV     Recursive feature elimination with
> >         (estimator, *[, ...])       cross-validation to select features.
>
> >          The "EFS" could have:
>
> >                                 Feature ranking and feature elimination
> >         feature_selection.EFS   with 8 different algorithms, SFE, SFE-PSO
> >         (estimator, *[, ...])   etc. <- new algorithms could be added and
> >                                 benchmarked with evolutionary computing,
> >                                 swarm, genetic etc.
>
> >         feature_selection.EFSCV Feature elimination with cross-validation
> >         (estimator, *[, ...])   to select features
>
> >         Looking forward to an open discussion on whether Evolutionary
> >         Feature Selection (EFS) is something for the sklearn project, or
> >         maybe for a separate pip-installable package.
>
> >         Kind regards
> >         Dalibor Hrg
> >         https://www.linkedin.com/in/daliborhrg/
>
> >         On Fri, Sep 22, 2023 at 10:50 AM Behrooz Ahadzade
> >         <b.ahadzade at yahoo.com> wrote:
>
>
>
> >             Dear Dalibor Hrg,
>
> >             Thank you very much for your attention to the SFE algorithm,
> >             and for the time you took to guide me and my colleagues.
> >             Following your guidance, we will add this algorithm to the
> >             scikit-learn library as soon as possible.
>
> >             Kind regards,
> >             Ahadzadeh.
> >             On Wednesday, September 13, 2023 at 12:22:04 AM GMT+3:30,
> >             Dalibor Hrg <dalibor.hrg at gmail.com> wrote:
>
>
> >             Dear Authors,
>
> >             you have done some amazing work on feature selection, published
> >             in IEEE: https://arxiv.org/abs/2303.10182 . I noticed that the
> >             Python code at https://github.com/Ahadzadeh2022/SFE has no
> >             LICENSE file or any licensing information, and that the paper
> >             mentions some links for downloading the data.
>
> >             I would be interested in working with you so that we:
>
> >             Step 1) make and release a pip package, publish this code in
> >             JOSS https://joss.readthedocs.io (e.g.
> >             https://joss.theoj.org/papers/10.21105/joss.04611) under the
> >             BSD-3 license, and replicate the IEEE paper's table results.
> >             All 8 algorithms could potentially live in one class "EFS",
> >             meaning "Evolutionary Feature Selection", with the 8 selectable
> >             as options, SFE among them. Or something like that.
> >
> >
> >             Step 2) try to integrate and work with the scikit-learn people.
> >             I would recommend integrating this under
> >             https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection
> >             similarly to sklearn.feature_selection.RFE. I believe this
> >             would be a great contribution to the best open library for ML,
> >             scikit-learn.
>
> >             I am unsure about the status of the datasets and their
> >             licenses. The datasets could be fetched externally from the
> >             OpenML.org repository (see, for example,
> >             https://scikit-learn.org/stable/datasets/loading_other_datasets.html)
> >             or from CERN Zenodo, where the "benchmark datasets" could be
> >             expanded. It depends a bit on the dataset licenses.
>
> >             Overall, I hope this can greatly increase the visibility of
> >             your published work and also let others credit you in papers
> >             in a more citable and replicable way. I believe your IEEE paper
> >             and work definitely deserve a spot in scikit-learn. There is a
> >             need for replicable code on "Evolutionary Methods for Feature
> >             Selection" and for such a benchmark on life-science datasets,
> >             and you have done some great work so far.
>
> >             Let me know what you think.
>
> >             Best regards,
> >             Dalibor Hrg
>
> >             https://www.linkedin.com/in/daliborhrg/
>
>
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn
>
>
> --
>     Gael Varoquaux
>     Research Director, INRIA
>     http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux