[scikit-learn] Request / Proposal: integrating IEEE paper in scikit-learn as "feature_selection.EFS / EFSCV" and cancer_benchmark datasets
Gael Varoquaux
gael.varoquaux at normalesup.org
Sun Sep 24 14:39:53 EDT 2023
Dear Dalibor,
As detailed in the FAQ,
https://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for-new-algorithms
"""
We only consider well-established algorithms for inclusion. A rule of thumb is at least 3 years since publication, 200+ citations, and wide use and usefulness.
"""
These days, I would say that the bar is even higher, as we find ourselves prioritizing things such as high-quality documentation and better dataframe support over new algorithms.
Best,
Gaël
On Sun, Sep 24, 2023 at 11:10:23AM +0200, Dalibor Hrg wrote:
> Dear scikit-learn mailing list
> similarly to the existing feature_selection.RFE and RFECV, this is a request to
> openly discuss the PROPOSAL and requirements for feature_selection.EFS and/or
> EFSCV, which would stand for "Evolutionary Feature Selection", with an initial 8
> algorithms or methods to be used with scikit-learn estimators, as
> published in IEEE https://arxiv.org/abs/2303.10182 by the authors of the paper.
> They have agreed to help integrate it (in cc).
> PROPOSAL
> Implement/integrate https://arxiv.org/abs/2303.10182 paper into scikit-learn:
> 1) CODE
> • implementing feature_selection.EFS and/or EFSCV (a space for evolutionary
> computing community interested in feature selection)
> RFE is:
> feature_selection.RFE(estimator, *[, ...])    Feature ranking with recursive
>                                               feature elimination.
> feature_selection.RFECV(estimator, *[, ...])  Recursive feature elimination
>                                               with cross-validation to select
>                                               features.
> The "EFS" could be:
> feature_selection.EFS(estimator, *[, ...])    Feature ranking and feature
>                                               elimination with 8 different
>                                               algorithms (SFE, SFE-PSO, etc.);
>                                               new evolutionary, swarm, genetic
>                                               etc. algorithms could be added
>                                               and benchmarked.
> feature_selection.EFSCV(estimator, *[, ...])  Feature elimination with
>                                               cross-validation to select
>                                               features.
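As a rough illustration of the evolutionary idea behind such an EFS class (this is not the paper's SFE algorithm and not an existing scikit-learn API; the fitness function and all names below are made up for the sketch), a toy evolutionary search over binary feature masks could look like this, using only the standard library:

```python
# Toy sketch of evolutionary feature selection over binary masks.
# A real EFS/EFSCV would score each mask with a cross-validated
# scikit-learn estimator instead of this synthetic fitness.
import random

random.seed(0)

N_FEATURES = 10
RELEVANT = {1, 4, 7}  # ground-truth informative features for this toy problem


def fitness(mask):
    # Reward selecting relevant features, penalize subset size.
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in RELEVANT)
    return hits - 0.1 * sum(mask)


def mutate(mask, rate=0.1):
    # Flip each bit independently with probability `rate`.
    return [bit ^ (random.random() < rate) for bit in mask]


def evolve(pop_size=20, generations=50):
    pop = [[random.randint(0, 1) for _ in range(N_FEATURES)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection (elitist)
        children = [mutate(random.choice(parents))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)


best = evolve()
selected = [i for i, bit in enumerate(best) if bit]
print("selected feature indices:", selected)
```

In a real implementation the population members would be feature masks evaluated by estimator accuracy, and the class would expose the usual scikit-learn selector interface (fit, get_support, transform).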
> 2) DATASETS & CANCER BENCHMARK
> • curating and integrating a fetch of the 40-dataset cancer_benchmark, either
> directly in scikit-learn or externally pullable and maintained (a space for
> contributing and expanding high-dimensional datasets on cancer topics).
> fetch_cancer_benchmark(*[, ...])  Loads 40 individual cancer-related
>                                   high-dimensional datasets for benchmarking
>                                   feature selection methods (classification).
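As an illustration only (fetch_cancer_benchmark does not exist; its name, fields, and the toy data below are assumptions made for this sketch), such a loader could follow the Bunch-style convention of scikit-learn's existing fetch_* functions:

```python
# Hypothetical sketch of what a fetch_cancer_benchmark loader could
# return, mimicking scikit-learn's Bunch-style fetch_* convention.
from types import SimpleNamespace

# Placeholder registry; the real benchmark would list the 40 datasets.
_TOY_DATASETS = {
    "toy_leukemia": {
        "data": [[0.1, 0.9], [0.8, 0.2]],
        "target": [0, 1],
        "feature_names": ["gene_a", "gene_b"],
    }
}


def fetch_cancer_benchmark(name):
    """Return one benchmark dataset as a Bunch-like object."""
    entry = _TOY_DATASETS[name]
    return SimpleNamespace(
        data=entry["data"],
        target=entry["target"],
        feature_names=entry["feature_names"],
        DESCR=f"Toy stand-in for the proposed {name!r} benchmark dataset.",
    )


ds = fetch_cancer_benchmark("toy_leukemia")
print(len(ds.data), ds.feature_names)
```

The real loader would download and cache the data (as fetch_openml does) rather than embed it.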
> 3) TUTORIAL / WEBSITE
> • writing tutorial to replicate IEEE paper results with feature_selection.EFS
> and/or EFSCV on cancer_benchmark (40 datasets)
> I have identified the IEEE work https://arxiv.org/abs/2303.10182 as a very
> interesting novelty for working with high-dimensional datasets, as it reports
> small subsets of predictive features selected with SVM and KNN across 40
> datasets. Replicability under BSD-3 and the high quality standards of
> scikit-learn could make benchmarking novel feature selection algorithms
> easier - in my first impression. Since this is my very first contact with the
> IEEE paper authors and the scikit-learn list altogether, we would welcome
> some help/guidance on how the integration could work out, and whether there
> is any interest along that line at all.
> Kind regards
> Dalibor Hrg
> https://www.linkedin.com/in/daliborhrg/
>
> On Sat, Sep 23, 2023 at 9:08 AM Alexandre Gramfort
> <alexandre.gramfort at inria.fr> wrote:
> Dear Dalibor
> you should discuss this on the main scikit-learn mailing list.
> https://mail.python.org/mailman/listinfo/scikit-learn
> Alex
> On Fri, Sep 22, 2023 at 12:19 PM Dalibor Hrg <dalibor.hrg at gmail.com> wrote:
> Dear sklearn feature_selection.RFE Team and IEEE Authors (in-cc),
> This is a request to openly discuss the idea and potential of
> feature_selection.EFS, which would stand for "Evolutionary Feature
> Selection" (shortly EFS), with an initial 8 algorithms as published in
> IEEE https://arxiv.org/abs/2303.10182 by the authors on
> high-dimensional datasets. I have identified this work as a very
> interesting novelty for working with high-dimensional datasets,
> especially in health fields, and it could mean a lot to the ML
> community and the scikit-learn project - in my first impression.
> A Jupyter Notebook and scikit-learn tutorial replicating this IEEE
> paper/work as feature_selection.EFS and 8 algorithms in it could be a
> near term goal. And eventually, scikit-learn EFSCV and diverse
> classification algorithms could be benchmarked for "joint paper" in
> JOSS, or a health journal.
> My initial idea (doesn't need to be that way or is open to discussion)
> has some first thought like this:
>
> RFE has:
> feature_selection.RFE(estimator, *[, ...])    Feature ranking with recursive
>                                               feature elimination.
> feature_selection.RFECV(estimator, *[, ...])  Recursive feature elimination
>                                               with cross-validation to select
>                                               features.
> The "EFS" could have:
> feature_selection.EFS(estimator, *[, ...])    Feature ranking and feature
>                                               elimination with 8 different
>                                               algorithms (SFE, SFE-PSO, etc.);
>                                               new evolutionary, swarm, genetic
>                                               etc. algorithms could be added
>                                               and benchmarked.
> feature_selection.EFSCV(estimator, *[, ...])  Feature elimination with
>                                               cross-validation to select
>                                               features.
> Looking forward to an open discussion on whether Evolutionary Feature
> Selection (EFS) is something for the sklearn project, or perhaps a
> separate pip-installable package.
> Kind regards
> Dalibor Hrg
> https://www.linkedin.com/in/daliborhrg/
> On Fri, Sep 22, 2023 at 10:50 AM Behrooz Ahadzade
> <b.ahadzade at yahoo.com> wrote:
> Dear Dalibor Hrg,
> Thank you very much for your attention to the SFE algorithm. Thank
> you very much for the time you took to guide me and my colleagues.
> Following your guidance, we will add this algorithm to the
> scikit-learn library as soon as possible.
> Kind regards,
> Ahadzadeh.
> On Wednesday, September 13, 2023 at 12:22:04 AM GMT+3:30, Dalibor
> Hrg <dalibor.hrg at gmail.com> wrote:
> Dear Authors,
> you have done some amazing work on feature selection here, published
> in IEEE: https://arxiv.org/abs/2303.10182 . I have noticed the Python
> code here, without a LICENSE file or any info on this:
> https://github.com/Ahadzadeh2022/SFE , and in the paper some links
> are mentioned to download data.
> I would be interested in working with you to:
> Step 1) make and release a pip package, publish this code in JOSS
> (https://joss.readthedocs.io , e.g.
> https://joss.theoj.org/papers/10.21105/joss.04611 ) under a BSD-3
> license, and replicate the IEEE paper's table results. All 8
> algorithms could potentially be in one class "EFS", meaning
> "Evolutionary Feature Selection", selectable as 8 options, among
> them SFE. Or something like that.
>
> Step 2) try to integrate and work with the scikit-learn people; I
> would recommend integrating this under
> https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection
> similarly to sklearn.feature_selection.RFE. I believe this would
> be a great contribution to the best open library for ML,
> scikit-learn.
> I am unsure what the status of the datasets and their licenses is.
> But the datasets could be fetched externally, for example from the
> OpenML.org repository (see
> https://scikit-learn.org/stable/datasets/loading_other_datasets.html )
> or from CERN Zenodo, where the "benchmark datasets" could be
> expanded. It depends a bit on the dataset licenses.
> Overall, I hope this can greatly increase the visibility of your
> published work, but also let others credit you in papers in a more
> citable and replicable way. I believe your IEEE paper and work
> definitely deserve a spot in scikit-learn. There is a need for
> replicable code on "Evolutionary Methods for Feature Selection" and
> such a benchmark on life-science datasets, and you have done some
> great work so far.
> Let me know what you think.
> Best regards,
> Dalibor Hrg
> https://www.linkedin.com/in/daliborhrg/
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
--
Gael Varoquaux
Research Director, INRIA
http://gael-varoquaux.info http://twitter.com/GaelVaroquaux