From ulderico.santarelli at gmail.com  Thu Sep 14 11:47:09 2023
From: ulderico.santarelli at gmail.com (Ulderico Santarelli)
Date: Thu, 14 Sep 2023 17:47:09 +0200
Subject: [scikit-learn] (no subject)
Message-ID: <CAAbcUg5b6wfA_Zez=qipGiNSTV+dstahxQEEiVLo0fHY55kbtg@mail.gmail.com>



From ulderico.santarelli at gmail.com  Thu Sep 14 12:26:34 2023
From: ulderico.santarelli at gmail.com (Ulderico Santarelli)
Date: Thu, 14 Sep 2023 18:26:34 +0200
Subject: [scikit-learn] CLUSTER ANALYSIS AND THE SEARCH OF A SAMPLE MODE
Message-ID: <CAAbcUg43-8ug_gAsf1eEwvhfe8Lu+wGPXvkeswHywqUksUEY0g@mail.gmail.com>

      I am an old guy who started programming in the 1970s with ASSEMBLER
360, then FORTRAN, PL1, APL, IBM APPLICATION SYSTEM and, last, the marvelous
SAS. Having heard about the powerful, flexible, functionally complete PYTHON
UNIVERSE, encompassing an advanced object-oriented language and a very wide
family of packages, I decided to run an exercise on a problem I've been
tackling since my youth (have a look at the Bibliography). I completed it in
a few days, and I'm attaching my solution to the problem of finding the
points in a sample that are "central" in a surrounding topological
neighborhood. They are eligible as centroids for a Cluster Analysis after
the aggregation of "too near" points. The solution is based on the search
for potential wells in a suitable potential field, similar to the one all of
us studied in high school. Therefore, points that are too near each other
may fall in the same potential well.
No more words; have a look at the attachment.
My coding is that of a beginner. I'm sure anybody could find more
efficient coding. As a comment: I started studying Python around May 15th,
2023.
My best regards,
Ulderico Santarelli.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SAMPLE POINTS CENTRALITY INDEX.docx
Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Size: 49923 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20230914/a026149f/attachment-0001.docx>

From jalopcar at gmail.com  Sun Sep 17 12:12:03 2023
From: jalopcar at gmail.com (Jaime Lopez)
Date: Sun, 17 Sep 2023 11:12:03 -0500
Subject: [scikit-learn] CLUSTER ANALYSIS AND THE SEARCH OF A SAMPLE MODE
In-Reply-To: <CAAbcUg43-8ug_gAsf1eEwvhfe8Lu+wGPXvkeswHywqUksUEY0g@mail.gmail.com>
References: <CAAbcUg43-8ug_gAsf1eEwvhfe8Lu+wGPXvkeswHywqUksUEY0g@mail.gmail.com>
Message-ID: <CANsMHoO0dMW_B7JQQFGi-w9WeBNpSk4GLPD8FD+pmuEEmPX64Q@mail.gmail.com>

Hi there,

I got interested in your project, but I hit this error right at the start
(see attached image).
The work array cannot be reshaped to (1,4) because it has shape (2,1). Any
suggestions?

JL

[image: image.png]


-- 

*Jaime Lopez Carvajal*
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 53353 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20230917/310376bd/attachment-0001.png>

From ulderico.santarelli at gmail.com  Sun Sep 17 12:44:58 2023
From: ulderico.santarelli at gmail.com (Ulderico Santarelli)
Date: Sun, 17 Sep 2023 18:44:58 +0200
Subject: [scikit-learn] CLUSTER ANALYSIS AND THE SEARCH OF A SAMPLE MODE
In-Reply-To: <CANsMHoO0dMW_B7JQQFGi-w9WeBNpSk4GLPD8FD+pmuEEmPX64Q@mail.gmail.com>
References: <CAAbcUg43-8ug_gAsf1eEwvhfe8Lu+wGPXvkeswHywqUksUEY0g@mail.gmail.com>
 <CANsMHoO0dMW_B7JQQFGi-w9WeBNpSk4GLPD8FD+pmuEEmPX64Q@mail.gmail.com>
Message-ID: <CAAbcUg4J2OOmqXeu+awugVbPeduKZWG3+zZ82s9t00DEg2_cDw@mail.gmail.com>

I'm going to have a look at this. Thank you for your comment.



From ulderico.santarelli at gmail.com  Mon Sep 18 02:54:10 2023
From: ulderico.santarelli at gmail.com (Ulderico Santarelli)
Date: Mon, 18 Sep 2023 08:54:10 +0200
Subject: [scikit-learn] CLUSTER ANALYSIS AND THE SEARCH OF A SAMPLE MODE
In-Reply-To: <CANsMHoO0dMW_B7JQQFGi-w9WeBNpSk4GLPD8FD+pmuEEmPX64Q@mail.gmail.com>
References: <CAAbcUg43-8ug_gAsf1eEwvhfe8Lu+wGPXvkeswHywqUksUEY0g@mail.gmail.com>
 <CANsMHoO0dMW_B7JQQFGi-w9WeBNpSk4GLPD8FD+pmuEEmPX64Q@mail.gmail.com>
Message-ID: <CAAbcUg4TwpgHdY7pzWkcGt4B7-gJkNMBJFWH-rBpzxwnbSo3Eg@mail.gmail.com>

I think it is better to send you the script in its entirety. I ran it just
now and it works.
As for work, it is:
work
array([[ 5.63011247],
       [-2.31453939],
       [22.23122848],
       [15.37678101]])
np.shape(work)
(4, 1)

My best regards,
Ulderico.
_________________________________________________________________________________
import numpy as np
import pandas as pd

# Raw strings keep the backslashes in Windows paths literal.
dataraw = pd.read_excel(r"C:\Pyth\iris.xlsx")
# dataraw is a DataFrame; keep just the quantitative variables (columns 1 to 4)
datar = dataraw.iloc[:, 1:5]
# standardize data
means = datar.mean(axis=0)
stdev = datar.std(axis=0)
data = (datar - means) / stdev
# CENTRALITY INDEX
# cross join: every point paired with every point (150 * 150 rows)
scalar = pd.merge(data, data, how='cross')
point1 = scalar.loc[:, 'sepal length _x':'petal width _x']
point2 = scalar.loc[:, 'sepal length _y':'petal width _y']
apoint1 = point1.to_numpy(dtype=float)
apoint2 = point2.to_numpy(dtype=float)
delta = apoint1 - apoint2
# exp(-|delta|) is 1 where delta == 0, but np.sign(delta) is 0 there,
# so the signed force between coincident coordinates ends up as 0
force = np.exp(-np.abs(delta))
sig = np.sign(delta)
sforce = sig * force
dsforce = pd.DataFrame(sforce)
#dsforce.to_excel(r'C:\Pyth\dsforce.xlsx')
arr = np.ones((150, 1))
sforcet = sforce.T
sum_force = np.zeros((1, 4))   # do not use empty arrays
start = 0
end = 150
# each block of 150 columns holds the forces acting on one point
for i in range(150):
    s_forcet = sforcet[:, start:end]
    work = np.matmul(s_forcet, arr)          # shape (4, 1)
    sum_force = np.concatenate((sum_force, work.reshape(1, 4)), axis=0)
    start = end
    end += 150
sumforce = sum_force[1:, :]                  # drop the initial row of zeros
dsumforce = pd.DataFrame(sumforce)
dsumforce.to_excel(r'C:\Pyth\sumforce_sqc.xlsx')
# norm of the total force acting on each point
sum_force_square = sumforce**2
ssT = np.ones((4, 1))
T_w_ = np.sqrt(np.matmul(sum_force_square, ssT))
dT_w_ = pd.DataFrame(T_w_)
dT_w_.to_excel(r'C:\Pyth\T_w_.xlsx')
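
A possible vectorized sketch of the same computation follows. It is not the
author's code: the function name centrality_index and the three-point toy
sample are assumptions made for illustration. It uses NumPy broadcasting in
place of the pandas cross join, so it does not hard-code the sizes 150 and 4
or any column names.

```python
import numpy as np


def centrality_index(X):
    """Norm of the summed signed forces acting on each standardized point.

    Hypothetical helper, written to mirror the script above.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each column with the sample standard deviation (ddof=1),
    # which is what pandas DataFrame.std() computes by default.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # delta[i, j, k] = Z[i, k] - Z[j, k] for every pair of points (i, j).
    delta = Z[:, None, :] - Z[None, :, :]
    # np.sign is 0 where delta == 0, so coincident coordinates add no force.
    sforce = np.sign(delta) * np.exp(-np.abs(delta))
    # Sum over the second point of each pair, then take the Euclidean norm.
    sum_force = sforce.sum(axis=1)
    return np.sqrt((sum_force ** 2).sum(axis=1))


# Toy sample: three equally spaced points on a line.
T = centrality_index([[-1.0], [0.0], [1.0]])
print(T)  # the middle point feels zero net force
```

Since the array shapes are taken from the input, a spreadsheet with a
different column layout would not hit a reshape step here. Whether a
different layout is the cause of the error reported in this thread is only
a guess.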


From ulderico.santarelli at gmail.com  Mon Sep 18 06:16:49 2023
From: ulderico.santarelli at gmail.com (Ulderico Santarelli)
Date: Mon, 18 Sep 2023 12:16:49 +0200
Subject: [scikit-learn] CLUSTER ANALYSIS AND THE SEARCH OF A SAMPLE MODE
In-Reply-To: <CANsMHoO0dMW_B7JQQFGi-w9WeBNpSk4GLPD8FD+pmuEEmPX64Q@mail.gmail.com>
References: <CAAbcUg43-8ug_gAsf1eEwvhfe8Lu+wGPXvkeswHywqUksUEY0g@mail.gmail.com>
 <CANsMHoO0dMW_B7JQQFGi-w9WeBNpSk4GLPD8FD+pmuEEmPX64Q@mail.gmail.com>
Message-ID: <CAAbcUg64MA9s9X=BJCiA4z2v5jcKY9KjC54pN1AU5Kz31ZX1AA@mail.gmail.com>

In addition, the distance I'm using is not a dogma. It is meant to avoid
the "black hole syndrome" that would emerge with the plain Newtonian
distance when, by chance, two points are too near. When the distance is 0,
exp(-|w-x|) would be 1, so the force is set to 0. I also tried
exp(-|w-x|^2), but the changes are not significant.
Ulderico.
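
The point about near-coincident points can be illustrated with a short
sketch. It assumes that "Newtonian" refers to an inverse-square law; the
helper name signed_force and the sample differences are made up for the
example.

```python
import numpy as np


def signed_force(delta, squared=False):
    """Signed exponential force for coordinate differences `delta`.

    Hypothetical helper: squared=True uses exp(-delta**2) in place of
    exp(-|delta|), the variant mentioned in the message above.
    """
    delta = np.asarray(delta, dtype=float)
    dist = delta ** 2 if squared else np.abs(delta)
    # np.sign(0) is 0, so a zero difference yields a zero force.
    return np.sign(delta) * np.exp(-dist)


delta = np.array([-2.0, -1e-6, 0.0, 1e-6, 2.0])
exp_force = signed_force(delta)
sq_force = signed_force(delta, squared=True)

# Inverse-square law for comparison: it grows without bound as the
# difference shrinks, and is undefined (nan) at exactly zero.
with np.errstate(divide="ignore", invalid="ignore"):
    newton_force = np.sign(delta) / delta ** 2

print(np.abs(exp_force).max(), np.abs(sq_force).max())
print(np.nanmax(np.abs(newton_force)))
```

Both exponential kernels stay bounded by 1 in magnitude, while the
inverse-square force is dominated by the pair of points that happen to be
closest.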


From jalopcar at gmail.com  Mon Sep 18 12:07:56 2023
From: jalopcar at gmail.com (Jaime Lopez)
Date: Mon, 18 Sep 2023 11:07:56 -0500
Subject: [scikit-learn] CLUSTER ANALYSIS AND THE SEARCH OF A SAMPLE MODE
In-Reply-To: <CAAbcUg4TwpgHdY7pzWkcGt4B7-gJkNMBJFWH-rBpzxwnbSo3Eg@mail.gmail.com>
References: <CAAbcUg43-8ug_gAsf1eEwvhfe8Lu+wGPXvkeswHywqUksUEY0g@mail.gmail.com>
 <CANsMHoO0dMW_B7JQQFGi-w9WeBNpSk4GLPD8FD+pmuEEmPX64Q@mail.gmail.com>
 <CAAbcUg4TwpgHdY7pzWkcGt4B7-gJkNMBJFWH-rBpzxwnbSo3Eg@mail.gmail.com>
Message-ID: <CANsMHoMqnXw8NYF2XTQcP8D5ZtpvRxiVjArELrJm0v00n6qjoA@mail.gmail.com>

Hi,

Same error. Maybe it is related to the dataset I got from GitHub
(iris.xlsx); could you share yours?

[image: image.png]

JL



-- 

*Jaime Lopez Carvajal*
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 22647 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20230918/4490887c/attachment-0003.png>

From ulderico.santarelli at gmail.com  Mon Sep 18 12:14:20 2023
From: ulderico.santarelli at gmail.com (Ulderico Santarelli)
Date: Mon, 18 Sep 2023 18:14:20 +0200
Subject: [scikit-learn] CLUSTER ANALYSIS AND THE SEARCH OF A SAMPLE MODE
In-Reply-To: <CANsMHoMqnXw8NYF2XTQcP8D5ZtpvRxiVjArELrJm0v00n6qjoA@mail.gmail.com>
References: <CAAbcUg43-8ug_gAsf1eEwvhfe8Lu+wGPXvkeswHywqUksUEY0g@mail.gmail.com>
 <CANsMHoO0dMW_B7JQQFGi-w9WeBNpSk4GLPD8FD+pmuEEmPX64Q@mail.gmail.com>
 <CAAbcUg4TwpgHdY7pzWkcGt4B7-gJkNMBJFWH-rBpzxwnbSo3Eg@mail.gmail.com>
 <CANsMHoMqnXw8NYF2XTQcP8D5ZtpvRxiVjArELrJm0v00n6qjoA@mail.gmail.com>
Message-ID: <CAAbcUg68tP56moRhLXQs3poqWWeCoTQXnTxT31mRAJXGcBrwUA@mail.gmail.com>

Of course. Here it is.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: iris.xlsx
Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Size: 14808 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20230918/b1e9f4ac/attachment-0001.xlsx>

From g.lemaitre58 at gmail.com  Thu Sep 21 03:54:52 2023
From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=)
Date: Thu, 21 Sep 2023 09:54:52 +0200
Subject: [scikit-learn] [ANN] scikit-learn 1.3.1 is online!
Message-ID: <CACDxx9i_MHaXSwH+WJtiz_tksUy=C9TqCSH0ioX_vV486+H=BA@mail.gmail.com>

scikit-learn 1.3.1 is out on pypi.org and conda-forge!
This is a maintenance release that fixes several regressions introduced in
version 1.3:
https://scikit-learn.org/stable/whats_new/v1.3.html#version-1-3-1

You can upgrade with pip as usual:

pip install -U scikit-learn

The conda-forge builds will be available shortly, which you can then
install using:

conda install -c conda-forge scikit-learn


Thanks to all contributors who helped on this release.
Guillaume,
On behalf of the scikit-learn maintainers team.
-- 
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/

From dalibor.hrg at gmail.com  Sun Sep 24 05:10:23 2023
From: dalibor.hrg at gmail.com (Dalibor Hrg)
Date: Sun, 24 Sep 2023 11:10:23 +0200
Subject: [scikit-learn] Request / Proposal: integrating IEEE paper in
 scikit-learn as "feature_selection.EFS / EFSCV" and cancer_benchmark
 datasets
Message-ID: <CAJ=aRPrnTrB-amJm8Ft8s4fjuWU62hC+wauj5ETcge2z1UmnvQ@mail.gmail.com>

Dear scikit-learn mailing list,

Similarly to the existing feature_selection.*RFE and RFECV*
<https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html#sklearn.feature_selection.RFE>,
this is a request to openly discuss the *PROPOSAL* and requirements of
*feature_selection.EFS and/or EFSCV*, which would stand for "Evolutionary
Feature Selection", starting with 8 algorithms or methods to be used with
scikit-learn estimators, as published in IEEE
(https://arxiv.org/abs/2303.10182) by the authors of the paper. They agreed
to help integrate it (in cc).

*PROPOSAL*
Implement/integrate https://arxiv.org/abs/2303.10182 paper into
scikit-learn:

*1) CODE*

   - implementing *feature_selection.EFS and/or EFSCV* (a space for the
   evolutionary computing community interested in feature selection)

RFE is:

feature_selection.*RFE*(estimator, *[, ...])
<https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html#sklearn.feature_selection.RFE>

Feature ranking with recursive feature elimination.

feature_selection.*RFECV*(estimator, *[, ...])
<https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFECV.html#sklearn.feature_selection.RFECV>

Recursive feature elimination with cross-validation to select features.

The "EFS" could be:

feature_selection.*EFS*(estimator, *[, ...])

Feature ranking and feature elimination with *8 different algorithms, SFE,
SFE-PSO*, etc. *<- new algorithms could be added and benchmarked with
evolutionary computing, swarm, genetic, etc.*

feature_selection.*EFSCV*(estimator, *[, ...])

Feature elimination with cross-validation to select features
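
To make the shape of such a selector concrete, here is a minimal sketch of
how an evolutionary selector could plug into scikit-learn's existing
selector conventions (SelectorMixin plus BaseEstimator). Everything about it
is an assumption: the class name ToyEFS, its parameters, and the search
itself, which is a plain single-bit-flip search and not the SFE or SFE-PSO
algorithm from the paper.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectorMixin
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


class ToyEFS(SelectorMixin, BaseEstimator):
    """Hypothetical selector: single-bit-flip search over feature masks."""

    def __init__(self, estimator, n_iter=20, cv=3, random_state=None):
        self.estimator = estimator
        self.n_iter = n_iter
        self.cv = cv
        self.random_state = random_state

    def _score(self, X, y, mask):
        # An empty feature subset cannot be scored; treat it as worst.
        if not mask.any():
            return -np.inf
        est = clone(self.estimator)
        return cross_val_score(est, X[:, mask], y, cv=self.cv).mean()

    def fit(self, X, y):
        X = np.asarray(X)
        rng = np.random.default_rng(self.random_state)
        mask = np.ones(X.shape[1], dtype=bool)   # start from all features
        best = self._score(X, y, mask)
        for _ in range(self.n_iter):
            cand = mask.copy()
            j = rng.integers(X.shape[1])
            cand[j] = not cand[j]                # flip one feature in or out
            score = self._score(X, y, cand)
            if score >= best:                    # keep non-worsening moves
                mask, best = cand, score
        self.support_ = mask
        self.score_ = best
        self.n_features_in_ = X.shape[1]
        return self

    def _get_support_mask(self):
        # Hook used by SelectorMixin.get_support().
        return self.support_


X, y = make_classification(n_samples=120, n_features=10, n_informative=3,
                           random_state=0)
selector = ToyEFS(KNeighborsClassifier(), random_state=0).fit(X, y)
print(selector.get_support().sum(), "features kept")
```

Following the SelectorMixin convention means the fitted object exposes
get_support() like RFE does, which is one way the proposed EFS could sit
next to the existing selectors.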

*2) DATASETS & CANCER BENCHMARK*

   - curating and integrating a fetcher for the 40 *cancer_benchmark*
   datasets, either directly in scikit-learn or externally pullable and
   maintained (a space for contributing and expanding high-dimensional
   datasets on cancer topics).

fetch_cancer-benchmark(*[, ...])
    Loads 40 individual cancer-related high-dimensional datasets for
    benchmarking feature selection methods (classification).

*3) TUTORIAL / WEBSITE*

   - writing a tutorial to replicate the IEEE paper results with
   *feature_selection.EFS and/or EFSCV* on *cancer_benchmark (40 datasets)*


I find the IEEE work https://arxiv.org/abs/2303.10182 to be of very
interesting novelty for working with high-dimensional datasets, as it
reports small subsets of predictive features selected with SVM and KNN
across 40 datasets. Replicability under BSD-3 and the high quality standards
of scikit-learn could make benchmarking novel feature selection algorithms
easier, in my initial opinion. Since this is my very first contact with both
the IEEE paper authors and the scikit-learn list, we would welcome some
help or guidance on how integration could work out, and on whether there is
any interest along this line at all.

Kind regards
Dalibor Hrg
https://www.linkedin.com/in/daliborhrg/


On Sat, Sep 23, 2023 at 9:08 AM Alexandre Gramfort <
alexandre.gramfort at inria.fr> wrote:

> Dear Dalibor
>
> you should discuss this on the main scikit-learn mailing list.
>
> https://mail.python.org/mailman/listinfo/scikit-learn
>
> Alex
>
> On Fri, Sep 22, 2023 at 12:19 PM Dalibor Hrg <dalibor.hrg at gmail.com>
> wrote:
>
>> Dear sklearn feature_selection.RFE Team and IEEE Authors (in-cc),
>>
>> This is a request to openly discuss the idea of a potential
>> *feature_selection.EFS*, which would stand for "Evolutionary Feature
>> Selection", or EFS for short, starting with 8 algorithms as published in
>> IEEE https://arxiv.org/abs/2303.10182 by the authors on high-dimensional
>> datasets. I find this work to be of very interesting novelty for
>> working with high-dimensional datasets, especially for health fields, and
>> it could mean a lot to the ML community and scikit-learn project, in my
>> initial opinion.
>>
>> A Jupyter Notebook and scikit-learn tutorial replicating this IEEE
>> paper/work as *feature_selection.EFS* with its 8 algorithms could be a
>> near-term goal. Eventually, scikit-learn EFSCV and diverse classification
>> algorithms could be benchmarked for a "joint paper" in JOSS or a health
>> journal.
>>
>> My initial idea (it does not need to be this way and is open to
>> discussion) looks like this:
>>
>> RFE has:
>>
>> feature_selection.RFE(estimator, *[, ...])
>> <https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html#sklearn.feature_selection.RFE>
>>     Feature ranking with recursive feature elimination.
>>
>> feature_selection.RFECV(estimator, *[, ...])
>> <https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFECV.html#sklearn.feature_selection.RFECV>
>>     Recursive feature elimination with cross-validation to select features.
>>
>> The "EFS" could have:
>>
>> feature_selection.EFS(estimator, *[, ...])
>>     Feature ranking and feature elimination with *8 different algorithms,
>>     SFE, SFE-PSO* etc. <- new algorithms could be added and benchmarked
>>     with evolutionary computing, swarm, genetic etc.
>>
>> feature_selection.EFSCV(estimator, *[, ...])
>>     Feature elimination with cross-validation to select features.
>>
>> I look forward to an open discussion on whether Evolutionary Feature
>> Selection (EFS) is something for the sklearn project, or maybe for a
>> separate pip-installable package.
>>
>> Kind regards
>> Dalibor Hrg
>> https://www.linkedin.com/in/daliborhrg/
>>
>> On Fri, Sep 22, 2023 at 10:50 AM Behrooz Ahadzade <b.ahadzade at yahoo.com>
>> wrote:
>>
>>>
>>>
>>> Dear Dalibor Hrg,
>>>
>>> Thank you very much for your attention to the SFE algorithm. Thank you
>>> very much for the time you took to guide me and my colleagues. According to
>>> your guidance, we will add this algorithm to the scikit-learn library as
>>> soon as possible.
>>>
>>> Kind regards,
>>> Ahadzadeh.
>>> On Wednesday, September 13, 2023 at 12:22:04 AM GMT+3:30, Dalibor Hrg <
>>> dalibor.hrg at gmail.com> wrote:
>>>
>>>
>>> Dear Authors,
>>>
>>> you have done some amazing work on feature selection, published in
>>> IEEE: https://arxiv.org/abs/2303.10182 . I have noticed that the Python
>>> code at https://github.com/Ahadzadeh2022/SFE has no LICENSE file or any
>>> licensing information, and that the paper mentions some links for
>>> downloading the data.
>>>
>>> I would be interested in working with you so that we:
>>>
>>> Step 1) make and release a pip package, publish this code in JOSS
>>> https://joss.readthedocs.io (for example, like
>>> https://joss.theoj.org/papers/10.21105/joss.04611) under a BSD-3 license,
>>> and replicate the results tables of the IEEE paper. All 8 algorithms could
>>> potentially be in one class "EFS", meaning "Evolutionary Feature
>>> Selection", selectable as 8 options, SFE among them. Or something like
>>> that.
>>>
>>> Step 2) try to integrate it and work with the scikit-learn people. I would
>>> recommend integrating this under
>>> https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection
>>> similarly to sklearn.feature_selection.RFE. I believe this would be a great
>>> contribution to the best open library for ML, scikit-learn.
>>>
>>> I am unsure about the status of the datasets and their licenses. But
>>> the datasets could be fetched externally from the OpenML.org repository
>>> (for example, see
>>> https://scikit-learn.org/stable/datasets/loading_other_datasets.html) or
>>> from CERN Zenodo, where the "benchmark datasets" could be expanded. It
>>> depends a bit on the dataset licenses.
>>>
>>> Overall, I hope this can greatly increase the visibility of your published
>>> work and also let others credit you in papers in a more citable and
>>> replicable way. I believe your IEEE paper and work definitely deserve a
>>> spot in scikit-learn. There is a need for replicable code on
>>> "Evolutionary Methods for Feature Selection" and for such a benchmark on
>>> life-science datasets, and you have done some great work so far.
>>>
>>> Let me know what you think.
>>>
>>> Best regards,
>>> Dalibor Hrg
>>>
>>> https://www.linkedin.com/in/daliborhrg/
>>>
>>

From ulderico.santarelli at gmail.com  Sun Sep 24 13:56:40 2023
From: ulderico.santarelli at gmail.com (Ulderico Santarelli)
Date: Sun, 24 Sep 2023 19:56:40 +0200
Subject: [scikit-learn] Request / Proposal: integrating IEEE paper in
 scikit-learn as "feature_selection.EFS / EFSCV" and cancer_benchmark
 datasets
In-Reply-To: <CAJ=aRPrnTrB-amJm8Ft8s4fjuWU62hC+wauj5ETcge2z1UmnvQ@mail.gmail.com>
References: <CAJ=aRPrnTrB-amJm8Ft8s4fjuWU62hC+wauj5ETcge2z1UmnvQ@mail.gmail.com>
Message-ID: <CAAbcUg74qK5RL3Se+XiJBO5CNL3A9AtAu=nEXwUDNKCMx29Pcg@mail.gmail.com>

Starting with Efroymson's stepwise regression, the selection of relevant
regressors has a long history. Of course, Efroymson's case is an old and
simple one within a very wide set of more general problems, where the number
of variables and the missingness pattern make things very hard to tackle.
I had a look at the paper, which seems to me to be based on a wide review of
the literature and an in-depth focus on the main extant algorithms. I do
not consider myself an expert on the matter. However, the subject is so
important that, in view of the thorough analysis the authors performed, I
think this enterprise is worthwhile.
My best regards. Ulderico Santarelli.


From gael.varoquaux at normalesup.org  Sun Sep 24 14:39:53 2023
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Sun, 24 Sep 2023 20:39:53 +0200
Subject: [scikit-learn] Request / Proposal: integrating IEEE paper in
 scikit-learn as "feature_selection.EFS / EFSCV" and cancer_benchmark
 datasets
In-Reply-To: <CAJ=aRPrnTrB-amJm8Ft8s4fjuWU62hC+wauj5ETcge2z1UmnvQ@mail.gmail.com>
References: <CAJ=aRPrnTrB-amJm8Ft8s4fjuWU62hC+wauj5ETcge2z1UmnvQ@mail.gmail.com>
Message-ID: <20230924183953.47lkn3nwj544wumm@gaellaptop>

Dear Dalibor,

As detailed in the FAQ,
https://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for-new-algorithms
"""
We only consider well-established algorithms for inclusion. A rule of thumb is at least 3 years since publication, 200+ citations, and wide use and usefulness.
"""

These days, I would say that the bar is even higher, as we find ourselves prioritizing things such as high-quality documentation or better dataframe support over new algorithms.

Best,

Gaël



-- 
    Gael Varoquaux
    Research Director, INRIA
    http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux

From dalibor.hrg at gmail.com  Sat Sep 23 21:29:37 2023
From: dalibor.hrg at gmail.com (Dalibor Hrg)
Date: Sun, 24 Sep 2023 03:29:37 +0200
Subject: [scikit-learn] Request / Proposal: integrating IEEE paper in
 scikit-learn as "feature_selection.EFS / EFSCV" and cancer_benchmark
 datasets
In-Reply-To: <20230924183953.47lkn3nwj544wumm@gaellaptop>
References: <CAJ=aRPrnTrB-amJm8Ft8s4fjuWU62hC+wauj5ETcge2z1UmnvQ@mail.gmail.com>
 <20230924183953.47lkn3nwj544wumm@gaellaptop>
Message-ID: <CAJ=aRPrJuaRWz=BoJaFuj18djnyJCejqSde5n=ap-GT4zmZxbA@mail.gmail.com>

Dear Gael,

Thanks for the clarification. Yes, I see: there is a need for broader
evidence of use and more citations for such methods or approaches. This is
roughly what I was thinking.

Looking at the related projects
https://scikit-learn.org/stable/related_projects.html#related-projects and
especially the "Boruta" package
https://github.com/scikit-learn-contrib/boruta_py, a small question for a
hint: do you think a pip package like Boruta would be the closest fit,
implemented together with the cancer benchmark datasets and replicating the
paper results?

Certainly, there is potential to benchmark RFE and EFS, publish how they
perform on the benchmark, and demonstrate them on diverse high-dimensional
datasets from other domains and other publications. Doing that is a
long-term journey to show the usefulness of the method/algorithm.
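
As a rough illustration of the "separate pip package" route, a standalone
selector can follow the scikit-learn estimator conventions without living
in scikit-learn itself. The sketch below is an assumption throughout: the
class name RandomFlipSelector and its parameters are hypothetical, and the
search is a plain random bit-flip hill climb, NOT the SFE algorithm from
the paper. It only shows the interface such a package might expose.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectorMixin
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


class RandomFlipSelector(SelectorMixin, BaseEstimator):
    """Hypothetical wrapper selector (not SFE): random bit-flip hill climb."""

    def __init__(self, estimator, n_iter=50, cv=3, random_state=None):
        self.estimator = estimator
        self.n_iter = n_iter
        self.cv = cv
        self.random_state = random_state

    def _score(self, X, y, mask):
        # An empty feature subset is never acceptable.
        if not mask.any():
            return -np.inf
        est = clone(self.estimator)
        return cross_val_score(est, X[:, mask], y, cv=self.cv).mean()

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        rng = np.random.default_rng(self.random_state)
        mask = np.ones(X.shape[1], dtype=bool)  # start from all features
        best = self._score(X, y, mask)
        for _ in range(self.n_iter):
            candidate = mask.copy()
            j = rng.integers(X.shape[1])
            candidate[j] = ~candidate[j]  # flip one randomly chosen feature
            score = self._score(X, y, candidate)
            if score >= best:  # keep the flip if the CV score is not worse
                mask, best = candidate, score
        self.support_ = mask
        self.score_ = best
        self.n_features_in_ = X.shape[1]
        return self

    def _get_support_mask(self):
        # SelectorMixin builds get_support() and transform() on top of this.
        return self.support_


# Synthetic data as a stand-in for a benchmark dataset (an assumption).
X, y = make_classification(
    n_samples=100, n_features=12, n_informative=3, n_redundant=0, random_state=0
)
selector = RandomFlipSelector(KNeighborsClassifier(), n_iter=20, random_state=0)
X_selected = selector.fit(X, y).transform(X)
print(X_selected.shape[1], "of", X.shape[1], "features kept")
```

Because the class derives from SelectorMixin, it can be dropped into a
Pipeline next to RFE for a side-by-side benchmark, which is the kind of
comparison described above.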

Best,
Dalibor


On Sun, Sep 24, 2023, 21:37 Gael Varoquaux <gael.varoquaux at normalesup.org>
wrote:

> Dear Dalibor,
>
> As detailed in the FAQ,
>
> https://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for-new-algorithms
> """
> We only consider well-established algorithms for inclusion. A rule of
> thumb is at least 3 years since publication, 200+ citations, and wide use
> and usefulness.
> """
>
> These days, I would say that the bar is even harder, as we are finding
> that we prioritize things such as high-quality documentation or better
> dataframe support to new algorithms.
>
> Best,
>
> Gaël
>
> On Sun, Sep 24, 2023 at 11:10:23AM +0200, Dalibor Hrg wrote:
> > Dear scikit-learn mailing list
>
> > similarly to standing feature_selection.RFE and RFECV, this is a request
> to
> > openly discuss the PROPOSAL and requirements of feature_selection.EFS
> and/or
> > EFSCV which would stand for "Evolutionary Feature Selection" with
> starting 8
> > algorithms or methods to be used with scikit-learn estimators, just as
> > published in IEEE https://arxiv.org/abs/2303.10182 by the authors of
> paper.
> > They agreed to help integrate it (in cc).
>
> > PROPOSAL
> > Implement/integrate https://arxiv.org/abs/2303.10182 paper into
> scikit-learn:
>
> > 1) CODE
>
> >   - implementing feature_selection.EFS and/or EFSCV (a space for
> evolutionary
> >     computing community interested in feature selection)
>
> > RFE is:
>
> > feature_selection.RFE          Feature ranking with recursive feature
> > (estimator, *[, ...])          elimination.
>
> > feature_selection.RFECV        Recursive feature elimination with
> > (estimator, *[, ...])          cross-validation to select features.
>
> >  The "EFS" could be:
>
> >                         Feature ranking and feature elimination with 8
> > feature_selection.EFS   different algorithms, SFE, SFE-PSO etc. <- new
> > (estimator, *[, ...])   algorithms could be added and benchmarked with
> >                         evolutionary computing, swarm, genetic etc.
>
> > feature_selection.EFSCV Feature elimination with cross-validation to
> select
> > (estimator, *[, ...])   features
>
>
> > 2) DATASETS & CANCER BENCHMARK
>
> >   - curating and integrating fetch of cancer_benchmark 40 datasets,
> directly in
> >     scikit-learn or externally pullable somehow and maintained (space for
> >     contributing expanding high-dimensional datasets on cancer topics).
>
> > fetch_cancer-benchmark Loads 40 individual cancer related
> high-dimensional
> > (*[,, ...])            datasets for benchmarking feature selection
> methods
> >                        (classification).
>
>
> > 3) TUTORIAL / WEBSITE
>
> >   - writing tutorial to replicate IEEE paper results
> with feature_selection.EFS
> >     and/or EFSCV on cancer_benchmark (40 datasets)
>
>
> > I have identified IEEE work https://arxiv.org/abs/2303.10182 to be of
> very
> > interesting novelty in working with high-dimensional datasets as it
> reports
> > small subsets of predictive features selected with SVM, KNN across 40
> datasets.
> > Replicability under BSD-3 and high quality under scikit-learn could
> assure
> > benchmarking novel feature selection algorithms easier - in my very first
> > opinion. Since this is the very first touch of myself with IEEE paper
> authors
> > and the scikit-learn list altogether, we would welcome some help/guide
> > how integration could work out, and if there is any interest on that
> line at
> > all.
>
> > Kind regards
> > Dalibor Hrg
> > https://www.linkedin.com/in/daliborhrg/
> >
>
> > On Sat, Sep 23, 2023 at 9:08 AM Alexandre Gramfort <
> alexandre.gramfort at inria.fr
> > > wrote:
>
> >     Dear Dalibor
>
> >     you should discuss this on the main scikit-learn mailing list.
>
> >     https://mail.python.org/mailman/listinfo/scikit-learn
>
> >     Alex
>
> >     On Fri, Sep 22, 2023 at 12:19 PM Dalibor Hrg <dalibor.hrg at gmail.com>
> wrote:
>
> >         Dear sklearn feature_selection.RFE Team and IEEE Authors (in-cc),
>
> >         This is a request to openly discuss the idea of potential for
> >         feature_selection.EFS which would stand for "Evolutionary Feature
> >         Selection" or shortly EFS with starting 8 algorithms as
> published in
> >         IEEE https://arxiv.org/abs/2303.10182 by the authors on
> >         high-dimensional datasets. I have identified this work to be of
> very
> >         interesting novelty in working with high-dimensional datasets,
> >         especially for health fields, and it could mean a lot to the ML
> >         community and scikit-learn project - in my very first opinion.
>
> >         A Jupyter Notebook and scikit-learn tutorial replicating this
> IEEE
> >         paper/work as feature_selection.EFS and 8 algorithms in it could
> be a
> >         near term goal. And eventually, scikit-learn EFSCV and diverse
> >         classification algorithms could be benchmarked for "joint paper"
> in
> >         JOSS, or a health journal.
>
> >         My initial idea (doesn't need to be that way or is open to
> discussion)
> >         has some first thought like this:
> >
> >         RFE has:
>
> >         feature_selection.RFE       Feature ranking with recursive feature
> >         (estimator, *[, ...])       elimination.
>
> >         feature_selection.RFECV     Recursive feature elimination with
> >         (estimator, *[, ...])       cross-validation to select features.
>
> >          The "EFS" could have:
>
> >                                 Feature ranking and feature elimination with 8
> >         feature_selection.EFS   different algorithms, SFE, SFE-PSO etc. <- new
> >         (estimator, *[, ...])   algorithms could be added and benchmarked with
> >                                 evolutionary computing, swarm, genetic etc.
>
> >         feature_selection.EFSCV Feature elimination with cross-validation to
> >         (estimator, *[, ...])   select features
>
> >         Looking forward to an open discussion and if Evolutionary Feature
> >         Selection EFS is something for sklearn project, or maybe a
> separate pip
> >         install package.
>
> >         Kind regards
> >         Dalibor Hrg
> >         https://www.linkedin.com/in/daliborhrg/
>
> >         On Fri, Sep 22, 2023 at 10:50 AM Behrooz Ahadzade <
> b.ahadzade at yahoo.com
> >         > wrote:
>
>
>
> >             Dear Dalibor Hrg,
>
> >             Thank you very much for your attention to the SFE algorithm.
> Thank
> >             you very much for the time you took to guide me and my
> colleagues.
> >             According to your guidance, we will add this algorithm to the
> >             scikit-learn library as soon as possible.
>
> >             Kind regards,
> >             Ahadzadeh.
> >             On Wednesday, September 13, 2023 at 12:22:04 AM GMT+3:30,
> Dalibor
> >             Hrg <dalibor.hrg at gmail.com> wrote:
>
>
> >             Dear Authors,
>
> >             you have done some amazing work on feature selection here
> published
> >             in IEEE: https://arxiv.org/abs/2303.10182 . I have noticed
> Python
> >             code here without a LICENSE file or any info on
> this: https://
> >             github.com/Ahadzadeh2022/SFE and in the paper some links are
> >             mentioned to download data.
>
> >             I would be interested with you that we:
>
> >             Step 1) make and release a pip package, publish this code in
> JOSS
> >             https://joss.readthedocs.io i.e.
> https://joss.theoj.org/papers/
> >             10.21105/joss.04611 under BSD-3 license and replicate IEEE
> paper
> >             table results. All 8 algorithms could be in potentially one
> class
> >             "EFS" meaning "Evolutionary Feature Selection", selectable
> as 8
> >             options among them SFE. Or something like that.
> >
> >             Step 2) try integrate and work with scikit-learn people, I
> would
> >             recommend it to integrate this under
> https://scikit-learn.org/
> >             stable/modules/classes.html#module-sklearn.feature_selection
> >              similarly to sklearn.feature_selection.RFE. I believe this
> would
> >             be a great contribution to the best open library for ML,
> >             scikit-learn.
>
> >             I am unsure what is the status of datasets and licenses
> therein?.
> >             But, the datasets could be fetched externally from OpenML.org
> >             repository, for example
> https://scikit-learn.org/stable/datasets/
> >             loading_other_datasets.html or CERN Zenodo where "benchmark
> >             datasets" could be expanded. It depends a bit on the dataset
> >             licenses?
>
> >             Overall, I hope this can hugely maximize your published work
> >             visibility but also for others to credit you in papers in a
> more
> >             citable and replicable way. I believe your IEEE paper and
> work
> >             definitely deserve a spot in scikit-learn. There is need for
> some
> >             replicable code on "Evolutionary Methods for Feature
> Selection" and
> >             such Benchmark in life-science datasets, and you have done
> some
> >             great work so far.
>
> >             Let me know what you think.
>
> >             Best regards,
> >             Dalibor Hrg
>
> >             https://www.linkedin.com/in/daliborhrg/
>
>
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn
>
>
> --
>     Gael Varoquaux
>     Research Director, INRIA
>     http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux
>
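
For readers unfamiliar with the idea behind the proposed
`feature_selection.EFS`, here is a minimal sketch of an evolutionary search
over feature subsets. It is a generic (1+1) bit-flip scheme, not SFE or any of
the eight algorithms from the paper, and it is not scikit-learn API. The names
`evolve_feature_mask` and `toy_score` are hypothetical, and the toy objective
is an assumption standing in for something like a cross-validated estimator
score.

```python
import random


def evolve_feature_mask(score, n_features, n_iter=300, seed=0):
    """Generic (1+1) evolutionary search over boolean feature masks.

    `score` maps a tuple of booleans (True = feature kept) to a float,
    higher being better. Hypothetical helper, not part of scikit-learn.
    """
    rng = random.Random(seed)
    flip_prob = 1.0 / n_features  # on average one bit flips per mutation
    best = (True,) * n_features   # start with every feature selected
    best_score = score(best)
    for _ in range(n_iter):
        # Mutation: flip each bit independently with probability flip_prob.
        child = tuple(keep != (rng.random() < flip_prob) for keep in best)
        if not any(child):
            continue  # never evaluate an empty feature subset
        child_score = score(child)
        if child_score >= best_score:  # keep the child if it is no worse
            best, best_score = child, child_score
    return best, best_score


def toy_score(mask):
    """Toy objective (an assumption): features 0 and 3 are informative,
    and every selected feature costs a small penalty."""
    informative = {0, 3}
    hits = sum(1 for i, keep in enumerate(mask) if keep and i in informative)
    return hits - 0.1 * sum(mask)


mask, value = evolve_feature_mask(toy_score, n_features=8)
print(mask, round(value, 2))
```

Because a child is accepted only when it scores at least as well as its
parent, the returned score can never be worse than that of the full feature
set. A real benchmark would replace `toy_score` with a model-based score, which
is where most of the cost would lie.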
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20230924/856f5555/attachment-0001.html>

From dalibor.hrg at gmail.com  Sat Sep 23 22:09:43 2023
From: dalibor.hrg at gmail.com (Dalibor Hrg)
Date: Sun, 24 Sep 2023 04:09:43 +0200
Subject: [scikit-learn] Request / Proposal: integrating IEEE paper in
 scikit-learn as "feature_selection.EFS / EFSCV" and cancer_benchmark
 datasets
In-Reply-To: <CAJ=aRPrJuaRWz=BoJaFuj18djnyJCejqSde5n=ap-GT4zmZxbA@mail.gmail.com>
References: <CAJ=aRPrnTrB-amJm8Ft8s4fjuWU62hC+wauj5ETcge2z1UmnvQ@mail.gmail.com>
 <20230924183953.47lkn3nwj544wumm@gaellaptop>
 <CAJ=aRPrJuaRWz=BoJaFuj18djnyJCejqSde5n=ap-GT4zmZxbA@mail.gmail.com>
Message-ID: <CAJ=aRPoM3DE22xpzhGwT1TzhT4nfRoNW+enX=vmUndnnFyxCxg@mail.gmail.com>

P.S. Regarding the effort involved, I fully agree with what is written in
the FAQ.

I wonder whether this could become an EU project covering high-dimensional
datasets from multiple domains. It looks like an opportunity to discuss over
a virtual coffee, if anybody is interested. I am unsure whether the
scikit-learn community or groups collaborate on investigating directions or
on maintenance through funded projects, but I am just putting it out there.
Perhaps this discussion is an opportunity for that.

Cheerio
Dalibor


On Sun, Sep 24, 2023 at 3:29 AM Dalibor Hrg <dalibor.hrg at gmail.com> wrote:

> Dear Gael,
>
> Thanks for the clarification. Yes, I see, there is a need for broader use
> of evidence and citations of such methods or approaches. This is roughly
> what I was thinking.
>
> Looking at the related projects here
> https://scikit-learn.org/stable/related_projects.html#related-projects, and
> especially the "Boruta" package
> https://github.com/scikit-learn-contrib/boruta_py, a small question for a
> hint: do you think a pip package like Boruta would be the closest fit,
> implemented together with the cancer benchmark dataset and replicating the
> paper results?
>
> Certainly, the potential is to benchmark RFE and EFS, publish how they
> perform on the benchmark, and demonstrate them on diverse high-dimensional
> datasets from other domains and other publications. Doing that is a
> long-term journey to show the usefulness of the method/algorithm.
>
> Best,
> Dalibor
>
>
> On Sun, Sep 24, 2023, 21:37 Gael Varoquaux <gael.varoquaux at normalesup.org>
> wrote:
>
>> Dear Dalibor,
>>
>> As detailed in the FAQ,
>>
>> https://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for-new-algorithms
>> """
>> We only consider well-established algorithms for inclusion. A rule of
>> thumb is at least 3 years since publication, 200+ citations, and wide use
>> and usefulness.
>> """
>>
>> These days, I would say that the bar is even higher, as we are finding
>> that we prioritize things such as high-quality documentation or better
>> dataframe support over new algorithms.
>>
>> Best,
>>
>> Gaël
>>
>> --
>>     Gael Varoquaux
>>     Research Director, INRIA
>>     http://gael-varoquaux.info
>> http://twitter.com/GaelVaroquaux
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20230924/07527f31/attachment-0001.html>

From sepand.haghighi at yahoo.com  Thu Sep 28 09:21:14 2023
From: sepand.haghighi at yahoo.com (Sepand Haghighi)
Date: Thu, 28 Sep 2023 13:21:14 +0000 (UTC)
Subject: [scikit-learn] Introducing PyMilo: A New Way to Transport
 Pre-trained ML Models
References: <387873285.4381469.1695907274639.ref@mail.yahoo.com>
Message-ID: <387873285.4381469.1695907274639@mail.yahoo.com>

Dear all,

We are thrilled to introduce PyMilo, an open-source Python package that can revolutionize the way you transport pre-trained machine-learning models. PyMilo offers an efficient, secure, and transparent method that aims to eliminate the risks associated with binary or pickle formats.

Why PyMilo?

The motivation behind developing this package is simple but significant: to provide a safer and more reliable way to share machine learning models. As we embark on this journey, we acknowledge that PyMilo is still in its early stages of development. Currently, it supports only a limited number of machine learning models provided by Scikit-learn.

Your Feedback Matters

We firmly believe in the power of community collaboration. This is why we're reaching out to you, the Scikit-learn users, to ask for your support in utilizing PyMilo and providing us with your invaluable feedback. Your insights can help us enhance the package's interface and prioritize future developments.

How You Can Contribute

- Try PyMilo with your Scikit-learn models and let us know about your experience.
- Share your thoughts on improving PyMilo's functionality and usability.
- Report any issues or bugs you encounter.

Your cooperation would be precious to us as we work towards making PyMilo a robust and indispensable tool for the machine learning community.

Get Started

To start using PyMilo, you can find detailed documentation and installation instructions on our GitHub repository:

GitHub - openscilab/pymilo: PyMilo: Python for ML I/O
<https://github.com/openscilab/pymilo>


Join us in shaping the future of model transportation with PyMilo!
Thank you for your time and support. We look forward to your active participation in this exciting endeavor.
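
To make the pickle risk mentioned above concrete, here is a minimal sketch using only the Python standard library. It shows that unpickling can run an arbitrary callable, then contrasts this with storing plain, inspectable data. The `Payload` class and the JSON field names are hypothetical illustrations, not PyMilo's actual format or API.

```python
import json
import pickle


class Payload:
    """Stand-in for a malicious object embedded in a pickled 'model'."""

    def __reduce__(self):
        # pickle stores "call eval('6 * 7')" and runs it at load time;
        # an attacker could substitute any importable callable here.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
loaded = pickle.loads(blob)  # executes eval(...) instead of rebuilding Payload
print(loaded)

# A transparent alternative: store only plain data that can be inspected.
# The field names are hypothetical, not PyMilo's or skops' actual format.
params = {"model": "LinearRegression", "coef": [0.5, -1.25], "intercept": 0.1}
text = json.dumps(params, sort_keys=True)
restored = json.loads(text)  # parsing JSON never calls arbitrary code
print(restored == params)
```

The trade-off is that a plain-data format has to know how to rebuild each supported model type explicitly, which is one reason such tools start with a limited set of models.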


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20230928/ac04a640/attachment.html>

From adrin.jalali at gmail.com  Thu Sep 28 09:43:51 2023
From: adrin.jalali at gmail.com (Adrin)
Date: Thu, 28 Sep 2023 15:43:51 +0200
Subject: [scikit-learn] Introducing PyMilo: A New Way to Transport
 Pre-trained ML Models
In-Reply-To: <387873285.4381469.1695907274639@mail.yahoo.com>
References: <387873285.4381469.1695907274639.ref@mail.yahoo.com>
 <387873285.4381469.1695907274639@mail.yahoo.com>
Message-ID: <CAEOrW49rjELYkYot9rw_UoAhLWPxRf1=J4SDdn9jd9E6uNEGMw@mail.gmail.com>

Hi,

This seems somewhat similar to, but less complete than, `skops.io`. You might
want to have a look there: https://github.com/skops-dev/skops/

On Thu, Sep 28, 2023 at 3:22 PM Sepand Haghighi via scikit-learn <
scikit-learn at python.org> wrote:

> Dear all,
>
> We are thrilled to introduce PyMilo, an open-source Python package that
> can revolutionize the way you transport pre-trained machine-learning
> models. PyMilo offers an efficient, secure, and transparent method that
> aims to eliminate the risks associated with binary or pickle formats.
>
> *Why PyMilo?*
>
> The motivation behind developing this package is simple but significant:
> to provide a safer and more reliable way to share machine learning models.
> As we embark on this journey, we acknowledge that PyMilo is still in its
> early stages of development. Currently, it supports only a limited number
> of machine learning models provided by Scikit-learn.
>
> *Your Feedback Matters*
>
> We firmly believe in the power of community collaboration. This is why
> we're reaching out to you, the Scikit-learn users, to ask for your support
> in utilizing PyMilo and providing us with your invaluable feedback. Your
> insights can help us enhance the package's interface and prioritize future
> developments.
>
> *How You Can Contribute*
>
> - Try PyMilo with your *Scikit-learn* models and let us know about your
> experience.
> - Share your thoughts on improving PyMilo's functionality and usability.
> - Report any issues or bugs you encounter.
>
> Your cooperation would be precious to us as we work towards making PyMilo
> a robust and indispensable tool for the machine learning community.
>
> *Get Started*
>
> To start using PyMilo, you can find detailed documentation and
> installation instructions on our GitHub repository:
>
> GitHub - openscilab/pymilo: PyMilo: Python for ML I/O
> <https://github.com/openscilab/pymilo>
>
> GitHub - openscilab/pymilo: PyMilo: Python for ML I/O
>
> PyMilo: Python for ML I/O. Contribute to openscilab/pymilo development by
> creating an account on GitHub.
> <https://github.com/openscilab/pymilo>
>
>
>
> Join us in shaping the future of model transportation with PyMilo!
>
> Thank you for your time and support. We look forward to your active
> participation in this exciting endeavor.
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20230928/2a7b6c8c/attachment-0001.html>