[scikit-learn] Maximum Mutual Information value for continuous variables
Thomas Evangelidis
tevang3 at gmail.com
Wed Nov 27 11:58:45 EST 2019
Greetings,
I am thinking of alternative ways to remove invariant scalar features
from my feature vectors before training MLPs. So far I have tried removing
columns with zero variance and columns with Pearson's R = 1.0 or R = -1.0. If I
remove columns with |R| < 1.0, the performance drops. However, R only measures
linear correlation. Now I am thinking of removing columns with high
Mutual Information, but first I need to normalize it. I found in the
documentation, under "Univariate Feature Selection", the function
"mutual_info_regression".
https://scikit-learn.org/stable/modules/feature_selection.html#univariate-feature-selection
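For reference, the zero-variance / |R| = 1 filtering described above can be sketched roughly like this (the toy matrix is hypothetical data, just for illustration):

```python
import numpy as np

# Toy feature matrix (hypothetical): column 1 is constant,
# column 2 duplicates column 0 exactly.
X = np.array([[1.0, 5.0, 1.0, 0.3],
              [2.0, 5.0, 2.0, 0.1],
              [3.0, 5.0, 3.0, 0.9]])

# Step 1: drop zero-variance (invariant) columns.
X = X[:, X.var(axis=0) > 0]

# Step 2: drop one column from each pair with |Pearson R| == 1
# (up to floating-point tolerance), keeping the lower-indexed column.
R = np.abs(np.corrcoef(X, rowvar=False))
upper = np.triu(R, k=1)                      # look at each pair once
drop = set(np.where(upper >= 1.0 - 1e-12)[1])
X = X[:, [j for j in range(X.shape[1]) if j not in drop]]
print(X.shape)  # the constant and the duplicate column are gone
```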
I used this function to measure the correlation between pairs of columns
(features), but it sometimes returns values > 1.0. On the other hand, there is
also this function
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_mutual_info_score.html#sklearn.metrics.adjusted_mutual_info_score
which is bounded above by 1.0, but it is meant for categorical data (cluster
labels). So my question is: is there a way to compute a normalized Mutual
Information for continuous variables, too?
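One ad-hoc workaround I have been considering (this is my own assumption, not an official scikit-learn API) is to divide the k-NN MI estimate by the geometric mean of each variable's MI with itself, so the self-score is 1 by construction:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def normalized_mi(x, y, random_state=0):
    """Ad-hoc normalization (an assumption, not a scikit-learn function):
    MI(x, y) / sqrt(MI(x, x) * MI(y, y)), so normalized_mi(x, x) == 1
    by construction. MI values come from the k-NN estimator."""
    x = np.asarray(x).reshape(-1, 1)
    y = np.asarray(y).ravel()
    mi_xy = mutual_info_regression(x, y, random_state=random_state)[0]
    mi_xx = mutual_info_regression(x, x.ravel(), random_state=random_state)[0]
    mi_yy = mutual_info_regression(y.reshape(-1, 1), y,
                                   random_state=random_state)[0]
    denom = np.sqrt(mi_xx * mi_yy)
    return mi_xy / denom if denom > 0 else 0.0

rng = np.random.RandomState(42)
x = rng.normal(size=500)
print(normalized_mi(x, x))                      # ~1.0 by construction
print(normalized_mi(x, rng.normal(size=500)))   # near 0 for independent data
```

Another common workaround would be to bin the continuous columns and then use sklearn.metrics.normalized_mutual_info_score on the discretized values, at the cost of a binning choice.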
Thanks in advance for any advice.
Thomas
--
======================================================================
Dr. Thomas Evangelidis
Research Scientist
IOCB - Institute of Organic Chemistry and Biochemistry of the Czech Academy
of Sciences <https://www.uochb.cz/web/structure/31.html?lang=en>, Prague,
Czech Republic
&
CEITEC - Central European Institute of Technology
<https://www.ceitec.eu/>, Brno,
Czech Republic
email: tevang3 at gmail.com, Twitter: tevangelidis
<https://twitter.com/tevangelidis>, LinkedIn: Thomas Evangelidis
<https://www.linkedin.com/in/thomas-evangelidis-495b45125/>
website: https://sites.google.com/site/thomasevangelidishomepage/