chi square test in sklearn printing NAN values for most of the columns

Rahul Gupta rahulgupta100689 at gmail.com
Mon Apr 27 03:16:04 EDT 2020


Hi, I am trying to use the chi-square test to select the most important columns out of 5501 columns, but for most of the columns I get NaN as the chi-square value.

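import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_selection import chi2

# Column 0 is the label; column 10 is the single feature tested here.
# To test every feature, use: cols = list(range(0, 5502))
cols = [0, 10]

# Raw string so the backslashes in the Windows path are taken literally.
df = pd.read_csv(r"D:\PHD\obranking\demo.csv", usecols=cols)
# apply() returns a new DataFrame, so the result has to be assigned back.
df = df.apply(LabelEncoder().fit_transform)
X = df.drop(labels='label', axis=1)
Y = df['label']
# chi2 returns a tuple: (chi-square statistics, p-values), one entry per column of X.
chi_scores = chi2(X, Y)
print(chi_scores)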
In this code I print the chi-square value for column 10 only, but most of the columns behave like this:

"C:\Users\Rahul Gupta\PycharmProjects\CSVLearn\venv\Scripts\python.exe" "C:/Users/Rahul Gupta/PycharmProjects/CSVLearn/ChiSq_learn.py"
(array([nan]), array([nan]))

Process finished with exit code 0
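
One possible cause, offered as an assumption and not a confirmed diagnosis of this data: chi2 compares, for each class, the observed sum of a feature with the sum expected from the class frequency. If a column sums to zero (for example a constant column, which LabelEncoder maps to all zeros), the observed and expected sums are both zero and the statistic becomes 0/0. The pure-Python sketch below re-creates that arithmetic for a single column. The function name chi2_single_column is hypothetical and is not part of sklearn.

```python
import math
from collections import defaultdict


def chi2_single_column(values, labels):
    """Chi-square statistic of one non-negative feature column against class labels.

    Illustrative re-implementation only (hypothetical helper, not sklearn's API).
    """
    n = len(values)
    total = sum(values)              # sum of the feature over all rows
    observed = defaultdict(float)    # per-class sum of the feature
    counts = defaultdict(int)        # per-class number of rows
    for v, y in zip(values, labels):
        observed[y] += v
        counts[y] += 1

    stat = 0.0
    for y in counts:
        # Expected per-class sum if the feature were independent of the class.
        expected = total * counts[y] / n
        if expected == 0:
            # NumPy float division gives nan for 0/0; plain Python would raise
            # ZeroDivisionError, so the nan is returned explicitly here.
            return math.nan
        stat += (observed[y] - expected) ** 2 / expected
    return stat


labels = [0, 0, 1, 1]
all_zero = chi2_single_column([0, 0, 0, 0], labels)     # column that sums to zero
informative = chi2_single_column([0, 0, 3, 3], labels)  # column that varies with the class
print(all_zero, informative)
```

If this is what is happening, counting the columns whose sum is zero after encoding should match the number of NaN results.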


More information about the Python-list mailing list