[Tutor] stats.linregress Problem
Stephen P. Molnar
s.molnar at sbcglobal.net
Fri Nov 20 12:08:32 EST 2020
I have been following the steps in
https://365datascience.com/linear-regression/ . The libraries are up to
date.
The code in the reference is (attached):
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import seaborn as sns
from scipy import stats
sns.set()
df = pd.read_csv('Data.csv')
print(df)
y = df['GPA']
x1 = df['SAT']
sns.set_style('whitegrid')
plt.figure(1)
plt.scatter(x1,y, s = 10)
plt.xlabel('SAT', fontsize=10)
plt.ylabel('GPA', fontsize=10)
x = sm.add_constant(x1)
results = sm.OLS(y,x).fit()
results.summary()
print(results.summary())
slope, intercept, r_value, p_value, std_err = stats.linregress(x1, y)
print("slope: %f intercept: %f" % (slope, intercept))
print("R-squared: %f" % r_value**2)
The last three lines are my own addition. The script runs to the end
and gives me the values I want. The modified script and the data file
are also attached.
However, when I modify the script to use my own data, I get an error
from the last three lines:
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import seaborn as sns
from scipy import stats
sns.set()
df = pd.read_csv('AllData31e.csv')
df = df.dropna(axis=1)
x1 = df.iloc[:,2].values.reshape(-1,1)
y = df.iloc[:,1].values.reshape(-1,1)
print(df.describe())
sns.set_style('whitegrid')
plt.figure(1)
plt.scatter(x1,y, s = 10)
plt.xlabel('log(IC50)', fontsize=10)
plt.ylabel('Activity (kcal.mole)', fontsize=10)
x = sm.add_constant(x1)
results = sm.OLS(y,x).fit()
results.summary()
print(results.summary())
slope, intercept, r_value, p_value, std_err = stats.linregress(x1, y)
print("slope: %f intercept: %f" % (slope, intercept))
print("R-squared: %f" % r_value**2)
Traceback (most recent call last):
  File "/home/comp/Apps/PythonDevelopment/LinReg_2.py", line 39, in <module>
    slope, intercept, r_value, p_value, std_err = stats.linregress(x1, y)
  File "/home/comp/Apps/Spyder-4.2.0/Spyder-4.2.0/lib/python3.7/site-packages/scipy/stats/_stats_mstats_common.py", line 116, in linregress
    ssxm, ssxym, ssyxm, ssym = np.cov(x, y, bias=1).flat
ValueError: too many values to unpack (expected 4)
Google in this instance has not been my friend. Pointers in the direction of a solution will be appreciated.
Thanks in advance.
--
Stephen P. Molnar, Ph.D.
www.molecular-modeling.net
614.312.7528 (c)
Skype: smolnar1
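
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the traceback is consistent with x1 and y being 2-D column arrays after .values.reshape(-1,1). The failing line inside linregress unpacks np.cov(x, y, bias=1).flat into four values, which only works when np.cov sees two 1-D variables. The example below uses made-up stand-in data (not AllData31e.csv), and the names x_col, y_col, x_flat, y_flat and cov_size are illustrative only. It shows how many values np.cov produces for column arrays, and that flattening the arrays before the call lets linregress run.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in data (NOT from AllData31e.csv): y = 2*x + 1 exactly.
x_col = np.array([1.0, 2.0, 3.0, 4.0, 5.0]).reshape(-1, 1)  # shape (5, 1)
y_col = 2.0 * x_col + 1.0                                   # shape (5, 1)

# np.cov treats each ROW of a 2-D input as a separate variable, so two
# (5, 1) arrays are seen as 10 variables and the result is a 10 x 10 matrix,
# far more than the 4 values the failing line tries to unpack.
cov_size = np.cov(x_col, y_col, bias=1).size
print("values produced by np.cov for column arrays:", cov_size)

# Flatten to 1-D before calling linregress (assumed fix for the post's script).
x_flat = x_col.ravel()
y_flat = y_col.ravel()
slope, intercept, r_value, p_value, std_err = stats.linregress(x_flat, y_flat)
print("slope: %f intercept: %f" % (slope, intercept))
print("R-squared: %f" % r_value**2)
```

In the script above, the equivalent change would be to call stats.linregress(x1.ravel(), y.ravel()), or to skip the reshape for that call and pass df.iloc[:,2] and df.iloc[:,1] directly. The 2-D arrays can still be used for the statsmodels part, which ran without complaint.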