[Tutor] stats.linregress Problem

Stephen P. Molnar s.molnar at sbcglobal.net
Fri Nov 20 12:08:32 EST 2020


I have been following the steps in 
https://365datascience.com/linear-regression/ . The libraries are up to 
date.

The code in the reference is (attached):

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import seaborn as sns
from scipy import stats
sns.set()

df = pd.read_csv('Data.csv')
print(df)

y = df['GPA']
x1 = df['SAT']
sns.set_style('whitegrid')
plt.figure(1)
plt.scatter(x1,y, s = 10)
plt.xlabel('SAT', fontsize=10)
plt.ylabel('GPA', fontsize=10)

x = sm.add_constant(x1)
results = sm.OLS(y,x).fit()
results.summary()
print(results.summary())

slope, intercept, r_value, p_value, std_err = stats.linregress(x1, y)
print("slope: %f    intercept: %f" % (slope, intercept))
print("R-squared: %f" % r_value**2)

with the exception of the last lines, which I have added. It runs to the 
end and gives me the values I want. The modified script and the data 
file are also attached.

However, when I modify the script to use the data that I want, I get 
errors for the last three lines:

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import seaborn as sns
from scipy import stats
sns.set()


df = pd.read_csv('AllData31e.csv')
df = df.dropna(axis=1)


x1 = df.iloc[:,2].values.reshape(-1,1)
y = df.iloc[:,1].values.reshape(-1,1)

print(df.describe())


sns.set_style('whitegrid')
plt.figure(1)
plt.scatter(x1,y, s = 10)
plt.xlabel('log(IC50)', fontsize=10)
plt.ylabel('Activity (kcal.mole)', fontsize=10)

x = sm.add_constant(x1)
results = sm.OLS(y,x).fit()
results.summary()
print(results.summary())

slope, intercept, r_value, p_value, std_err = stats.linregress(x1, y)
print("slope: %f    intercept: %f" % (slope, intercept))
print("R-squared: %f" % r_value**2)

Traceback (most recent call last):

   File "/home/comp/Apps/PythonDevelopment/LinReg_2.py", line 39, in <module>
     slope, intercept, r_value, p_value, std_err = stats.linregress(x1, y)

   File "/home/comp/Apps/Spyder-4.2.0/Spyder-4.2.0/lib/python3.7/site-packages/scipy/stats/_stats_mstats_common.py", line 116, in linregress
     ssxm, ssxym, ssyxm, ssym = np.cov(x, y, bias=1).flat

ValueError: too many values to unpack (expected 4)
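
One visible difference between the two scripts is that the first passes 1-D pandas Series to stats.linregress, while the second passes arrays reshaped with .reshape(-1,1), i.e. 2-D columns of shape (n, 1). The sketch below is only a guess at the cause, under that assumption. It uses made-up numbers in place of AllData31e.csv, and the names x_flat, y_flat, x_col and y_col are hypothetical. It shows that np.cov, the call named in the traceback, returns far more than four values for column-shaped input, and that flattening the arrays with ravel() gives linregress 1-D input again.

```python
import numpy as np
from scipy import stats

# Made-up data standing in for the two columns read from AllData31e.csv
# (an assumption; the real file is not reproduced here).
x_flat = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_flat = 2.0 * x_flat + 1.0

# Same reshaping as in the modified script: 1-D data become (n, 1) columns.
x_col = x_flat.reshape(-1, 1)
y_col = y_flat.reshape(-1, 1)
print("1-D shape:", x_flat.shape, " column shape:", x_col.shape)

# np.cov treats each row as a separate variable by default, so two (5, 1)
# inputs are seen as 10 variables and the result is 10 x 10, not 2 x 2.
cov = np.cov(x_col, y_col, bias=1)
print("covariance matrix shape for column input:", cov.shape)

# Flattening back to 1-D before the call gives linregress the input it expects.
slope, intercept, r_value, p_value, std_err = stats.linregress(
    x_col.ravel(), y_col.ravel()
)
print("slope: %f    intercept: %f" % (slope, intercept))
print("R-squared: %f" % r_value**2)
```

If this is indeed the cause, the column-shaped arrays could still be kept for sm.OLS and the plot, with ravel() applied only in the linregress call.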

Google in this instance has not been my friend. Pointers in the direction of a solution will be appreciated.

Thanks in advance.

-- 
Stephen P. Molnar, Ph.D.
www.molecular-modeling.net
614.312.7528 (c)
Skype:  smolnar1


