[Tutor] stats.linregress Problem

Mark Lawrence breamoreboy at gmail.com
Fri Nov 20 13:02:47 EST 2020


On 20/11/2020 17:08, Stephen P. Molnar wrote:
> I have been following the steps in 
> https://365datascience.com/linear-regression/ . The libraries are up to 
> date.
> 
> The code in the reference is (attached):
> 
> import pandas as pd
> import matplotlib.pyplot as plt
> import statsmodels.api as sm
> import seaborn as sns
> from scipy import stats
> sns.set()
> 
> df = pd.read_csv('Data.csv')
> print(df)
> 
> y = df['GPA']
> x1 = df['SAT']
> sns.set_style('whitegrid')
> plt.figure(1)
> plt.scatter(x1,y, s = 10)
> plt.xlabel('SAT', fontsize=10)
> plt.ylabel('GPA', fontsize=10)
> 
> x = sm.add_constant(x1)
> results = sm.OLS(y,x).fit()
> results.summary()
> print(results.summary())
> 
> slope, intercept, r_value, p_value, std_err = stats.linregress(x1, y)
> print("slope: %f    intercept: %f" % (slope, intercept))
> print("R-squared: %f" % r_value**2)
> 
> with the exception of the last lines, which I have added. It runs to the 
> end and gives me the values I want. The modified script and the data 
> file are also attached.
> 
> However, when I modify the script to use the data that I want I get 
> errors for the last three lines:
> 
> import pandas as pd
> import matplotlib.pyplot as plt
> import statsmodels.api as sm
> import seaborn as sns
> from scipy import stats
> sns.set()
> 
> 
> df = pd.read_csv('AllData31e.csv')
> df = df.dropna(axis=1)
> 
> 
> x1 = df.iloc[:,2].values.reshape(-1,1)
> y = df.iloc[:,1].values.reshape(-1,1)
> 
> print(df.describe())
> 
> 
> sns.set_style('whitegrid')
> plt.figure(1)
> plt.scatter(x1,y, s = 10)
> plt.xlabel('log(IC50)', fontsize=10)
> plt.ylabel('Activity (kcal.mole)', fontsize=10)
> 
> x = sm.add_constant(x1)
> results = sm.OLS(y,x).fit()
> results.summary()
> print(results.summary())
> 
> slope, intercept, r_value, p_value, std_err = stats.linregress(x1, y)
> print("slope: %f    intercept: %f" % (slope, intercept))
> print("R-squared: %f" % r_value**2)
> 
> Traceback (most recent call last):
>   File "/home/comp/Apps/PythonDevelopment/LinReg_2.py", line 39, in <module>
>     slope, intercept, r_value, p_value, std_err = stats.linregress(x1, y)
>   File "/home/comp/Apps/Spyder-4.2.0/Spyder-4.2.0/lib/python3.7/site-packages/scipy/stats/_stats_mstats_common.py", line 116, in linregress
>     ssxm, ssxym, ssyxm, ssym = np.cov(x, y, bias=1).flat
> ValueError: too many values to unpack (expected 4)
> 
> Google in this instance has not been my friend. Pointers in the 
> direction of a solution will be appreciated.
> 
> Thanks in advance.
> 

I'd put print calls into both the original and the modified code to check 
that x1 and y have the same dimensions in each version before the call 
into stats.linregress.  If they do, I haven't a clue, sorry.  If they 
don't, modify the code that gets the data from your file, simples :-)
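
A rough sketch of that check follows. The data is synthetic and made up 
purely for illustration (it is not the contents of AllData31e.csv), and 
the names x_flat, y_flat, x_col and y_col are hypothetical. It compares 
1-D arrays, which is what the original script's df['SAT'] and df['GPA'] 
amount to, with the (n, 1) column arrays that .values.reshape(-1,1) 
produces in the modified script. Going by the np.cov line in the 
traceback, the column shape looks like a plausible cause: np.cov treats 
each row as a separate variable by default, so two (n, 1) inputs no 
longer give the 2x2 matrix that unpacks into four values.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (made up for illustration; not the poster's file).
rng = np.random.default_rng(0)
x_flat = np.linspace(0.0, 10.0, 20)
y_flat = 2.0 * x_flat + 1.0 + rng.normal(scale=0.1, size=x_flat.size)

# What the modified script builds: column vectors of shape (n, 1).
x_col = x_flat.reshape(-1, 1)
y_col = y_flat.reshape(-1, 1)

# The print calls suggested above: compare shapes in both variants.
print("1-D shapes:   ", x_flat.shape, y_flat.shape)
print("column shapes:", x_col.shape, y_col.shape)

# np.cov treats each row as a variable by default, so two (n, 1) inputs
# yield a (2n, 2n) matrix instead of the 2x2 one holding four values.
cov_flat_shape = np.cov(x_flat, y_flat, bias=1).shape
cov_col_shape = np.cov(x_col, y_col, bias=1).shape
print("np.cov shapes:", cov_flat_shape, cov_col_shape)

# One possible way around it: flatten back to 1-D before calling linregress.
slope, intercept, r_value, p_value, std_err = stats.linregress(
    x_col.ravel(), y_col.ravel()
)
print("slope: %f    intercept: %f" % (slope, intercept))
print("R-squared: %f" % r_value**2)
```

If the shapes do turn out to differ like this, dropping the 
.reshape(-1,1) for the stats.linregress call (or passing x1.ravel() and 
y.ravel()) would be one thing to try. The reshaped arrays can still be 
used for sm.OLS, which accepts 2-D input.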

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence


