[Tutor] stats.linregress Problem
Mark Lawrence
breamoreboy at gmail.com
Fri Nov 20 13:02:47 EST 2020
On 20/11/2020 17:08, Stephen P. Molnar wrote:
> I have been following the steps in
> https://365datascience.com/linear-regression/ . The libraries are up to
> date.
>
> The code in the reference is (attached):
>
> import pandas as pd
> import matplotlib.pyplot as plt
> import statsmodels.api as sm
> import seaborn as sns
> from scipy import stats
> sns.set()
>
> df = pd.read_csv('Data.csv')
> print(df)
>
> y = df['GPA']
> x1 = df['SAT']
> sns.set_style('whitegrid')
> plt.figure(1)
> plt.scatter(x1,y, s = 10)
> plt.xlabel('SAT', fontsize=10)
> plt.ylabel('GPA', fontsize=10)
>
> x = sm.add_constant(x1)
> results = sm.OLS(y,x).fit()
> results.summary()
> print(results.summary())
>
> slope, intercept, r_value, p_value, std_err = stats.linregress(x1, y)
> print("slope: %f intercept: %f" % (slope, intercept))
> print("R-squared: %f" % r_value**2)
>
> with the exception of the last lines, which I have added. It runs to the
> end and gives me the values I want. The modified script and the data
> file are also attached.
>
> However, when I modify the script to use the data that I want I get
> errors for the last three lines:
>
> import pandas as pd
> import matplotlib.pyplot as plt
> import statsmodels.api as sm
> import seaborn as sns
> from scipy import stats
> sns.set()
>
>
> df = pd.read_csv('AllData31e.csv')
> df = df.dropna(axis=1)
>
>
> x1 = df.iloc[:,2].values.reshape(-1,1)
> y = df.iloc[:,1].values.reshape(-1,1)
>
> print(df.describe())
>
>
> sns.set_style('whitegrid')
> plt.figure(1)
> plt.scatter(x1,y, s = 10)
> plt.xlabel('log(IC50)', fontsize=10)
> plt.ylabel('Activity (kcal.mole)', fontsize=10)
>
> x = sm.add_constant(x1)
> results = sm.OLS(y,x).fit()
> results.summary()
> print(results.summary())
>
> slope, intercept, r_value, p_value, std_err = stats.linregress(x1, y)
> print("slope: %f intercept: %f" % (slope, intercept))
> print("R-squared: %f" % r_value**2)
>
> Traceback (most recent call last):
>
> File "/home/comp/Apps/PythonDevelopment/LinReg_2.py", line 39, in
> <module>
> slope, intercept, r_value, p_value, std_err = stats.linregress(x1, y)
>
> File
> "/home/comp/Apps/Spyder-4.2.0/Spyder-4.2.0/lib/python3.7/site-packages/scipy/stats/_stats_mstats_common.py",
> line 116, in linregress
> ssxm, ssxym, ssyxm, ssym = np.cov(x, y, bias=1).flat
>
> ValueError: too many values to unpack (expected 4)
>
> Google in this instance has not been my friend. Pointers in the
> direction of a solution will be appreciated.
>
> Thanks in advance.
>
I'd put print calls into both the original and the modified code to check
that x1 and y have the same shape in each at the call into
stats.linregress. If they do, I haven't a clue, sorry. If they don't,
modify the code that gets the data from your file, simples :-)
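
For what it's worth, here is a minimal sketch of that check. It assumes the
difference is the .values.reshape(-1,1) in the modified script, which makes
x1 and y 2-D arrays of shape (n, 1), whereas the original passes 1-D
Series. The numbers are made up and the names x_col, y_col, x_flat and
y_flat are only for illustration; I have not run this against your file.

```python
import numpy as np
from scipy import stats

# Made-up numbers standing in for the two columns read from the CSV file.
x_col = np.array([1.0, 2.0, 3.0, 4.0, 5.0]).reshape(-1, 1)
y_col = np.array([2.1, 3.9, 6.2, 7.8, 10.1]).reshape(-1, 1)

# Compare shapes before the call into stats.linregress.
# After reshape(-1, 1) both arrays are 2-D column vectors.
print("x_col:", x_col.shape, "y_col:", y_col.shape)

# Flatten to 1-D, the layout that the original script's Series had.
x_flat = x_col.ravel()
y_flat = y_col.ravel()
print("x_flat:", x_flat.shape, "y_flat:", y_flat.shape)

slope, intercept, r_value, p_value, std_err = stats.linregress(x_flat, y_flat)
print("slope: %f intercept: %f" % (slope, intercept))
print("R-squared: %f" % r_value**2)
```

If the shapes do turn out to differ like this, either drop the reshape or
call .ravel() on x1 and y just for the linregress call.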
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.
Mark Lawrence