Problem in defining multidimensional array matrix and regression

Sun Nov 19 12:55:15 EST 2017

Hello Peter,

Many thanks for your suggestion.
I am now using pandas and have already read the data that way, but I now need to build a multi-dimensional array that holds all the predictor variables (5 in this case) together as x, so that I can perform multiple regression analysis.

I do not understand how to bring all the variables onto one axis (e.g. the x-axis).
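
A minimal sketch of one way this could be done, not necessarily the only one: selecting the five predictor columns from the DataFrame in a single step gives a 2-D array with one row per observation and one column per predictor. The data are pasted inline as comma-separated text (an assumption about how pcp1.csv is laid out) so the example runs on its own, and numpy.linalg.lstsq stands in for a statsmodels fit to keep the dependencies small.

```python
import io

import numpy as np
import pandas as pd

# Data from this thread, embedded so the sketch runs without pcp1.csv.
# Comma separation is an assumption about the real file.
DATA = """\
RF,P1,P2,P3,P4,P5
120.235,0.234,-0.012,0.145,21.023,0.233
200.14,0.512,-0.021,0.214,22.21,0.332
185.362,0.147,-0.32,0.136,24.65,0.423
201.895,0.002,-0.12,0.217,30.25,0.325
165.235,0.256,0.001,0.22,31.245,0.552
198.236,0.012,-0.362,0.215,32.25,0.333
350.263,0.98,-0.85,0.321,38.412,0.411
145.25,0.046,-0.36,0.147,39.256,0.872
198.654,0.65,-0.45,0.224,40.235,0.652
245.214,0.47,-0.325,0.311,26.356,0.632
214.02,0.18,-0.012,0.242,22.01,0.745
147.256,0.652,-0.785,0.311,18.256,0.924
"""

df = pd.read_csv(io.StringIO(DATA))

# Dependent variable: a 1-D array with one value per observation.
y = df["RF"].to_numpy()

# All five predictors together: a 2-D array of shape (observations, predictors).
predictors = ["P1", "P2", "P3", "P4", "P5"]
X = df[predictors].to_numpy()

# Prepend a column of ones so the fit includes an intercept term.
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares; coef[0] is the intercept, coef[1:] match `predictors`.
coef, residuals, rank, singular_values = np.linalg.lstsq(X_design, y, rcond=None)

print(X.shape)
print(dict(zip(["intercept"] + predictors, coef)))
```

With the real file, pd.read_csv("pcp1.csv") would replace the StringIO call (adding sep="\t" if the file turns out to be tab-delimited).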

Thanks
Vishal

On Sunday, 19 November 2017 22:32:06 UTC+5:30, Peter Otten  wrote:
> shalu.ashu50 at gmail.com wrote:
> 
> > Hi, All,
> > 
> > I have 6 variables in CSV file. One is rainfall (dependent, at y-axis) and
> > others are predictors (at x). I want to do multiple regression and create
> > a correlation matrix between rainfall (y) and predictors (x; n1=5). Thus I
> > want to read rainfall as a separate variable and others in separate
> > columns, so I can apply the algo. However, I am not able to make a proper
> > matrix for them.
> > 
> > Here are my data and code.
> > Please suggest how I can do this.
> > I am new to Python.
> > 
> > RF	P1	P2	P3	P4	P5
> > 120.235	0.234	-0.012	0.145	21.023	0.233
> > 200.14	0.512	-0.021	0.214	22.21	0.332
> > 185.362	0.147	-0.32	0.136	24.65	0.423
> > 201.895	0.002	-0.12	0.217	30.25	0.325
> > 165.235	0.256	0.001	0.22	31.245	0.552
> > 198.236	0.012	-0.362	0.215	32.25	0.333
> > 350.263	0.98	-0.85	0.321	38.412	0.411
> > 145.25	0.046	-0.36	0.147	39.256	0.872
> > 198.654	0.65	-0.45	0.224	40.235	0.652
> > 245.214	0.47	-0.325	0.311	26.356	0.632
> > 214.02	0.18	-0.012	0.242	22.01	0.745
> > 147.256	0.652	-0.785	0.311	18.256	0.924
> > 
> > import csv
> > 
> > import numpy as np
> > import statsmodels.api as sm
> > import statsmodels.formula.api as smf
> > 
> > with open("pcp1.csv", "r", newline="") as csvfile:
> >     readCSV = csv.reader(csvfile)
> > 
> >     rainfall = []
> >     csvFileList = []
> > 
> >     for row in readCSV:
> >         # Skip blank lines before indexing into the row.
> >         if len(row) != 0:
> >             # row[0] is still a string here (the header "RF" included).
> >             rainfall.append(row[0])
> >             csvFileList.append(row)
> > 
> > print(csvFileList)
> > print(rainfall)
> 
> You are not the first to read tabular data from a file; therefore numpy (and 
> pandas) offer high-level functions to do just that. Once you have the complete 
> table, extracting a specific column is easy. For instance:
> 
> $ cat rainfall.txt 
> RF      P1      P2      P3      P4      P5
> 120.235 0.234   -0.012  0.145   21.023  0.233
> 200.14  0.512   -0.021  0.214   22.21   0.332
> 185.362 0.147   -0.32   0.136   24.65   0.423
> 201.895 0.002   -0.12   0.217   30.25   0.325
> 165.235 0.256   0.001   0.22    31.245  0.552
> 198.236 0.012   -0.362  0.215   32.25   0.333
> 350.263 0.98    -0.85   0.321   38.412  0.411
> 145.25  0.046   -0.36   0.147   39.256  0.872
> 198.654 0.65    -0.45   0.224   40.235  0.652
> 245.214 0.47    -0.325  0.311   26.356  0.632
> 214.02  0.18    -0.012  0.242   22.01   0.745
> 147.256 0.652   -0.785  0.311   18.256  0.924
> $ python3
> Python 3.4.3 (default, Nov 17 2016, 01:08:31) 
> [GCC 4.8.4] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import numpy
> >>> rf = numpy.genfromtxt("rainfall.txt", names=True)
> >>> rf["RF"]
> array([ 120.235,  200.14 ,  185.362,  201.895,  165.235,  198.236,
>         350.263,  145.25 ,  198.654,  245.214,  214.02 ,  147.256])
> >>> rf["P3"]
> array([ 0.145,  0.214,  0.136,  0.217,  0.22 ,  0.215,  0.321,  0.147,
>         0.224,  0.311,  0.242,  0.311])
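
Building on the genfromtxt() session above, here is a hedged sketch of how the named columns might be stacked into a single 2-D array and used for a correlation matrix. The text is read from an in-memory string rather than rainfall.txt so the example is self-contained; the variable names predictor_names, table and corr are illustrative only.

```python
import io

import numpy

# Same table as rainfall.txt above, embedded so the sketch runs on its own.
TEXT = """\
RF      P1      P2      P3      P4      P5
120.235 0.234   -0.012  0.145   21.023  0.233
200.14  0.512   -0.021  0.214   22.21   0.332
185.362 0.147   -0.32   0.136   24.65   0.423
201.895 0.002   -0.12   0.217   30.25   0.325
165.235 0.256   0.001   0.22    31.245  0.552
198.236 0.012   -0.362  0.215   32.25   0.333
350.263 0.98    -0.85   0.321   38.412  0.411
145.25  0.046   -0.36   0.147   39.256  0.872
198.654 0.65    -0.45   0.224   40.235  0.652
245.214 0.47    -0.325  0.311   26.356  0.632
214.02  0.18    -0.012  0.242   22.01   0.745
147.256 0.652   -0.785  0.311   18.256  0.924
"""

# names=True yields a structured array whose fields are the column headers.
rf = numpy.genfromtxt(io.StringIO(TEXT), names=True)

# Every field except the dependent variable is treated as a predictor.
predictor_names = [name for name in rf.dtype.names if name != "RF"]

y = rf["RF"]
# Stack the 1-D predictor columns side by side: shape (12, 5).
X = numpy.column_stack([rf[name] for name in predictor_names])

# Correlation matrix over RF and all predictors; rowvar=False means
# each column (not each row) is one variable.
table = numpy.column_stack([y, X])
corr = numpy.corrcoef(table, rowvar=False)

# Row 0 holds the correlation of RF with itself and with P1..P5.
print(predictor_names)
print(corr[0])
```

The first row (or column) of corr then gives the correlation of rainfall with each predictor, in the order RF, P1, ..., P5.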