From xiongyao at pku.edu.cn Wed May 3 08:01:49 2017
From: xiongyao at pku.edu.cn (Xiong Yao)
Date: Wed, 3 May 2017 20:01:49 +0800 (GMT+08:00)
Subject: [scikit-learn] question about scikit-learn
Message-ID:

Dear professor,

scikit-learn has been very useful for my machine learning work. I have two questions:

1) For 5-fold cross-validation, StratifiedKFold gives me stratified folds in which each fold contains approximately the same percentage of samples of each target class as the complete set, while GroupKFold ensures that the same group is not represented in both the testing and training sets. Is there a method that combines these two behaviours?

2) When I use GridSearchCV to do a parameter search, I use scoring="accuracy" as the scoring function to choose the best parameters, and I find that I can only get the accuracy score from the 5-fold cross-validation. What can I do if I want to get other scores, such as sensitivity, specificity and MCC, at the same time? That is, I want to use accuracy to choose the best parameters and also obtain several other metrics from the same 5-fold cross-validation.

Thank you.

XIONG Yao
G301, School of Chemical Biology & Biotechnology
Peking University Shenzhen Graduate School
Shenzhen 518055, Guangdong, P.R. China
E-mail: xiongyao at pku.edu.cn or xiongy20121126 at foxmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dschlessinger at live.com Thu May 4 07:52:00 2017
From: dschlessinger at live.com (David Schlessinger)
Date: Thu, 4 May 2017 11:52:00 +0000
Subject: [scikit-learn] How can I tell if I am getting the loading values from a PCA analysis using scikit-learn?
Message-ID:

Firstly, I apologize in advance if the following questions I have are very basic.
I am very new to coding in general, as well as Principal Component Analysis and scikit-learn. I am trying to finish a project for an internship and hit a wall, and am desperately trying to seek help to solve this before my deadline. I have a set of RNA sequences that are evaluated based on several parameters. For the sake of simplicity, let's say there are three parameters: the GC content in the RNA sequence's ribosome binding site (RBS), the estimated stability of the RNA sequences' secondary structures (MFE), and the ensemble defect of the RNA (i.e. the number of nucleotides that do not conform to a prescribed secondary structure). A series of functions calculated the values of these RNA sequences for each parameter, and the lower their calculated value is, the more indicative it is that the RNA sequence in question is more optimal for our experimental purposes. What I am now trying to do is build a composite score from the values in these parameters for each RNA sequence using PCA. Rather than use PCA for dimension reduction, I am going to use it to determine the loading values for each value to their respective component. I have been using the following code to calculate the loadings and subsequently utilize them for the creation of a composite score, which is the sum of the original values multiplied by their corresponding loading values (i.e. the loading values are used as weights on the original values). import pandas as pd from sklearn.decomposition import PCA from sklearn.preprocessing import scale green= path.join(output_folder, "SequenceScoring.csv") df = pd.read_csv(green) X= df.values X = scale(X) pca = PCA(n_components=3) pca.fit(X) X1=pca.fit_transform(X) df1 = pd.DataFrame(data= X1, index= range(len(df['MFE of Sequence Complex'])), columns= ['Loading MFE of Sequence Complex', 'Loading Percentage of Ensemble Defect of RBS of Trigger-Switch Complex', 'Loading GC Content of RBS Region in Switch']) Although this seems to be the correct procedure, I am not certain if I properly understand the output of X1=pca.fit_transform(X). One source I initially used ostensibly cleared the matter (source: Sklearn PCA is pca.components_ the loadings?), but upon closer inspection, I realized I wasn't sure if I was getting the correct values, which was described as "the result of the projection for each sample into the vector space spanned by the components". Furthermore, loadings can also be defined as being "sums of squares within each component are the eigenvalues (components' variances)" (source: https://stats.stackexchange.com/questions/92499/how-to-interpret-pca-loadings). I checked the Eigenvalues of my parameters using: X_std = StandardScaler().fit_transform(X) cov_mat = np.cov(X_std.T) eig_vals, eig_vecs = np.linalg.eig(cov_mat) print('\nEigenvalues \n%s' %eig_vals) And then I squared and summed the loading values in each column produced by 'X1=pca.fit_transform(X)`, and found that they did not match the Eigenvalues for the respective parameters at all. It is worth noting that I understood the term "loading values" as the distance between a certain value and an associated component (so that values that influenced the slope and variance captured by the component more strongly had higher loading values). Am I fundamentally misunderstanding the concept of loading values? Or am I not using the right function from scikit-learn? 
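For reference, a minimal sketch of the distinction, assuming the usual definition of loadings as eigenvectors scaled by the square roots of their eigenvalues: pca.fit_transform(X) returns the projected scores of each sample, not the loadings, which would explain why the squared column sums of X1 did not reproduce the eigenvalues. The loadings come instead from pca.components_ and pca.explained_variance_. In this sketch, df is the data frame loaded above, and the comparison with np.cov only matches up to the n versus n-1 variance convention of the installed scikit-learn version.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import scale

    X = scale(df.values)               # standardized parameter matrix
    pca = PCA(n_components=3)
    scores = pca.fit_transform(X)      # projections ("scores"), one row per sequence

    # loadings: each component (unit eigenvector) scaled by the square root of its eigenvalue
    loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

    # sanity check: squared loadings summed within a component give back its eigenvalue
    print(np.sum(loadings ** 2, axis=0))
    print(pca.explained_variance_)

A composite score weighted by the first component would then be X.dot(loadings[:, 0]), which equals scores[:, 0] up to a constant factor; whether that weighting is appropriate for ranking the sequences is a modelling choice rather than a scikit-learn question.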
I have tried to look through the source code for scikit-learn's pca.fit_transform, but I don't have the level of mathematical or coding experience required to understand it. Thanks so much, David -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.m.wink at gmail.com Thu May 4 11:52:27 2017 From: a.m.wink at gmail.com (Alle Meije Wink) Date: Thu, 4 May 2017 17:52:27 +0200 Subject: [scikit-learn] using a mask for brain images Message-ID: I have a script to classify MRI perfusion maps from healthy subjects and patients. For the file IO and the classifier I have started with the example code in Abraham et al 2014 [https://arxiv.org/pdf/1412.3919.pdf]. I use the same classifier as in the paper to produce a back-projected map of classification weights, which I then want to 'unmask' like in the paper: coef=clf.coef_ coef=featureselection.inverse_transform(coef) and map_name='weights_check.nii.gz' wmap=np.zeros(mask.shape, dtype=X.dtype) wmap[mask]=coef img=nb.Nifti1Image(wmap,np.eye(4)) img.to_filename(map_name) But the line "wmap[mask]=coef" throws an error "ValueError: boolean index array should have 1 dimension". I tried the example code from the paper and that works. Is the 'coef' array of back-projected SVM weights in some way different than the masked input image? Or am I doing something else wrong? The error suggests that the mask array is the problem. The complete script is attached. Many thanks for your help! Alle Meije -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mr_class.py Type: text/x-python Size: 3442 bytes Desc: not available URL: From t3kcit at gmail.com Thu May 4 20:15:47 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Thu, 4 May 2017 20:15:47 -0400 Subject: [scikit-learn] question about scikit-learn In-Reply-To: <2420d99f.f2fd.15bce33ce50.Coremail.xiongyao@pku.edu.cn> References: <2420d99f.f2fd.15bce33ce50.Coremail.xiongyao@pku.edu.cn> Message-ID: <36f93ba2-7ead-75d2-7c15-4eaac5b878d9@gmail.com> On 05/03/2017 08:05 AM, ?? wrote: > Dear professor, > > scikit-learn is really good for me to do some work using machine > learning method. Here, I have two questions: > > 1?To do 5 fold cross-validation, when I use StratifiedKFold?I could > get stratified folds that each fold contains approximately the same > percentage of samples > > of each target class as the complete set. And, when I use GroupKFold, > it ensures that the same group is not represented in both testing and > training sets. > > I want to know whether there is a method to combine these two methods > together? Not implemented (yet). I think because it was a bit unclear what's the best thing to do. > > 2) When I use GridSearchCV to do parameter search, I use > scoring="accuracy" as scoring function to choose the best parameters. > And I find that I can only get > > the accuracy score from the 5 fold cross-validation. What can I do if > I want to get other scores such as sensitivity, specificity, MCC *at > the same time*? It > > means that I want to use accuracy to choose the best parameters and I > want to get the scores of many scoring parameters at the same time > when I do 5 fold > https://github.com/scikit-learn/scikit-learn/pull/7388 -------------- next part -------------- An HTML attachment was scrubbed... 
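Until that lands in a release, one rough workaround is to let GridSearchCV pick the parameters by accuracy and then compute the extra metrics from out-of-fold predictions made with the chosen parameters. This is only a sketch: SVC and the parameter grid are placeholders for whatever estimator is being tuned, the predictions are pooled across the five folds rather than averaged per fold, and re-using the same data that tuned the parameters is not a nested cross-validation.

    import numpy as np
    from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_predict
    from sklearn.metrics import confusion_matrix, matthews_corrcoef
    from sklearn.svm import SVC

    cv = StratifiedKFold(n_splits=5)
    search = GridSearchCV(SVC(), param_grid={'C': [0.1, 1, 10]},
                          scoring='accuracy', cv=cv)
    search.fit(X, y)                                   # X, y: the training data

    # out-of-fold predictions with the selected hyperparameters
    y_pred = cross_val_predict(search.best_estimator_, X, y, cv=cv)

    # binary case, negative class listed first by confusion_matrix
    tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()
    sensitivity = tp / float(tp + fn)
    specificity = tn / float(tn + fp)
    mcc = matthews_corrcoef(y, y_pred)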
URL: From t3kcit at gmail.com Thu May 4 20:28:00 2017 From: t3kcit at gmail.com (Andreas Mueller) Date: Thu, 4 May 2017 20:28:00 -0400 Subject: [scikit-learn] Contribution to sklearn: Cross validation of time series In-Reply-To: References: Message-ID: <041d66fc-ca96-31ab-deb5-63058be6f98c@gmail.com> Not sure if my internet is bad or the pictures you attached are broken. So you want min_train_size < test_size, or what's the main use case? Given min_train_size and test_size doesn't entirely define the splits, though. How much do you increase the training set in each iteration? In TimeSeriesSplit min_train_size = test_size = increase. Indeed you could choose all three of them separately. Either way, you either end up not using all of your data or your last increase is smaller than all your other increases - or you parametrize using number of iterations, so that you can make it line up with the dataset size. On 04/28/2017 01:31 PM, andres lago wrote: > > Hi Andy, > > sorry, I pushed an unwanted 'send' in the previous message. Thanks > for your quick reply. I'll try to be more precise with the CV I'm > proposing. Comparing to the actual implementation (TimeSeriesSplit), > these would be the new parameters: > > 1-CV mode: Rolling window Or Variable length window: > > > Rolling window: keeps the same size of CV-training set for all > folds, shifting forward at each iteration of CV. > > > Variable length window: increments the size of CV-training set > at each fold iteration (actual implementation in TimeSeriesSplit). > > > 2-minimum size of CV-training set: Initial size of CV-training set. > It's the minimum number of observations required to do the first > predictions. > > > 3- size of CV-test set: Size of the CV-test set. It's constant for > all folds. Should have the size of the prediction horizon. > > > The number of folds is not required anymore, it's automatically > calculated from fields 2 & 3. > > > The idea behind this contribution is to cover some common use cases > around the CV that today is impossible with TimeSeriesSplit: > > -Your data doesn't show seasonality, your dataset is huge then > you'd like to perform CV with a rolling window to accelerate the CV > > -The client asked for a prediction horizon of 7 days, you'd like > to perform the tests in CV with this horizon > > -The data has a strong seasonality, you want to fit at least 1 > month of observations before the first prediction in CV > > > Please find enclosed some graphics to ease understanding the proposal. > > > Regards, > > Andr?s > > > > > > > > ------------------------------------------------------------------------ > *De:* scikit-learn > en nombre de > Andreas Mueller > *Enviado:* viernes, 28 de abril de 2017 05:48 p. m. > *Para:* Scikit-learn user and developer mailing list > *Asunto:* Re: [scikit-learn] Contribution to sklearn: Cross validation > of time series > Hey Andres. > I think there might be a PR for that. > Can you explain the minimum size of the training set? How is that used? > I thought the other main option would be "rolling window" cross validation > to use a fixed length cv training set. > > So the two options to me were rolling window and what we're doing > right now. > Can you elaborate on the other use cases, like minimum size of the > training set > and why you would want the other options with a variable length > training set? > > Thanks, > Andy > > On 04/27/2017 09:44 AM, andres lago wrote: >> >> Hello, >> >> I'd like to contribute with a new functionality in sklearn. 
It's >> the cross validation of time series. It's an evolution of the >> current functionality, implemented by TimeSeriesSplit. >> >> >> TimeSeriesSplit only allows the user to set the number of folds. In >> real life, when performing the cross validation of time series, other >> parameters are required, for instance: >> >> -minimum size of CV-training set >> >> -size of CV-test set >> >> -fixed or variable length of CV-training set. >> >> >> The functionality is inspired by the R library 'caret'. >> >> >> If you agree, I can share my code. I developed it for a project >> with the french rail company SNCF. It's in production now. >> >> >> Regards, >> >> Andres >> >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 104405 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 48958 bytes Desc: not available URL: From gael.varoquaux at normalesup.org Fri May 5 01:59:26 2017 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 5 May 2017 07:59:26 +0200 Subject: [scikit-learn] using a mask for brain images In-Reply-To: References: Message-ID: <20170505055926.GD2921489@phare.normalesup.org> Hi Alle, I think that what has changed between 2014 and today is that the coefficients coef are now a 2D array (number of hyperplanes x number of features). In your case, the first direction is of length one, so you could just do: coef = clf.coef_[0] and your script should work. The code of the Abraham paper cannot be updated, because papers are not living objects. However, we are maintaining a package that encodes all these patterns in higher-level construct: http://nilearn.github.io/ It might be a good idea to use this package, as it is maintained and has quality assurance. Best, Ga?l On Thu, May 04, 2017 at 05:52:27PM +0200, Alle Meije Wink wrote: > I have a script to classify MRI perfusion maps from healthy subjects and > patients. For the file IO and the classifier I have started with the example > code in Abraham et al 2014 [https://arxiv.org/pdf/1412.3919.pdf]. > I use the same classifier as in the paper to produce a back-projected map of > classification weights, which I then want to 'unmask' like in the paper: > ??? coef=clf.coef_ > ??? coef=featureselection.inverse_transform(coef)?????????? > and > ??? map_name='weights_check.nii.gz' > ??? wmap=np.zeros(mask.shape, dtype=X.dtype) > ??? wmap[mask]=coef > ??? img=nb.Nifti1Image(wmap,np.eye(4)) > ??? img.to_filename(map_name)??? > But the line "wmap[mask]=coef" throws an error "ValueError: boolean index array > should have 1 dimension". I tried the example code from the paper and that > works. > Is the 'coef' array of back-projected SVM weights in some way different than > the masked input image? Or am I doing something else wrong? The error suggests > that the mask array is the problem. > The complete script is attached. > Many thanks for your help! 
> Alle Meije > # -*- coding: utf-8 -*- > """ > classify MR images from controls (CON) and patients (MCI) > """ > import os > import numpy as np > import nibabel as nb > import sklearn as sl > from sklearn.feature_selection import f_classif > from sklearn import svm > def subj_lists( rootdir='/data/a.wink/MR', starts=['CON','MCI'] ): > # make and empty list for each patient group > slists=[] > for i in range(0,len(starts)): > slists.append([]); > # add subjects to the group based on > for root, dirs, files in os.walk(rootdir): > for fname in files: > for i in range(0,len(starts)): > if fname.startswith(starts[i]): > slists[i].append(os.path.join(root,fname)) > print('finished building subject lists') > return slists > def mk_mask ( subj_lists=[] ): > # check whether mask can be loaded > mname='mask_check.nii.gz' > if os.path.isfile(mname): > msk=nb.load(mname).get_data() > # or must be built (values >20% in the subject images and >20% grey matter in the template) > else: > num_im=0; > for i in range(0,len(subj_lists)): > for fname in subj_lists[i]: > imdata=nb.load(fname).get_data() > if 'msk' not in locals(): > msk=np.absolute(imdata) > else: > msk=msk+np.absolute(imdata) > num_im+=1 > msk/=num_im > msk/=np.amax(msk) > msk[msk<.2]=0 > msk[msk>0]=1 > grey=nb.load('/usr/local/spm8/apriori/grey.nii').get_data() > grey[grey<.2]=0 > grey[grey>0]=1 > msk=msk*grey > img=nb.Nifti1Image(msk,np.eye(4)) > img.to_filename(mname) > msk=msk.astype(bool) > print('finished building mask') > return msk, mname > def load_images( subj_lists=[], mask=[] ): > # load all the images, build a matrix X or subjects (rows) * inmask-voxels (columns) > num_im=0; > for i in range(0,len(subj_lists)): > for fname in subj_lists[i]: > imdata=nb.load(fname).get_data() > num_im+=1 > print '\r' + str(num_im) +'\r', > if 'the_matrix' not in locals(): > the_matrix=imdata[mask].T > else: > the_matrix=np.vstack((the_matrix,imdata[mask].T)) > print('finished building matrix X') > return the_matrix > def main(): > # get the input data > subjlists = subj_lists() > y=np.concatenate( ([-1 for _ in range(len(subjlists[0]))], > [ 1 for _ in range(len(subjlists[1]))]) ) > mask, mas_name = mk_mask(subjlists) > X = load_images(subjlists,mask) > print "number of subjects, voxels: %d, %d" % X.shape > # select features > featureselection=sl.feature_selection.SelectKBest(f_classif, k=8000) > X_reduced=featureselection.fit_transform(X,y) > # classify > clf=svm.SVC(kernel='linear') > clf.fit(X_reduced,y) > # make discrimination map > coef=clf.coef_ > coef=featureselection.inverse_transform(coef) > map_name='weights_check.nii.gz' > wmap=np.zeros(mask.shape, dtype=X.dtype) > wmap[mask]=coef > img=nb.Nifti1Image(wmap,np.eye(4)) > img.to_filename(map_name) > if __name__ == "__main__": > main() > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -- Gael Varoquaux Researcher, INRIA Parietal NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info http://twitter.com/GaelVaroquaux From a.m.wink at gmail.com Fri May 5 06:13:08 2017 From: a.m.wink at gmail.com (Alle Meije Wink) Date: Fri, 5 May 2017 12:13:08 +0200 Subject: [scikit-learn] using a mask for brain images In-Reply-To: <20170505055926.GD2921489@phare.normalesup.org> References: <20170505055926.GD2921489@phare.normalesup.org> Message-ID: Thanks for that Gael - I do know nilearn but in this case I did a depth-first search on doing 
SVM on brain images and ended up here :) The size of 'coef' is (1,205739), the size of mask[mask].T is (205739,) Same number of elements, different storage layout(?). Turned out that the numpy.ravel() function -pointed out by my colleague- can solve that! >>> wmap[mask]=np.ravel(coef) bw Alle Meije On 5 May 2017 at 07:59, Gael Varoquaux wrote: > Hi Alle, > > I think that what has changed between 2014 and today is that the > coefficients coef are now a 2D array (number of hyperplanes x number of > features). In your case, the first direction is of length one, so you > could just do: > > coef = clf.coef_[0] > > and your script should work. > > The code of the Abraham paper cannot be updated, because papers are not > living objects. However, we are maintaining a package that encodes all > these patterns in higher-level construct: http://nilearn.github.io/ > It might be a good idea to use this package, as it is maintained and has > quality assurance. > > Best, > > Ga?l > > On Thu, May 04, 2017 at 05:52:27PM +0200, Alle Meije Wink wrote: > > I have a script to classify MRI perfusion maps from healthy subjects and > > patients. For the file IO and the classifier I have started with the > example > > code in Abraham et al 2014 [https://arxiv.org/pdf/1412.3919.pdf]. > > > I use the same classifier as in the paper to produce a back-projected > map of > > classification weights, which I then want to 'unmask' like in the paper: > > > coef=clf.coef_ > > coef=featureselection.inverse_transform(coef) > > > and > > > map_name='weights_check.nii.gz' > > wmap=np.zeros(mask.shape, dtype=X.dtype) > > wmap[mask]=coef > > img=nb.Nifti1Image(wmap,np.eye(4)) > > img.to_filename(map_name) > > > But the line "wmap[mask]=coef" throws an error "ValueError: boolean > index array > > should have 1 dimension". I tried the example code from the paper and > that > > works. > > > Is the 'coef' array of back-projected SVM weights in some way different > than > > the masked input image? Or am I doing something else wrong? The error > suggests > > that the mask array is the problem. > > > The complete script is attached. > > > Many thanks for your help! 
> > Alle Meije > > > # -*- coding: utf-8 -*- > > """ > > classify MR images from controls (CON) and patients (MCI) > > > """ > > > import os > > import numpy as np > > import nibabel as nb > > import sklearn as sl > > > from sklearn.feature_selection import f_classif > > from sklearn import svm > > > def subj_lists( rootdir='/data/a.wink/MR', starts=['CON','MCI'] ): > > > # make and empty list for each patient group > > slists=[] > > for i in range(0,len(starts)): > > slists.append([]); > > > # add subjects to the group based on > > for root, dirs, files in os.walk(rootdir): > > for fname in files: > > for i in range(0,len(starts)): > > if fname.startswith(starts[i]): > > slists[i].append(os.path.join(root,fname)) > > > print('finished building subject lists') > > return slists > > > def mk_mask ( subj_lists=[] ): > > > # check whether mask can be loaded > > mname='mask_check.nii.gz' > > > if os.path.isfile(mname): > > > msk=nb.load(mname).get_data() > > > # or must be built (values >20% in the subject images and >20% grey > matter in the template) > > else: > > > num_im=0; > > for i in range(0,len(subj_lists)): > > for fname in subj_lists[i]: > > imdata=nb.load(fname).get_data() > > if 'msk' not in locals(): > > msk=np.absolute(imdata) > > else: > > msk=msk+np.absolute(imdata) > > num_im+=1 > > > msk/=num_im > > msk/=np.amax(msk) > > msk[msk<.2]=0 > > msk[msk>0]=1 > > > grey=nb.load('/usr/local/spm8/apriori/grey.nii').get_data() > > grey[grey<.2]=0 > > grey[grey>0]=1 > > > msk=msk*grey > > > img=nb.Nifti1Image(msk,np.eye(4)) > > img.to_filename(mname) > > > msk=msk.astype(bool) > > > print('finished building mask') > > return msk, mname > > > def load_images( subj_lists=[], mask=[] ): > > > # load all the images, build a matrix X or subjects (rows) * > inmask-voxels (columns) > > num_im=0; > > for i in range(0,len(subj_lists)): > > for fname in subj_lists[i]: > > imdata=nb.load(fname).get_data() > > num_im+=1 > > print '\r' + str(num_im) +'\r', > > if 'the_matrix' not in locals(): > > the_matrix=imdata[mask].T > > else: > > the_matrix=np.vstack((the_matrix,imdata[mask].T)) > > > print('finished building matrix X') > > return the_matrix > > > def main(): > > > # get the input data > > subjlists = subj_lists() > > y=np.concatenate( ([-1 for _ in range(len(subjlists[0]))], > > [ 1 for _ in range(len(subjlists[1]))]) ) > > mask, mas_name = mk_mask(subjlists) > > X = load_images(subjlists,mask) > > print "number of subjects, voxels: %d, %d" % X.shape > > > # select features > > featureselection=sl.feature_selection.SelectKBest(f_classif, k=8000) > > X_reduced=featureselection.fit_transform(X,y) > > > # classify > > clf=svm.SVC(kernel='linear') > > clf.fit(X_reduced,y) > > > # make discrimination map > > coef=clf.coef_ > > coef=featureselection.inverse_transform(coef) > > > map_name='weights_check.nii.gz' > > wmap=np.zeros(mask.shape, dtype=X.dtype) > > wmap[mask]=coef > > img=nb.Nifti1Image(wmap,np.eye(4)) > > img.to_filename(map_name) > > > if __name__ == "__main__": > > main() > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > -- > Gael Varoquaux > Researcher, INRIA Parietal > NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France > Phone: ++ 33-1-69-08-79-68 > http://gael-varoquaux.info http://twitter.com/GaelVaroquaux > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > 
https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Fri May 5 07:18:42 2017 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 5 May 2017 13:18:42 +0200 Subject: [scikit-learn] using a mask for brain images In-Reply-To: References: <20170505055926.GD2921489@phare.normalesup.org> Message-ID: <20170505111842.GD4015500@phare.normalesup.org> On Fri, May 05, 2017 at 12:13:08PM +0200, Alle Meije Wink wrote: > Thanks for that Gael - I do know nilearn but in this case I did a depth-first > search on doing SVM on brain images and ended up here :) :). Darn, we need to work on our search engine optimization. Nilearn should be the easiest way of doing SVMs on nifti images. > The size of 'coef' is (1,205739), the size of mask[mask].T is (205739,) > Same number of elements, different storage layout(?). Correct. > Turned out that the numpy.ravel() function -pointed out by my colleague- can > solve that! > >>> wmap[mask]=np.ravel(coef) I suggested using coef[0] rather than ravel as ravel is more dangerous (it will flatten brutally the array). But it will work too. Cheers, Ga?l > bw > Alle Meije > On 5 May 2017 at 07:59, Gael Varoquaux wrote: > Hi Alle, > I think that what has changed between 2014 and today is that the > coefficients coef are now a 2D array (number of hyperplanes x number of > features). In your case, the first direction is of length one, so you > could just do: > coef = clf.coef_[0] > and your script should work. > The code of the Abraham paper cannot be updated, because papers are not > living objects. However, we are maintaining a package that encodes all > these patterns in higher-level construct: http://nilearn.github.io/ > It might be a good idea to use this package, as it is maintained and has > quality assurance. > Best, > Ga?l > On Thu, May 04, 2017 at 05:52:27PM +0200, Alle Meije Wink wrote: > > I have a script to classify MRI perfusion maps from healthy subjects and > > patients. For the file IO and the classifier I have started with the > example > > code in Abraham et al 2014 [https://arxiv.org/pdf/1412.3919.pdf]. > > I use the same classifier as in the paper to produce a back-projected map > of > > classification weights, which I then want to 'unmask' like in the paper: > > ??? coef=clf.coef_ > > ??? coef=featureselection.inverse_transform(coef)?????????? > > and > > ??? map_name='weights_check.nii.gz' > > ??? wmap=np.zeros(mask.shape, dtype=X.dtype) > > ??? wmap[mask]=coef > > ??? img=nb.Nifti1Image(wmap,np.eye(4)) > > ??? img.to_filename(map_name)??? > > But the line "wmap[mask]=coef" throws an error "ValueError: boolean index > array > > should have 1 dimension". I tried the example code from the paper and > that > > works. > > Is the 'coef' array of back-projected SVM weights in some way different > than > > the masked input image? Or am I doing something else wrong? The error > suggests > > that the mask array is the problem. > > The complete script is attached. > > Many thanks for your help! > > Alle Meije > > # -*- coding: utf-8 -*- > > """ > > classify MR images from controls (CON) and patients (MCI) > > """ > > import os > > import numpy as np > > import nibabel as nb > > import sklearn as sl > > from sklearn.feature_selection import f_classif > > from sklearn import svm > > def subj_lists( rootdir='/data/a.wink/MR', starts=['CON','MCI'] ): > >? ? ?# make and empty list for each patient group > >? ? ?slists=[] > >? ? 
?for i in range(0,len(starts)): > >? ? ? ? ?slists.append([]); > >? ? ?# add subjects to the group based on > >? ? ?for root, dirs, files in os.walk(rootdir): > >? ? ? ? ?for fname in files: > >? ? ? ? ? ? ?for i in range(0,len(starts)): > >? ? ? ? ? ? ? ? ?if fname.startswith(starts[i]): > >? ? ? ? ? ? ? ? ? ? ?slists[i].append(os.path.join(root,fname)) > >? ? ?print('finished building subject lists') > >? ? ?return slists > > def mk_mask ( subj_lists=[] ): > >? ? ?# check whether mask can be loaded > >? ? ?mname='mask_check.nii.gz' > >? ? ?if os.path.isfile(mname): > >? ? ? ? ?msk=nb.load(mname).get_data() > >? ? ?# or must be built (values >20% in the subject images and >20% grey > matter in the template) > >? ? ?else: > >? ? ? ? ?num_im=0; > >? ? ? ? ?for i in range(0,len(subj_lists)): > >? ? ? ? ? ? ?for fname in subj_lists[i]: > >? ? ? ? ? ? ? ? ?imdata=nb.load(fname).get_data() > >? ? ? ? ? ? ? ? ?if 'msk' not in locals(): > >? ? ? ? ? ? ? ? ? ? ?msk=np.absolute(imdata) > >? ? ? ? ? ? ? ? ?else: > >? ? ? ? ? ? ? ? ? ? ?msk=msk+np.absolute(imdata) > >? ? ? ? ? ? ? ? ?num_im+=1 > >? ? ? ? ?msk/=num_im > >? ? ? ? ?msk/=np.amax(msk) > >? ? ? ? ?msk[msk<.2]=0 > >? ? ? ? ?msk[msk>0]=1 > >? ? ? ? ?grey=nb.load('/usr/local/spm8/apriori/grey.nii').get_data() > >? ? ? ? ?grey[grey<.2]=0 > >? ? ? ? ?grey[grey>0]=1 > >? ? ? ? ?msk=msk*grey > >? ? ? ? ?img=nb.Nifti1Image(msk,np.eye(4)) > >? ? ? ? ?img.to_filename(mname) > >? ? ?msk=msk.astype(bool) > >? ? ?print('finished building mask') > >? ? ?return msk, mname > > def load_images( subj_lists=[], mask=[] ): > >? ? ?# load all the images, build a matrix X or subjects (rows) * > inmask-voxels (columns) > >? ? ?num_im=0; > >? ? ?for i in range(0,len(subj_lists)): > >? ? ? ? ?for fname in subj_lists[i]: > >? ? ? ? ? ? ?imdata=nb.load(fname).get_data() > >? ? ? ? ? ? ?num_im+=1 > >? ? ? ? ? ? ?print '\r' + str(num_im) +'\r', > >? ? ? ? ? ? ?if 'the_matrix' not in locals(): > >? ? ? ? ? ? ? ? ?the_matrix=imdata[mask].T > >? ? ? ? ? ? ?else: > >? ? ? ? ? ? ? ? ?the_matrix=np.vstack((the_matrix,imdata[mask].T)) > >? ? ?print('finished building matrix X') > >? ? ?return the_matrix > > def main(): > >? ? ?# get the input data > >? ? ?subjlists = subj_lists() > >? ? ?y=np.concatenate( ([-1 for _ in range(len(subjlists[0]))], > >? ? ? ? ? ? ? ? ? ? ? ? [ 1 for _ in range(len(subjlists[1]))]) ) > >? ? ?mask, mas_name = mk_mask(subjlists) > >? ? ?X = load_images(subjlists,mask) > >? ? ?print "number of subjects, voxels: %d, %d" % X.shape > >? ? ?# select features > >? ? ?featureselection=sl.feature_selection.SelectKBest(f_classif, k=8000) > >? ? ?X_reduced=featureselection.fit_transform(X,y) > >? ? ?# classify > >? ? ?clf=svm.SVC(kernel='linear') > >? ? ?clf.fit(X_reduced,y) > >? ? ?# make discrimination map > >? ? ?coef=clf.coef_ > >? ? ?coef=featureselection.inverse_transform(coef) > >? ? ?map_name='weights_check.nii.gz' > >? ? ?wmap=np.zeros(mask.shape, dtype=X.dtype) > >? ? ?wmap[mask]=coef > >? ? ?img=nb.Nifti1Image(wmap,np.eye(4)) > >? ? ?img.to_filename(map_name) > > if __name__ == "__main__": > >? ? 
?main() > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn -- Gael Varoquaux Researcher, INRIA Parietal NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info http://twitter.com/GaelVaroquaux From mamunbabu2001 at gmail.com Mon May 8 05:45:26 2017 From: mamunbabu2001 at gmail.com (Mamun Rashid) Date: Mon, 8 May 2017 10:45:26 +0100 Subject: [scikit-learn] SVC data normalisation Message-ID: Hi All, I am testing two classifiers [ 1. Random forest 2. SVC with radial basis kernel ] on a data set via 5 fold cross validation. The feature matrix contains : A. 80% features are binary [ 0 or 1 ] B. 10% are integer values representing counts / occurrences. C. 10% are continuous values between different ranges. My prior understanding was that decision tree based algorithms work better on mixed data types. In this particular case I am noticing SVC is performing much better than Random forest. I Z-score normalise the data before I sent it to support vector classifier. - Binary features ( type A) are left as it it. - Integer and Continuous features are Z-score normalised [ ( feat - mean(feat) ) / sd(feat) ) . I was wondering if anyone can tell me if this normalisation approach it correct for SVC run. Thanks in advance for your help. Regards, Mamun -------------- next part -------------- An HTML attachment was scrubbed... URL: From jbbrown at kuhp.kyoto-u.ac.jp Mon May 8 08:48:28 2017 From: jbbrown at kuhp.kyoto-u.ac.jp (Brown J.B.) Date: Mon, 8 May 2017 21:48:28 +0900 Subject: [scikit-learn] SVC data normalisation In-Reply-To: References: Message-ID: Dear Mamun, *A.* 80% features are binary [ 0 or 1 ] > *B.* 10% are integer values representing counts / occurrences. > *C.* 10% are continuous values between different ranges. > > My prior understanding was that decision tree based algorithms work better > on mixed data types. In this particular case I am noticing > SVC is performing much better than Random forest. > What does "performing better" mean in this case? How are you defining performance? A particular metric such as MCC, PPV, or NPV? Also, how is the cross-validation being done - is the data shuffled before creating train/test groups are created? Is the exact same split of training and test data per fold used for both SVC and RF? > I Z-score normalise the data before I sent it to support vector > classifier. > - Binary features ( type *A) *are left as it it. > - Integer and Continuous features are Z-score normalised [ ( feat - > mean(feat) ) / sd(feat) ) . > Normalizing your continuous values seems quite fine, but consider these aspects: --Does it make sense in the domain of your problem to Z-normalize the integral (integer-valued) descriptors/features? --For the integral values, would subtracting about the median value make more sense? This is similar to the previous consideration. --What happens to SVC if you don't normalize? --What happens to RF if you do normalize? While my various comments above are all geared toward empirical aspects and not toward theoretical aspects, picking some of them to explore is likely to help you gain practical insight on your situation/inquiry. I'm sure you already know this, but while machine learning may have some "practical guidelines for best practices", they are guidelines and not hard rules. 
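One concrete starting point for such a check is to scale only the count and continuous columns and leave the binary ones as 0/1 — a minimal sketch, with made-up column indices and placeholder X_train/X_test arrays, and with the caveat that inside a cross-validation the scaler should be fit on the training fold only:

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    # indices of the count/continuous columns; these particular indices are illustrative
    cont_cols = np.arange(80, 100)

    scaler = StandardScaler().fit(X_train[:, cont_cols])   # fit on the training fold only

    X_train_s = X_train.astype(float).copy()
    X_test_s = X_test.astype(float).copy()
    X_train_s[:, cont_cols] = scaler.transform(X_train[:, cont_cols])
    X_test_s[:, cont_cols] = scaler.transform(X_test[:, cont_cols])
    # the binary columns keep their 0/1 coding; the scaled matrices then go to SVC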
So, again, I would recommend doing some more empirical tests and re-evaluating your situation once you have new data in hand. If you can provide a good amount of concrete data to present along with your "problem", this community is excellent at providing intelligent, helpful responses. Hope this helps. J.B. -------------- next part -------------- An HTML attachment was scrubbed... URL: From georg.kf.heiler at gmail.com Tue May 9 11:36:45 2017 From: georg.kf.heiler at gmail.com (Georg Heiler) Date: Tue, 09 May 2017 15:36:45 +0000 Subject: [scikit-learn] Broken c dependencies Message-ID: Hi, unfortunately, the c dependencies of my scikit-learn installation broke and I get the following error on osx: dlopen(/usr/local/lib/python3.6/site-packages/sklearn/svm/libsvm.cpython-36m-darwin.so, 2): Symbol not found: __ZdlPvm Referenced from: /usr/local/lib/python3.6/site-packages/sklearn/svm/libsvm.cpython-36m-darwin.so (which was built for Mac OS X 10.12) Expected in: /usr/lib/libstdc++.6.dylib in /usr/local/lib/python3.6/site-packages/sklearn/svm/libsvm.cpython-36m-darwin.so Even removing my python installation and re-installing does not seem to get this library back. Regards, Georg -------------- next part -------------- An HTML attachment was scrubbed... URL: From se.raschka at gmail.com Tue May 9 11:51:40 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Tue, 9 May 2017 11:51:40 -0400 Subject: [scikit-learn] Broken c dependencies In-Reply-To: References: Message-ID: <17283E2F-2E05-420D-BCEF-8F9327FE8362@gmail.com> Hi, How did you install scikit-learn, from source or via pip? Not sure since it's been a long time since I set up my macOS from scratch, but I think you need to install Xcode command line tools at least. Have you checked that it is available? E.g. Via xcode-select -p BTW does NumPy / SciPy work on your install or is it just sklearn? Best, Sebastian Sent from my iPhone > On May 9, 2017, at 11:36 AM, Georg Heiler wrote: > > Hi, > > unfortunately, the c dependencies of my scikit-learn installation broke and I get the following error on osx: > dlopen(/usr/local/lib/python3.6/site-packages/sklearn/svm/libsvm.cpython-36m-darwin.so, 2): Symbol not found: __ZdlPvm > Referenced from: /usr/local/lib/python3.6/site-packages/sklearn/svm/libsvm.cpython-36m-darwin.so (which was built for Mac OS X 10.12) > Expected in: /usr/lib/libstdc++.6.dylib > in /usr/local/lib/python3.6/site-packages/sklearn/svm/libsvm.cpython-36m-darwin.so > Even removing my python installation and re-installing does not seem to get this library back. > > Regards, > Georg > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From o.lyashevskaya at gmail.com Tue May 9 11:34:07 2017 From: o.lyashevskaya at gmail.com (Olga Lyashevska) Date: Tue, 9 May 2017 16:34:07 +0100 Subject: [scikit-learn] impurity criterion in gradient boosted regression trees Message-ID: Hi all, I am trying to understand differences in feature importance plots obtained with R package gbm and sklearn. Having compared both implementation side by side it seems that the models are fairly similar, however feature importance plots are rather distinct. R uses empirical improvement in squared error as it is described in Friedmans's "Greedy Function Approximation" paper (eq. 44, 45). 
sklearn (as far as I could see it in the code) uses the weighted reduction in node purity. How exactly is this calculated? Is it a gini index? Is there a reference? I found this, but I find this hard to follow: https://github.com/scikit-learn/scikit-learn/blob/fc2f24927fc37d7e42917369f17de045b14c59b5/sklearn/tree/_tree.pyx#L1056 I have also seen a post by Matthew Drury on stack exchange: https://stats.stackexchange.com/questions/162162/relative-variable-importance-for-boosting Many thanks, Olga From georg.kf.heiler at gmail.com Tue May 9 13:00:06 2017 From: georg.kf.heiler at gmail.com (Georg Heiler) Date: Tue, 09 May 2017 17:00:06 +0000 Subject: [scikit-learn] Broken c dependencies In-Reply-To: <17283E2F-2E05-420D-BCEF-8F9327FE8362@gmail.com> References: <17283E2F-2E05-420D-BCEF-8F9327FE8362@gmail.com> Message-ID: I installed python via homebrew. Scikit-learn is installed via pip. Until a few days it worked nicely. I think homebrew changed or upgraded gcc and removed that c dependency. Xcode 8 is installed. I see this error only with that specific module emgm pandas seems to run fine. Regards Georg Sebastian Raschka schrieb am Di. 9. Mai 2017 um 17:52: > Hi, > How did you install scikit-learn, from source or via pip? Not sure since > it's been a long time since I set up my macOS from scratch, but I think you > need to install Xcode command line tools at least. Have you checked that it > is available? E.g. Via xcode-select -p > BTW does NumPy / SciPy work on your install or is it just sklearn? > > Best, > Sebastian > > > > Sent from my iPhone > On May 9, 2017, at 11:36 AM, Georg Heiler > wrote: > > Hi, > > unfortunately, the c dependencies of my scikit-learn installation broke > and I get the following error on osx: > > dlopen(/usr/local/lib/python3.6/site-packages/sklearn/svm/libsvm.cpython-36m-darwin.so, 2): Symbol not found: __ZdlPvm > Referenced from: /usr/local/lib/python3.6/site-packages/sklearn/svm/libsvm.cpython-36m-darwin.so (which was built for Mac OS X 10.12) > Expected in: /usr/lib/libstdc++.6.dylib > in /usr/local/lib/python3.6/site-packages/sklearn/svm/libsvm.cpython-36m-darwin.so > > Even removing my python installation and re-installing does not seem to > get this library back. > > Regards, > Georg > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue May 9 13:15:20 2017 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 9 May 2017 18:15:20 +0100 Subject: [scikit-learn] Broken c dependencies In-Reply-To: References: <17283E2F-2E05-420D-BCEF-8F9327FE8362@gmail.com> Message-ID: Hi, On Tue, May 9, 2017 at 6:00 PM, Georg Heiler wrote: > I installed python via homebrew. Scikit-learn is installed via pip. Until a > few days it worked nicely. I think homebrew changed or upgraded gcc and > removed that c dependency. > > Xcode 8 is installed. > > I see this error only with that specific module emgm pandas seems to run > fine. 
Try: pip uninstall -y scikit-learn pip install --no-cache-dir scikit-learn python -c 'import sklearn.svm' Do you see something like this: Collecting scikit-learn Downloading scikit_learn-0.18.1-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (7.2MB) meaning that pip is installing from a binary wheel? Do you get the same error? Cheers, Matthew From georg.kf.heiler at gmail.com Tue May 9 13:27:18 2017 From: georg.kf.heiler at gmail.com (Georg Heiler) Date: Tue, 09 May 2017 17:27:18 +0000 Subject: [scikit-learn] Broken c dependencies In-Reply-To: References: <17283E2F-2E05-420D-BCEF-8F9327FE8362@gmail.com> Message-ID: Yes just like that. Even when completely removing the python library folder the error persists Meanwhile I set up a conda environment that works but I would prefer a plain pip installation. Matthew Brett schrieb am Di. 9. Mai 2017 um 19:17: > Hi, > > On Tue, May 9, 2017 at 6:00 PM, Georg Heiler > wrote: > > I installed python via homebrew. Scikit-learn is installed via pip. > Until a > > few days it worked nicely. I think homebrew changed or upgraded gcc and > > removed that c dependency. > > > > Xcode 8 is installed. > > > > I see this error only with that specific module emgm pandas seems to run > > fine. > > Try: > > pip uninstall -y scikit-learn > pip install --no-cache-dir scikit-learn > python -c 'import sklearn.svm' > > Do you see something like this: > > Collecting scikit-learn > > Downloading > scikit_learn-0.18.1-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl > (7.2MB) > > meaning that pip is installing from a binary wheel? > > Do you get the same error? > > Cheers, > > Matthew > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue May 9 14:20:02 2017 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 9 May 2017 19:20:02 +0100 Subject: [scikit-learn] Broken c dependencies In-Reply-To: References: <17283E2F-2E05-420D-BCEF-8F9327FE8362@gmail.com> Message-ID: On Tue, May 9, 2017 at 6:27 PM, Georg Heiler wrote: > Yes just like that. Hum - you shouldn't get what I got, because I was installing for Python 3.5, and there is a wheel for Python 3.5. I now see there isn't a wheel for OSX Python 3.6, so you should have got a source install. I'll set a wheel building now. Cheers, Matthew From matthew.brett at gmail.com Tue May 9 17:29:21 2017 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 9 May 2017 22:29:21 +0100 Subject: [scikit-learn] Broken c dependencies In-Reply-To: References: <17283E2F-2E05-420D-BCEF-8F9327FE8362@gmail.com> Message-ID: Hi, On Tue, May 9, 2017 at 7:20 PM, Matthew Brett wrote: > On Tue, May 9, 2017 at 6:27 PM, Georg Heiler wrote: >> Yes just like that. > > Hum - you shouldn't get what I got, because I was installing for > Python 3.5, and there is a wheel for Python 3.5. I now see there > isn't a wheel for OSX Python 3.6, so you should have got a source > install. I'll set a wheel building now. OK - done. Try this: pip uninstall -y scikit-learn # URL below should be all one line. CDN_URL=https://3f23b170c54c2533c070-1c8a9b3114517dc5fe17b7c3f8c63a43.ssl.cf2.rackcdn.com pip install -f $CDN_URL scikit-learn Does that work? 
Cheers, Matthew From georg.kf.heiler at gmail.com Wed May 10 00:17:13 2017 From: georg.kf.heiler at gmail.com (Georg Heiler) Date: Wed, 10 May 2017 04:17:13 +0000 Subject: [scikit-learn] Broken c dependencies In-Reply-To: References: <17283E2F-2E05-420D-BCEF-8F9327FE8362@gmail.com> Message-ID: Hi Matthew, indeed, that works fine. But what was the Problem? Installation from source should have worked fine? Thank you very much! Regards, Georg Matthew Brett schrieb am Di., 9. Mai 2017 um 23:31 Uhr: > Hi, > > On Tue, May 9, 2017 at 7:20 PM, Matthew Brett > wrote: > > On Tue, May 9, 2017 at 6:27 PM, Georg Heiler > wrote: > >> Yes just like that. > > > > Hum - you shouldn't get what I got, because I was installing for > > Python 3.5, and there is a wheel for Python 3.5. I now see there > > isn't a wheel for OSX Python 3.6, so you should have got a source > > install. I'll set a wheel building now. > > OK - done. Try this: > > pip uninstall -y scikit-learn > # URL below should be all one line. > CDN_URL= > https://3f23b170c54c2533c070-1c8a9b3114517dc5fe17b7c3f8c63a43.ssl.cf2.rackcdn.com > pip install -f $CDN_URL scikit-learn > > Does that work? > > Cheers, > > Matthew > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Wed May 10 01:53:35 2017 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 10 May 2017 07:53:35 +0200 Subject: [scikit-learn] Broken c dependencies In-Reply-To: References: <17283E2F-2E05-420D-BCEF-8F9327FE8362@gmail.com> Message-ID: <20170510055335.GF1025446@phare.normalesup.org> Thanks heaps Matthew for being there for the OSX builds! Ga?l On Tue, May 09, 2017 at 10:29:21PM +0100, Matthew Brett wrote: > Hi, > On Tue, May 9, 2017 at 7:20 PM, Matthew Brett wrote: > > On Tue, May 9, 2017 at 6:27 PM, Georg Heiler wrote: > >> Yes just like that. > > Hum - you shouldn't get what I got, because I was installing for > > Python 3.5, and there is a wheel for Python 3.5. I now see there > > isn't a wheel for OSX Python 3.6, so you should have got a source > > install. I'll set a wheel building now. > OK - done. Try this: > pip uninstall -y scikit-learn > # URL below should be all one line. > CDN_URL=https://3f23b170c54c2533c070-1c8a9b3114517dc5fe17b7c3f8c63a43.ssl.cf2.rackcdn.com > pip install -f $CDN_URL scikit-learn > Does that work? > Cheers, > Matthew > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -- Gael Varoquaux Researcher, INRIA Parietal NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info http://twitter.com/GaelVaroquaux From olivier.grisel at ensta.org Wed May 10 03:17:56 2017 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Wed, 10 May 2017 09:17:56 +0200 Subject: [scikit-learn] Broken c dependencies In-Reply-To: References: <17283E2F-2E05-420D-BCEF-8F9327FE8362@gmail.com> Message-ID: Thanks Matthew, I have uploaded your Python 3.6 wheel for MacOSX to PyPI. 
-- Olivier From matthew.brett at gmail.com Wed May 10 16:33:37 2017 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 10 May 2017 21:33:37 +0100 Subject: [scikit-learn] Broken c dependencies In-Reply-To: References: <17283E2F-2E05-420D-BCEF-8F9327FE8362@gmail.com> Message-ID: Hi, On Wed, May 10, 2017 at 5:17 AM, Georg Heiler wrote: > Hi Matthew, > > indeed, that works fine. But what was the Problem? Installation from source > should have worked fine? Yes, it should, and I don't know what the problem is. I just compiled scikit-learn on OSX 10.11, Python.org Python 3.6, that gave me no error for python -c 'import sklearn.svm' My compile used clang, by default. I guess your compile used Homebrew gcc? Would you mind putting the output of: python3.6 setup.py develop >& dev_log.txt somewhere I can have a look? Thanks for your persistence, Matthew From georg.kf.heiler at gmail.com Wed May 10 17:06:25 2017 From: georg.kf.heiler at gmail.com (Georg Heiler) Date: Wed, 10 May 2017 21:06:25 +0000 Subject: [scikit-learn] Broken c dependencies In-Reply-To: References: <17283E2F-2E05-420D-BCEF-8F9327FE8362@gmail.com> Message-ID: I used gcc --version [?master ???] Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 Apple LLVM version 8.1.0 (clang-802.0.42) Target: x86_64-apple-darwin16.5.0 Thread model: posix InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin which was installed via homebrew as brew install gcc --without-multilib The logs are /usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/Resources/Python.app/Contents/MacOS/Python: can't open file 'setup.py': [Errno 2] No such file or directory But probably, I am in the wrong directory. Which setup.py file are you interested in in which directory should I run this command? Matthew Brett schrieb am Mi., 10. Mai 2017 um 22:41 Uhr: > Hi, > > On Wed, May 10, 2017 at 5:17 AM, Georg Heiler > wrote: > > Hi Matthew, > > > > indeed, that works fine. But what was the Problem? Installation from > source > > should have worked fine? > > Yes, it should, and I don't know what the problem is. > > I just compiled scikit-learn on OSX 10.11, Python.org Python 3.6, that > gave me no error for > > python -c 'import sklearn.svm' > > My compile used clang, by default. I guess your compile used Homebrew > gcc? Would you mind putting the output of: > > python3.6 setup.py develop >& dev_log.txt > > somewhere I can have a look? > > Thanks for your persistence, > > Matthew > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmschreiber91 at gmail.com Thu May 11 19:38:13 2017 From: jmschreiber91 at gmail.com (Jacob Schreiber) Date: Thu, 11 May 2017 16:38:13 -0700 Subject: [scikit-learn] impurity criterion in gradient boosted regression trees In-Reply-To: References: Message-ID: The blog post from Matthew Drury sums it up well. The feature importance is indeed the Gini impurity. On Tue, May 9, 2017 at 8:34 AM, Olga Lyashevska wrote: > Hi all, > > I am trying to understand differences in feature importance plots obtained > with R package gbm and sklearn. Having compared both implementation side by > side it seems that the models are fairly similar, however feature > importance plots are rather distinct. 
> > R uses empirical improvement in squared error as it is described in > Friedmans's "Greedy Function Approximation" paper (eq. 44, 45). > > sklearn (as far as I could see it in the code) uses the weighted reduction > in node purity. How exactly is this calculated? Is it a gini index? Is > there a reference? > > I found this, but I find this hard to follow: > https://github.com/scikit-learn/scikit-learn/blob/fc2f24927f > c37d7e42917369f17de045b14c59b5/sklearn/tree/_tree.pyx#L1056 > > I have also seen a post by Matthew Drury on stack exchange: > https://stats.stackexchange.com/questions/162162/relative-va > riable-importance-for-boosting > > Many thanks, > Olga > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mamunbabu2001 at gmail.com Fri May 19 06:05:12 2017 From: mamunbabu2001 at gmail.com (Mamun Rashid) Date: Fri, 19 May 2017 11:05:12 +0100 Subject: [scikit-learn] scikit-learn Digest, Vol 14, Issue 6 In-Reply-To: References: Message-ID: <7A64080A-CA50-4235-8030-7D77127A90F1@gmail.com> Hi J.B and the list. Please accept my apology for a much delayed response. Was ill for last few days and did not access my email. Thanks for your detailed response. > What does "performing better" mean in this case? How are you defining performance? A particular metric such as MCC, PPV, or NPV? I was looking at precision recall. I have a huge class imbalance [positive class is much smaller than negative class]. So, I am testing performance of various classifiers with an increasing negative set size ( every time I am randomly selecting a larger negative set ). It seems SVC shows better performance in Precision recall space ( SVC precision recall curve is above RFC curve ). Because of the two following issues : 1. I have a major class imbalance 2. Some of my positive observations are sometimes tightly packed within negative observation clusters [ Observations from 2 dimensional PCA and tSNE plot ]. My aim is to obtain a very clean set of positive predictions as a trade-off I am happy to sacrifice some of the positive observations > Also, how is the cross-validation being done - is the data shuffled before creating train/test groups are created? Is the exact same split of training and test data per fold used for both > SVC and RF? I am currently testing it. Thanks for the suggestion. > Normalizing your continuous values seems quite fine, but consider these > aspects: > --Does it make sense in the domain of your problem to Z-normalize the integral (integer-valued) descriptors/features? > For the integral values, would subtracting about the median value make more sense? This is similar to the previous consideration. Yes. Z-score normalisation does not make much sense. Thanks for pointing it out. Currently testing it. > --What happens to SVC if you don't normalise? SVC performs quite badly. > --What happens to RF if you do normalise? This is interesting. My understating was that decision tree based algorithms does not require normalised data. I took your suggestion and tested an RFC with and without normalised data. Their result [Confusion matrix at 0.5 operating point] seems to be identical. It felt odd to me. I have only tested on a small data set. Currently running it on different data sets to see if this is persistent. Would you have expected this ? 
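That result is in fact expected: tree splits only compare a feature against a threshold, so any strictly monotonic per-feature rescaling such as z-scoring changes the thresholds but not the resulting partitions. A quick way to see this on synthetic data (make_classification here is just a stand-in for the real feature matrix):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.preprocessing import scale

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    # same random_state => same bootstraps and feature subsets in both forests
    rf_raw = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    rf_std = RandomForestClassifier(n_estimators=100, random_state=0).fit(scale(X), y)

    # should print True (up to rare floating-point tie-breaking differences)
    print(np.array_equal(rf_raw.predict(X), rf_std.predict(scale(X))))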
> If you can provide a good amount of concrete data to present along with your "problem", this community is excellent at providing intelligent, helpful responses. Thanks a lot for your suggestion. I will try to create some example data sets and results from the current analysis and post it as soon as possible. Thanks in advance for your help. Regards, Mamun > Today's Topics: > > 1. SVC data normalisation (Mamun Rashid) > > Message: 2 > Date: Mon, 8 May 2017 21:48:28 +0900 > From: "Brown J.B." > > Dear Mamun, > > *A.* 80% features are binary [ 0 or 1 ] >> *B.* 10% are integer values representing counts / occurrences. >> *C.* 10% are continuous values between different ranges. >> >> My prior understanding was that decision tree based algorithms work better >> on mixed data types. In this particular case I am noticing >> SVC is performing much better than Random forest. >> > > What does "performing better" mean in this case? > How are you defining performance? > A particular metric such as MCC, PPV, or NPV? > > Also, how is the cross-validation being done - is the data shuffled before > creating train/test groups are created? > Is the exact same split of training and test data per fold used for both > SVC and RF? > > >> I Z-score normalise the data before I sent it to support vector >> classifier. >> - Binary features ( type *A) *are left as it it. >> - Integer and Continuous features are Z-score normalised [ ( feat - >> mean(feat) ) / sd(feat) ) . >> > > Normalizing your continuous values seems quite fine, but consider these > aspects: > --Does it make sense in the domain of your problem to Z-normalize the > integral (integer-valued) descriptors/features? > --For the integral values, would subtracting about the median value make > more sense? This is similar to the previous consideration. > --What happens to SVC if you don't normalize? > --What happens to RF if you do normalize? > > While my various comments above are all geared toward empirical aspects and > not toward theoretical aspects, picking some of them to explore is likely > to help you gain practical insight on your situation/inquiry. > I'm sure you already know this, but while machine learning may have some > "practical guidelines for best practices", they are guidelines and not hard > rules. > So, again, I would recommend doing some more empirical tests and > re-evaluating your situation once you have new data in hand. > > If you can provide a good amount of concrete data to present along with > your "problem", this community is excellent at providing intelligent, > helpful responses. > > Hope this helps. > > J.B. > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > ------------------------------ > > End of scikit-learn Digest, Vol 14, Issue 6 > ******************************************* > > Date: Mon, 8 May 2017 10:45:26 +0100 > From: Mamun Rashid > Subject: [scikit-learn] SVC data normalisation > > > Hi All, > I am testing two classifiers [ 1. Random forest 2. SVC with radial basis kernel ] on a data set via 5 fold cross validation. > > The feature matrix contains : > > A. 80% features are binary [ 0 or 1 ] > B. 10% are integer values representing counts / occurrences. > C. 10% are continuous values between different ranges. 
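As an aside, a minimal sketch of scaling only the count and continuous columns while passing the binary columns through untouched; the column layout and data below are purely hypothetical:

import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
# hypothetical stand-in: 8 binary columns, 1 count column, 1 continuous column
X = np.hstack([rng.randint(0, 2, size=(100, 8)),
               rng.poisson(3.0, size=(100, 1)),
               rng.uniform(-5.0, 20.0, size=(100, 1))])

numeric_cols = [8, 9]        # the count and continuous columns
X_scaled = X.astype(float)   # copy, so the original X is untouched
X_scaled[:, numeric_cols] = StandardScaler().fit_transform(X_scaled[:, numeric_cols])
# columns 0-7 (the binary features) keep their original 0/1 values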
> > My prior understanding was that decision tree based algorithms work better on mixed data types. In this particular case I am noticing > SVC is performing much better than Random forest. > > I Z-score normalise the data before I sent it to support vector classifier. > - Binary features ( type A) are left as it it. > - Integer and Continuous features are Z-score normalised [ ( feat - mean(feat) ) / sd(feat) ) . > > I was wondering if anyone can tell me if this normalisation approach it correct for SVC run. > > Thanks in advance for your help. > > Regards, > Mamun > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcrudy at gmail.com Fri May 19 18:10:36 2017 From: jcrudy at gmail.com (Jason Rudy) Date: Fri, 19 May 2017 15:10:36 -0700 Subject: [scikit-learn] Failing check_estimator on py-earth Message-ID: I'm pushing to get py-earth ready for a release, but I'm having an issue with the check_estimator function on 32 bit windows machines. Here is a link to the failing build on appveyor: https://ci.appveyor.com/project/jcrudy/py-earth/build/job/21r6838yh1bgwxw4 It appears that array conversion is producing some small differences that make check_estimators_data_not_an_array fail. I'll probably have to set up a 32 bit environment with a debugger and drill down to find the bug, but I'm wondering if anybody here has tips or experience that might help me guess the problem without doing that. I am pretty ignorant about numpy type standards and conversions, so even something that seems obvious to you might help me. Best, Jason -------------- next part -------------- An HTML attachment was scrubbed... URL: From se.raschka at gmail.com Fri May 19 18:22:06 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Fri, 19 May 2017 18:22:06 -0400 Subject: [scikit-learn] Failing check_estimator on py-earth In-Reply-To: References: Message-ID: <995CDFFF-64FF-4C63-9C0D-1A76ABDEB4B1@gmail.com> > I'll probably have to set up a 32 bit environment with a debugger and drill down to find the bug, Must not be a bug but can simply be due to floating point imprecision. If you checked that this is expected behavior, you could you do sth like import numpy.distutils.system_info as sysinfo if sysinfo.platform_bits == 32: numpy.testing.assert_array_almost_equal(..., precision=0) else: numpy.testing.assert_array_almost_equal(..., precision=2) or sth like that? Best, Sebastian > On May 19, 2017, at 6:10 PM, Jason Rudy wrote: > > I'm pushing to get py-earth ready for a release, but I'm having an issue with the check_estimator function on 32 bit windows machines. Here is a link to the failing build on appveyor: > > https://ci.appveyor.com/project/jcrudy/py-earth/build/job/21r6838yh1bgwxw4 > > It appears that array conversion is producing some small differences that make check_estimators_data_not_an_array fail. I'll probably have to set up a 32 bit environment with a debugger and drill down to find the bug, but I'm wondering if anybody here has tips or experience that might help me guess the problem without doing that. I am pretty ignorant about numpy type standards and conversions, so even something that seems obvious to you might help me. 
> > Best, > > Jason > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From jcrudy at gmail.com Fri May 19 18:33:46 2017 From: jcrudy at gmail.com (Jason Rudy) Date: Fri, 19 May 2017 15:33:46 -0700 Subject: [scikit-learn] Failing check_estimator on py-earth In-Reply-To: <995CDFFF-64FF-4C63-9C0D-1A76ABDEB4B1@gmail.com> References: <995CDFFF-64FF-4C63-9C0D-1A76ABDEB4B1@gmail.com> Message-ID: Thanks, Sebastian. I'll consider using that platform check trick to disable the test for 32 bit windows. It is a small difference, and perhaps not worth all the effort of tracking down. It's part of check_estimator, so I'd have to disable the entirety of check_estimator I think. However, testing on 32 bit windows is probably not terribly important. On Fri, May 19, 2017 at 3:22 PM, Sebastian Raschka wrote: > > I'll probably have to set up a 32 bit environment with a debugger and > drill down to find the bug, > > Must not be a bug but can simply be due to floating point imprecision. If > you checked that this is expected behavior, you could you do sth like > > import numpy.distutils.system_info as sysinfo > if sysinfo.platform_bits == 32: > numpy.testing.assert_array_almost_equal(..., precision=0) > else: > numpy.testing.assert_array_almost_equal(..., precision=2) > > or sth like that? > > Best, > Sebastian > > > On May 19, 2017, at 6:10 PM, Jason Rudy wrote: > > > > I'm pushing to get py-earth ready for a release, but I'm having an issue > with the check_estimator function on 32 bit windows machines. Here is a > link to the failing build on appveyor: > > > > https://ci.appveyor.com/project/jcrudy/py-earth/build/ > job/21r6838yh1bgwxw4 > > > > It appears that array conversion is producing some small differences > that make check_estimators_data_not_an_array fail. I'll probably have to > set up a 32 bit environment with a debugger and drill down to find the bug, > but I'm wondering if anybody here has tips or experience that might help me > guess the problem without doing that. I am pretty ignorant about numpy > type standards and conversions, so even something that seems obvious to you > might help me. > > > > Best, > > > > Jason > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From se.raschka at gmail.com Fri May 19 19:01:11 2017 From: se.raschka at gmail.com (Sebastian Raschka) Date: Fri, 19 May 2017 19:01:11 -0400 Subject: [scikit-learn] Failing check_estimator on py-earth In-Reply-To: References: <995CDFFF-64FF-4C63-9C0D-1A76ABDEB4B1@gmail.com> Message-ID: <58BBBCCD-C282-4675-9CB4-F8DD292501D7@gmail.com> Hm, I am actually not sure; could be a bug. When I see it correctly, the problem is in https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/estimator_checks.py#L1519 which could be related to the 'astype' calls in https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/estimator_checks.py#L458 but maybe the scikit-devs know more about this. > On May 19, 2017, at 6:33 PM, Jason Rudy wrote: > > Thanks, Sebastian. I'll consider using that platform check trick to disable the test for 32 bit windows. 
It is a small difference, and perhaps not worth all the effort of tracking down. It's part of check_estimator, so I'd have to disable the entirety of check_estimator I think. However, testing on 32 bit windows is probably not terribly important. > > On Fri, May 19, 2017 at 3:22 PM, Sebastian Raschka > wrote: > > I'll probably have to set up a 32 bit environment with a debugger and drill down to find the bug, > > Must not be a bug but can simply be due to floating point imprecision. If you checked that this is expected behavior, you could you do sth like > > import numpy.distutils.system_info as sysinfo > if sysinfo.platform_bits == 32: > numpy.testing.assert_array_almost_equal(..., precision=0) > else: > numpy.testing.assert_array_almost_equal(..., precision=2) > > or sth like that? > > Best, > Sebastian > > > On May 19, 2017, at 6:10 PM, Jason Rudy > wrote: > > > > I'm pushing to get py-earth ready for a release, but I'm having an issue with the check_estimator function on 32 bit windows machines. Here is a link to the failing build on appveyor: > > > > https://ci.appveyor.com/project/jcrudy/py-earth/build/job/21r6838yh1bgwxw4 > > > > It appears that array conversion is producing some small differences that make check_estimators_data_not_an_array fail. I'll probably have to set up a 32 bit environment with a debugger and drill down to find the bug, but I'm wondering if anybody here has tips or experience that might help me guess the problem without doing that. I am pretty ignorant about numpy type standards and conversions, so even something that seems obvious to you might help me. > > > > Best, > > > > Jason > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From ichkoar at gmail.com Sat May 20 08:25:41 2017 From: ichkoar at gmail.com (Christos Aridas) Date: Sat, 20 May 2017 15:25:41 +0300 Subject: [scikit-learn] Failing check_estimator on py-earth In-Reply-To: References: Message-ID: We faced a similar problem between architectures (32/64bit) with a failing test in imbalanced-learn. Guillaume posted details here https ://github.com/scikit-learn/scikit-learn/issues/8853 Regards, Chris On Saturday, May 20, 2017, Jason Rudy wrote: > I'm pushing to get py-earth ready for a release, but I'm having an issue > with the check_estimator function on 32 bit windows machines. Here is a > link to the failing build on appveyor: > > https://ci.appveyor.com/project/jcrudy/py-earth/build/job/21r6838yh1bgwxw4 > > It appears that array conversion is producing some small differences that > make check_estimators_data_not_an_array fail. I'll probably have to set > up a 32 bit environment with a debugger and drill down to find the bug, but > I'm wondering if anybody here has tips or experience that might help me > guess the problem without doing that. I am pretty ignorant about numpy > type standards and conversions, so even something that seems obvious to you > might help me. > > Best, > > Jason > -------------- next part -------------- An HTML attachment was scrubbed... 
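For reference, a runnable version of the platform-dependent tolerance check sketched earlier in this thread. Note that numpy's assert_array_almost_equal takes a `decimal` argument rather than `precision`, and a smaller value means a looser comparison; the arrays below are illustrative stand-ins for the two sets of predictions that the failing check compares:

import numpy as np
import numpy.distutils.system_info as sysinfo

pred_from_array = np.array([1.001, 2.002, 2.998])  # e.g. predictions after fitting on an ndarray
pred_from_lists = np.array([1.003, 1.999, 3.001])  # e.g. predictions after fitting on nested lists

decimal = 0 if sysinfo.platform_bits == 32 else 2  # looser comparison on 32 bit
np.testing.assert_array_almost_equal(pred_from_array, pred_from_lists, decimal=decimal)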
URL:

From alessandro.luongo at atos.net Tue May 30 08:22:30 2017
From: alessandro.luongo at atos.net (LUONGO, ALESSANDRO)
Date: Tue, 30 May 2017 12:22:30 +0000
Subject: [scikit-learn] KNearestNeighbour is not running in multithread
Message-ID:

Hi everyone!
I'm successfully using scikit-learn on a 384 core machine. I'm playing with two deployments:
the first is an Anaconda installation of Python 3.6, which uses MKL as the numpy backend;
the second is a "native" installation of scikit-learn and numpy on Python 3.4.5, where the backend is OpenBLAS.
Both deployments work, and I can see a high number of threads with high CPU load (for instance when I'm doing PCA).
The problem that I don't know how to debug is that kNearestNeighbour is using only one core. This puzzles me, since the PR with the parallel KNN has been in the main branch since version 0.17
(https://github.com/scikit-learn/scikit-learn/pull/4009), so scikit-learn should have had these changes for a year now, and my version of sklearn is:

> print('The scikit-learn version is {}.'.format(sklearn.__version__))
> The scikit-learn version is 0.18.1.

Do you have any hints on how to use parallel KNN?
I'm classifying a high dimensional dataset of MNIST (image digits), so I'm doing PCA to get vectors of dimension 35-50, and then a nonlinear expansion, which gives me vectors of dimension 600-100. That's why I need parallelism so badly.

clf = KNeighborsClassifier(algorithm='ball_tree')
clf = clf.fit(train, train_labels)

Thanks for all your amazing work.
Ale
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From gael.varoquaux at normalesup.org Tue May 30 11:37:51 2017
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Tue, 30 May 2017 17:37:51 +0200
Subject: [scikit-learn] KNearestNeighbour is not running in multithread
In-Reply-To:
References:
Message-ID: <20170530153751.GD1446293@phare.normalesup.org>

You need to set the n_jobs parameter of the KNearestNeighbour object.

Gaël

From shane.grigsby at colorado.edu Tue May 30 15:42:51 2017
From: shane.grigsby at colorado.edu (Shane Grigsby)
Date: Tue, 30 May 2017 13:42:51 -0600
Subject: [scikit-learn] KNearestNeighbour is not running in multithread
In-Reply-To: <20170530153751.GD1446293@phare.normalesup.org>
References: <20170530153751.GD1446293@phare.normalesup.org>
Message-ID: <20170530194251.6yed7gvrolicbzry@CIRES-iMAC.local>

Also, I've found that nearest neighbors are often faster using a single core given the overhead that multiprocessing brings... If you're doing a single query over billions or more points, parallel is faster, but if you are doing lots of neighbor queries over hundreds of thousands or a few million points, the single-threaded call will be faster.
~Shane

On 05/30, Gael Varoquaux wrote:
>You need to set the n_jobs parameter of the KNearestNeighbour object.
>
>Gaël
>_______________________________________________
>scikit-learn mailing list
>scikit-learn at python.org
>https://mail.python.org/mailman/listinfo/scikit-learn

--
*PhD candidate & Research Assistant*
*Cooperative Institute for Research in Environmental Sciences (CIRES)*
*University of Colorado at Boulder*
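Putting Gaël's and Shane's advice together, a minimal sketch; the arrays below are random placeholders for the PCA-reduced MNIST vectors and labels:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
train = rng.rand(10000, 50)               # placeholder for the PCA-reduced training data
train_labels = rng.randint(0, 10, 10000)  # placeholder for the digit labels

# n_jobs=-1 parallelises the neighbour queries across all available cores;
# for small batches of queries the single-threaded default can still be faster.
clf = KNeighborsClassifier(algorithm='ball_tree', n_jobs=-1)
clf = clf.fit(train, train_labels)
print(clf.predict(train[:5]))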
From albertthomas88 at gmail.com Wed May 31 05:27:31 2017
From: albertthomas88 at gmail.com (Albert Thomas)
Date: Wed, 31 May 2017 09:27:31 +0000
Subject: [scikit-learn] develop install with pip?
Message-ID:

Hi all,

For a develop install, the contributing section of the website
http://scikit-learn.org/stable/developers/contributing.html suggests doing:

python setup.py develop

However, I read on Stack Overflow that the preferred way to do this is now to use pip instead of using setuptools directly:

pip install -e .

Thanks,
Albert
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From olivier.grisel at ensta.org Wed May 31 05:37:55 2017
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Wed, 31 May 2017 11:37:55 +0200
Subject: [scikit-learn] develop install with pip?
In-Reply-To:
References:
Message-ID:

+1 for recommending to use `pip install --editable .`.

--
Olivier

From albertthomas88 at gmail.com Wed May 31 17:56:09 2017
From: albertthomas88 at gmail.com (Albert Thomas)
Date: Wed, 31 May 2017 21:56:09 +0000
Subject: [scikit-learn] develop install with pip?
In-Reply-To:
References:
Message-ID:

In fact `pip install --editable .` is the instruction given at the end of the Advanced installation instructions,
http://scikit-learn.org/stable/developers/advanced_installation.html#testing
I will submit a PR to recommend this in the Contributing section as well.

Albert

On Wed, May 31, 2017 at 11:39 AM Olivier Grisel wrote:
> +1 for recommending to use `pip install --editable .`.
>
> --
> Olivier
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
-------------- next part --------------
An HTML attachment was scrubbed...
URL: