From donkey-hotei at cryptolab.net Fri Jul 1 03:48:55 2016
From: donkey-hotei at cryptolab.net (donkey-hotei at cryptolab.net)
Date: Fri, 01 Jul 2016 09:48:55 +0200
Subject: [scikit-learn] partial_fit implementation for IsolationForest
In-Reply-To:
References:
Message-ID: <89984291c81434d8be46cdc4c4527b44@cryptolab.net>

hi Olivier,

thanks for your response.

> What you describe is quite different from what sklearn models
> typically do with partial_fit. partial_fit is more about out-of-core /
> streaming fitting rather than true online learning with explicit
> forgetting.
>
> In particular what you suggest would not accept calling partial_fit
> with very small chunks (e.g. from tens to a hundred samples at a time)
> because that would not be enough to develop deep isolation trees and
> would harm the performance of the resulting isolation forest.

I see, suppose I should check to see how the depth of these trees
changes when fitting on small chunks as opposed to large chunks -.
either way, refreshing on at least 1000 samples has proven to work O.K
here in the face of concept drift

> If the problem is true online learning (tracking a stream of training
> data with expected shifts in its distribution) I think it's better to
> devise a dedicated API that does not try to mimic the scikit-learn API
> (for this specific part). There will typically have to be an
> additional hyperparameter to control how much the model should
> remember about old samples.

ok, i've been using a parameter called 'n_more_estimators' that decides
how many trees are dropped/added. maybe it is not the best way

> If the problem is more about out-of-core, then partial_fit is suitable
> but the trees should grow and get reorganized progressively (as
> pointed by others in previous comments).

maybe a name like "online_fit" would be more appropriate? it would be
nice to know what exactly is meant by "reorganized" , so far ive been
merely dropping the oldest trees

> BTW, I would be curious to know more about the kind of anomaly
> detection problem where you found IsolationForests to work well.

The problem is intrusion detection at the application layer, features
are parsed from http audit logs

ty

From basilbeirouti at gmail.com Fri Jul 1 13:01:39 2016
From: basilbeirouti at gmail.com (Basil Beirouti)
Date: Fri, 1 Jul 2016 12:01:39 -0500
Subject: [scikit-learn] Adding BM25 to sklearn.feature_extraction.text
Message-ID:

Hi Joel,

I'm not by my dev computer right now so I can't show you the code, but the
problem is that the term frequency - f(q,D) in that wiki article - appears
in both the numerator and the denominator. Also, in the denominator, you
must add a scalar quantity to f(q,D), which is unsupported if f(q,D) is
coming from a sparse matrix. You can factor the equation in different ways
but you can't get around the main issue that the sparse matrix must appear
in the numerator and denominator.

So instead of doing any actual matrix multiplication I just loop on the
non-zero elements in the sparse matrix (term-frequency matrix) and fill in
the new BM25matrix, element by element.

Any suggestions that use actual sparse matrix operations would be
appreciated. I know I can also loop on the non-zero elements and construct
a sparse csr_matrix from that using the .indptr attribute etc. but I'm
hoping there's a way to use matrix operations.

Sincerely,
Basil Beirouti

On Fri, Jul 1, 2016 at 11:00 AM, wrote:

> Message: 1
> Date: Thu, 30 Jun 2016 17:23:18 -0500
> From: Basil Beirouti
> To: scikit-learn at python.org
> Subject: [scikit-learn] Adding BM25 to sklearn.feature_extraction.text
>         (Update)
> Message-ID:
>         <CAB4mTg8tMwoA0NwsfXmVtYWqS547F2NOmP5vj3LTCaNqXjqeWQ at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hello everyone,
>
> I have successfully created a few versions of the BM25Transformer. I looked
> at TFIDFTransformer for guidance and I noticed that it outputs a sparse
> matrix when given a sparse termcount matrix as an input.
>
> Unfortunately, the fastest implementation of BM25Transformer that I have
> been able to come up with does NOT output a sparse matrix, it will return a
> regular numpy matrix.
>
> Benchmarked against the entire 20newsgroups corpus, here is how they
> perform (assuming input is csr_matrix for all):
>
> 1.) finishes in 4 seconds, outputs a regular numpy matrix
> 2.) finishes in 30 seconds, outputs a dok_matrix
> 3.) finishes in 130 seconds, outputs a regular numpy matrix
>
> It's worth noting that using algorithm 1 and converting the output to a
> sparse matrix still takes less time than 3, and takes about as long as 2.
>
> So my question is, how important is it that my BM25Transformer outputs a
> sparse matrix?
>
> I'm going to try another implementation which looks directly at the data,
> indices, and indptr attributes of the inputted csr_matrix. I just wanted to
> check in and see what people thought.
>
> Sincerely,
> Basil Beirouti
>
> ------------------------------
>
> Message: 2
> Date: Fri, 1 Jul 2016 08:38:15 +1000
> From: Joel Nothman
> To: Scikit-learn user and developer mailing list
> Subject: Re: [scikit-learn] Adding BM25 to
>         sklearn.feature_extraction.text (Update)
> Message-ID:
>         <CAAkaFLUB+4gu5cHuYYyc8pqBK4Ews4mkXyBKAvMCVENNPUv98Q at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> I don't see what about BM25, at least as presented at
> https://en.wikipedia.org/wiki/Okapi_BM25, should prevent using CSR
> operations efficiently. Show us your code.
>
> ------------------------------
>
> Message: 3
> Date: Thu, 30 Jun 2016 18:33:49 -0400
> From: Sebastian Raschka
> To: Scikit-learn user and developer mailing list
> Subject: Re: [scikit-learn] Adding BM25 to
>         sklearn.feature_extraction.text (Update)
> Message-ID:
>         <6411ECB7-BD7C-4960-B847-B3D633DD848A at sebastianraschka.com>
> Content-Type: text/plain; charset=utf-8
>
> Hi, Basil,
>
> I'd say runtime may not be the main concern regarding sparse vs. dense. In
> my opinion, the main reason to use sparse arrays would be memory useage.
> I.e., text data is typically rather large (esp. high-dimensional, sparse
> feature vector). So one limitation with scikit-learn is typically memory
> capacity, especially if you are using multiprocessing via the cv param.
>
> PS:
>
> > regular numpy matrix
>
> I think you mean "numpy array"? (Since there's a numpy matrix datastruct
> in numpy as well, however, almost no one uses it)
>
> Best,
> Sebastian

From basilbeirouti at gmail.com Fri Jul 1 17:17:43 2016
From: basilbeirouti at gmail.com (Basil Beirouti)
Date: Fri, 1 Jul 2016 16:17:43 -0500
Subject: [scikit-learn] Adding BM25 to scikit-learn.feature_extraction.text
Message-ID:

Hi everyone,

to put it succinctly, here's the BM25 equation:

f(w,D) * (k+1) / (k*B + f(w,D))

where w is the word, and D is the document (corresponding to rows and
columns, respectively). f is a sparse matrix because only a fraction of the
whole vocabulary of words appears in any given single document.

B is a function of only the document, but it doesn't matter, you can think
of it as a constant if you want.

The problem is since f(w,D) is almost always zero, I only need to do the
calculation (ie. multiply by (k+1) then divide by (k*B + f(w,D))) when
f(w,D) is not zero. Is there a clever way to do this with masks?
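
A minimal sketch of one way to restrict the calculation to the nonzero
entries; it is not taken from the code discussed in this thread. It assumes
the term counts are held in a SciPy CSR matrix (the names X and
bm25_constant_b are hypothetical), that B is treated as a single constant as
the message allows, and that the values of k are only illustrative. Because
the formula maps a zero count to zero, only the stored values in X.data need
to change, so no explicit mask is built.

```python
import numpy as np
from scipy import sparse


def bm25_constant_b(X, k=1.2, B=1.0):
    """Apply f * (k + 1) / (k * B + f) to the stored entries of X only.

    X is a scipy.sparse matrix of term counts; k and B are scalars here
    (an assumption made for this sketch).
    """
    # Work on a float copy so the caller's matrix is left untouched.
    out = sparse.csr_matrix(X, dtype=np.float64, copy=True)
    f = out.data  # 1-D array holding only the explicitly stored counts
    out.data = f * (k + 1.0) / (k * B + f)
    return out


counts = sparse.csr_matrix(np.array([[2, 0, 1],
                                     [0, 0, 3]]))
weighted = bm25_constant_b(counts, k=1.0, B=1.0)
dense = weighted.toarray()
```

The result keeps the sparsity pattern of the input, since entries that were
not stored are never visited.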

You can refactor the above equation to get this:

(k+1)/(k*B/f(w,D) + 1) but alas we still have f(w,D) appearing in a
denominator, which is bad (because of dividing by zero).

So anyway, currently I am converting to a coo_matrix and iterator through
the non-zero values like this:

    cx = x.tocoo()
    for i,j,v in itertools.izip(cx.row, cx.col, cx.data):
        (i,j,v)

That iterator is incredibly fast, but unfortunately coo_matrix does
not support assignment. So I create a new copy of either a dok sparse
matrix or a regular numpy array and assign to that.

I could also deal directly with the .data, .indptr, and indices
attributes of csr_matrix, and see if it's possible to create a copy of
.data attribute and update the values accordingly. I was hoping
somebody had encountered this type of issue before.

Sincerely,

Basil Beirouti

From zephyr14 at gmail.com Fri Jul 1 17:35:49 2016
From: zephyr14 at gmail.com (Vlad Niculae)
Date: Fri, 01 Jul 2016 17:35:49 -0400
Subject: [scikit-learn] Adding BM25 to scikit-learn.feature_extraction.text
In-Reply-To:
References:
Message-ID:

Hi Basil,

If B were just a constant, you could do the whole thing as a vectorized
operation on X.data.

Since I understand B is a n_samples vector, I think the cleanest way to
compute the denominator is using
sklearn.utils.sparsefuncs.inplace_row_scale.

Hope this helps,

Vlad

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

From basilbeirouti at gmail.com Fri Jul 1 18:27:41 2016
From: basilbeirouti at gmail.com (Basil Beirouti)
Date: Fri, 1 Jul 2016 17:27:41 -0500
Subject: [scikit-learn] Bm25
Message-ID: <8855B69A-30FD-47D2-9302-2456AADC99B5@gmail.com>

Hi Vlad,

Thanks for the quick reply. Unfortunately there's still the question of
adding a scalar to every element in sparse matrix, which is not allowed
for sparse matrices, and which is not possible to avoid in the equation.
Sincerely, Basil Beirouti > On Jul 1, 2016, at 4:36 PM, scikit-learn-request at python.org wrote: > > Send scikit-learn mailing list submissions to > scikit-learn at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/scikit-learn > or, via email, send a message with subject or body 'help' to > scikit-learn-request at python.org > > You can reach the person managing the list at > scikit-learn-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of scikit-learn digest..." > > > Today's Topics: > > 1. Adding BM25 to scikit-learn.feature_extraction.text > (Basil Beirouti) > 2. Re: Adding BM25 to scikit-learn.feature_extraction.text > (Vlad Niculae) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Fri, 1 Jul 2016 16:17:43 -0500 > From: Basil Beirouti > To: scikit-learn at python.org > Subject: [scikit-learn] Adding BM25 to > scikit-learn.feature_extraction.text > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > Hi everyone, > > to put it succinctly, here's the BM25 equation: > > f(w,D) * (k+1) / (k*B + f(w,D)) > > where w is the word, and D is the document (corresponding to rows and > columns, respectively). f is a sparse matrix because only a fraction of the > whole vocabulary of words appears in any given single document. > > B is a function of only the document, but it doesn't matter, you can think > of it as a constant if you want. > > The problem is since f(w,D) is almost always zero, I only need to do the > calculation (ie. multiply by (k+1) then divide by (k*B + f(w,D))) when > f(w,D) is not zero. Is there a clever way to do this with masks? > > You can refactor the above equation to get this: > > (k+1)/(k*B/f(w,D) + 1) but alas we still have f(w,D) appearing in a > denominator, which is bad (because of dividing by zero). 
> > So anyway, currently I am converting to a coo_matrix and iterator through > the non-zero values like this: > > cx = x.tocoo() > for i,j,v in itertools.izip(cx.row, cx.col, cx.data): > (i,j,v) > > > That iterator is incredibly fast, but unfortunately coo_matrix does > not support assignment. So I create a new copy of either a dok sparse > matrix or a regular numpy array and assign to that. > > I could also deal directly with the .data, .indptr, and indices > attributes of csr_matrix, and see if it's possible to create a copy of > .data attribute and update the values accordingly. I was hoping > somebody had encountered this type of issue before. > > Sincerely, > > Basil Beirouti > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 2 > Date: Fri, 01 Jul 2016 17:35:49 -0400 > From: Vlad Niculae > To: Scikit-learn user and developer mailing list > > Subject: Re: [scikit-learn] Adding BM25 to > scikit-learn.feature_extraction.text > Message-ID: > Content-Type: text/plain; charset="utf-8" > > Hi Basil, > > If B were just a constant, you could do the whole thing as a vectorized operation on X.data. > > Since I understand B is a n_samples vector, I think the cleanest way to compute the denominator is using sklearn.utils.sparsefuncs.inplace_row_scale. > > Hope this helps, > > Vlad > > >> On July 1, 2016 5:17:43 PM EDT, Basil Beirouti wrote: >> Hi everyone, >> >> to put it succinctly, here's the BM25 equation: >> >> f(w,D) * (k+1) / (k*B + f(w,D)) >> >> where w is the word, and D is the document (corresponding to rows and >> columns, respectively). f is a sparse matrix because only a fraction of >> the >> whole vocabulary of words appears in any given single document. >> >> B is a function of only the document, but it doesn't matter, you can >> think >> of it as a constant if you want. >> >> The problem is since f(w,D) is almost always zero, I only need to do >> the >> calculation (ie. 
multiply by (k+1) then divide by (k*B + f(w,D))) when >> f(w,D) is not zero. Is there a clever way to do this with masks? >> >> You can refactor the above equation to get this: >> >> (k+1)/(k*B/f(w,D) + 1) but alas we still have f(w,D) appearing in a >> denominator, which is bad (because of dividing by zero). >> >> So anyway, currently I am converting to a coo_matrix and iterator >> through >> the non-zero values like this: >> >> cx = x.tocoo() >> for i,j,v in itertools.izip(cx.row, cx.col, cx.data): >> (i,j,v) >> >> >> That iterator is incredibly fast, but unfortunately coo_matrix does >> not support assignment. So I create a new copy of either a dok sparse >> matrix or a regular numpy array and assign to that. >> >> I could also deal directly with the .data, .indptr, and indices >> attributes of csr_matrix, and see if it's possible to create a copy of >> .data attribute and update the values accordingly. I was hoping >> somebody had encountered this type of issue before. >> >> Sincerely, >> >> Basil Beirouti >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > -- > Sent from my Android device with K-9 Mail. Please excuse my brevity. > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > ------------------------------ > > End of scikit-learn Digest, Vol 4, Issue 3 > ****************************************** From zephyr14 at gmail.com Fri Jul 1 18:36:42 2016 From: zephyr14 at gmail.com (Vlad Niculae) Date: Fri, 01 Jul 2016 18:36:42 -0400 Subject: [scikit-learn] Bm25 In-Reply-To: <8855B69A-30FD-47D2-9302-2456AADC99B5@gmail.com> References: <8855B69A-30FD-47D2-9302-2456AADC99B5@gmail.com> Message-ID: <39EBD035-F8B0-443D-B11F-7A0867CC3835@gmail.com> In the denominator you mean? It looks like you only need to add that to nonzero elements, since the others would all have a 0 in the numerator, right? So the final value would be zero there. Or am I missing something? You can initialize an array with the same sparsity pattern as X, but its data is k everywhere. Then use inplace_row_scale to multiply it by B, then add this to X to get the denominator. On July 1, 2016 6:27:41 PM EDT, Basil Beirouti wrote: >Hi Vlad, > >Thanks for the quick reply. Unfortunately there's still the question of >adding a scalar to every element in sparse matrix, which is not allowed >for sparse matrices, and which is not possible to avoid in the >equation. 
> >Sincerely, >Basil Beirouti > > >> On Jul 1, 2016, at 4:36 PM, scikit-learn-request at python.org wrote: >> >> Send scikit-learn mailing list submissions to >> scikit-learn at python.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> https://mail.python.org/mailman/listinfo/scikit-learn >> or, via email, send a message with subject or body 'help' to >> scikit-learn-request at python.org >> >> You can reach the person managing the list at >> scikit-learn-owner at python.org >> >> When replying, please edit your Subject line so it is more specific >> than "Re: Contents of scikit-learn digest..." >> >> >> Today's Topics: >> >> 1. Adding BM25 to scikit-learn.feature_extraction.text >> (Basil Beirouti) >> 2. Re: Adding BM25 to scikit-learn.feature_extraction.text >> (Vlad Niculae) >> >> >> >---------------------------------------------------------------------- >> >> Message: 1 >> Date: Fri, 1 Jul 2016 16:17:43 -0500 >> From: Basil Beirouti >> To: scikit-learn at python.org >> Subject: [scikit-learn] Adding BM25 to >> scikit-learn.feature_extraction.text >> Message-ID: >> > >> Content-Type: text/plain; charset="utf-8" >> >> Hi everyone, >> >> to put it succinctly, here's the BM25 equation: >> >> f(w,D) * (k+1) / (k*B + f(w,D)) >> >> where w is the word, and D is the document (corresponding to rows and >> columns, respectively). f is a sparse matrix because only a fraction >of the >> whole vocabulary of words appears in any given single document. >> >> B is a function of only the document, but it doesn't matter, you can >think >> of it as a constant if you want. >> >> The problem is since f(w,D) is almost always zero, I only need to do >the >> calculation (ie. multiply by (k+1) then divide by (k*B + f(w,D))) >when >> f(w,D) is not zero. Is there a clever way to do this with masks? 
>> >> You can refactor the above equation to get this: >> >> (k+1)/(k*B/f(w,D) + 1) but alas we still have f(w,D) appearing in a >> denominator, which is bad (because of dividing by zero). >> >> So anyway, currently I am converting to a coo_matrix and iterator >through >> the non-zero values like this: >> >> cx = x.tocoo() >> for i,j,v in itertools.izip(cx.row, cx.col, cx.data): >> (i,j,v) >> >> >> That iterator is incredibly fast, but unfortunately coo_matrix does >> not support assignment. So I create a new copy of either a dok sparse >> matrix or a regular numpy array and assign to that. >> >> I could also deal directly with the .data, .indptr, and indices >> attributes of csr_matrix, and see if it's possible to create a copy >of >> .data attribute and update the values accordingly. I was hoping >> somebody had encountered this type of issue before. >> >> Sincerely, >> >> Basil Beirouti >> -------------- next part -------------- >> An HTML attachment was scrubbed... >> URL: > >> >> ------------------------------ >> >> Message: 2 >> Date: Fri, 01 Jul 2016 17:35:49 -0400 >> From: Vlad Niculae >> To: Scikit-learn user and developer mailing list >> >> Subject: Re: [scikit-learn] Adding BM25 to >> scikit-learn.feature_extraction.text >> Message-ID: >> Content-Type: text/plain; charset="utf-8" >> >> Hi Basil, >> >> If B were just a constant, you could do the whole thing as a >vectorized operation on X.data. >> >> Since I understand B is a n_samples vector, I think the cleanest way >to compute the denominator is using >sklearn.utils.sparsefuncs.inplace_row_scale. >> >> Hope this helps, >> >> Vlad >> >> >>> On July 1, 2016 5:17:43 PM EDT, Basil Beirouti > wrote: >>> Hi everyone, >>> >>> to put it succinctly, here's the BM25 equation: >>> >>> f(w,D) * (k+1) / (k*B + f(w,D)) >>> >>> where w is the word, and D is the document (corresponding to rows >and >>> columns, respectively). 
f is a sparse matrix because only a fraction >of >>> the >>> whole vocabulary of words appears in any given single document. >>> >>> B is a function of only the document, but it doesn't matter, you can >>> think >>> of it as a constant if you want. >>> >>> The problem is since f(w,D) is almost always zero, I only need to do >>> the >>> calculation (ie. multiply by (k+1) then divide by (k*B + f(w,D))) >when >>> f(w,D) is not zero. Is there a clever way to do this with masks? >>> >>> You can refactor the above equation to get this: >>> >>> (k+1)/(k*B/f(w,D) + 1) but alas we still have f(w,D) appearing in a >>> denominator, which is bad (because of dividing by zero). >>> >>> So anyway, currently I am converting to a coo_matrix and iterator >>> through >>> the non-zero values like this: >>> >>> cx = x.tocoo() >>> for i,j,v in itertools.izip(cx.row, cx.col, cx.data): >>> (i,j,v) >>> >>> >>> That iterator is incredibly fast, but unfortunately coo_matrix does >>> not support assignment. So I create a new copy of either a dok >sparse >>> matrix or a regular numpy array and assign to that. >>> >>> I could also deal directly with the .data, .indptr, and indices >>> attributes of csr_matrix, and see if it's possible to create a copy >of >>> .data attribute and update the values accordingly. I was hoping >>> somebody had encountered this type of issue before. >>> >>> Sincerely, >>> >>> Basil Beirouti >>> >>> >>> >------------------------------------------------------------------------ >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> -- >> Sent from my Android device with K-9 Mail. Please excuse my brevity. >> -------------- next part -------------- >> An HTML attachment was scrubbed... 
>> URL: > >> >> ------------------------------ >> >> Subject: Digest Footer >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> ------------------------------ >> >> End of scikit-learn Digest, Vol 4, Issue 3 >> ****************************************** >_______________________________________________ >scikit-learn mailing list >scikit-learn at python.org >https://mail.python.org/mailman/listinfo/scikit-learn -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From basilbeirouti at gmail.com Fri Jul 1 18:47:40 2016 From: basilbeirouti at gmail.com (Basil Beirouti) Date: Fri, 1 Jul 2016 17:47:40 -0500 Subject: [scikit-learn] Bm25 In-Reply-To: <39EBD035-F8B0-443D-B11F-7A0867CC3835@gmail.com> References: <8855B69A-30FD-47D2-9302-2456AADC99B5@gmail.com> <39EBD035-F8B0-443D-B11F-7A0867CC3835@gmail.com> Message-ID: <543CA661-214B-4BF0-B4C1-CAD77A949ACF@gmail.com> Oh yes that's exactly what I was looking for. So how do I initialize an array with the same sparsity pattern as X? And then how do I do an element wise divide of the numerator over the denominator, when both are sparse matrices? Like you said it should only do this operation on the non zero elements of the numerator. Sent from my iPhone > On Jul 1, 2016, at 5:36 PM, Vlad Niculae wrote: > > In the denominator you mean? It looks like you only need to add that to nonzero elements, since the others would all have a 0 in the numerator, right? So the final value would be zero there. Or am I missing something? > > You can initialize an array with the same sparsity pattern as X, but its data is k everywhere. Then use inplace_row_scale to multiply it by B, then add this to X to get the denominator. 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From zephyr14 at gmail.com Fri Jul 1 19:10:16 2016
From: zephyr14 at gmail.com (Vlad Niculae)
Date: Fri, 01 Jul 2016 19:10:16 -0400
Subject: [scikit-learn] Bm25
In-Reply-To: <543CA661-214B-4BF0-B4C1-CAD77A949ACF@gmail.com>
References: <8855B69A-30FD-47D2-9302-2456AADC99B5@gmail.com> <39EBD035-F8B0-443D-B11F-7A0867CC3835@gmail.com> <543CA661-214B-4BF0-B4C1-CAD77A949ACF@gmail.com>
Message-ID:

For the first question, look up the possible ways to construct scipy.sparse.csr_matrix objects; one of them will take (data, indices, indptr). Just pass a new array for data, and take the latter two from X.

For the second question, you can just do the elementwise operation in place on the data array, since they have the same shape in this case.

You can try playing around with these operations in a notebook and benchmarking them with %timeit/%memit, to see how to best organize them. I find such exercises very rewarding.

Cheers,
Vlad

On July 1, 2016 6:47:40 PM EDT, Basil Beirouti wrote:
>Oh yes that's exactly what I was looking for. So how do I initialize an array with the same sparsity pattern as X? And then how do I do an element wise divide of the numerator over the denominator, when both are sparse matrices? Like you said it should only do this operation on the non zero elements of the numerator.
>
>Sent from my iPhone
>
>> On Jul 1, 2016, at 5:36 PM, Vlad Niculae wrote:
>>
>> In the denominator you mean? It looks like you only need to add that to nonzero elements, since the others would all have a 0 in the numerator, right? So the final value would be zero there. Or am I missing something?
>>
>> You can initialize an array with the same sparsity pattern as X, but its data is k everywhere. Then use inplace_row_scale to multiply it by B, then add this to X to get the denominator.
>>
>>> On July 1, 2016 6:27:41 PM EDT, Basil Beirouti wrote:
>>> Hi Vlad,
>>>
>>> Thanks for the quick reply. Unfortunately there's still the question of adding a scalar to every element in sparse matrix, which is not allowed for sparse matrices, and which is not possible to avoid in the equation.
>>>
>>> Sincerely,
>>> Basil Beirouti

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From joel.nothman at gmail.com Sat Jul 2 06:06:21 2016
From: joel.nothman at gmail.com (Joel Nothman)
Date: Sat, 2 Jul 2016 20:06:21 +1000
Subject: [scikit-learn] Adding BM25 to sklearn.feature_extraction.text
In-Reply-To:
References:
Message-ID:

Indeed, the best way to do this with CSR will exploit CSR's internals so that you only need to deal with the matrix elements that are nonzero. Say you have the tf matrix in CSR (one row per document):

    doc_len = np.asarray(tf.sum(axis=1)).ravel()  # one length per document (row)
    doc_len_term = ...  # compute me (one value per document)
    bm25 = tf  # will operate in-place; tf.data must be a float array
    # np.diff(bm25.indptr) is the number of stored entries in each row
    bm25.data /= (bm25.data + np.repeat(doc_len_term, np.diff(bm25.indptr)))
    bm25.data *= (k1 + 1)

On 2 July 2016 at 03:01, Basil Beirouti wrote:

> Hi Joel,
>
> I'm not by my dev computer right now so I can't show you the code, but the
> problem is that the term frequency - f(q,D) in that wiki article - appears
> in both the numerator and the denominator.
Also, in the denominator, you > must add a scalar quantity to f(q,D), which is unsupported if f(q,D) is > coming from a sparse matrix. > > You can factor the equation in different ways but you can't get around > the main issue that the sparse matrix must appear in the numerator and > denominator. So instead of doing any actual matrix multiplication I just > loop on the non-zero elements in the sparse matrix (term-frequency matrix) > and fill in the new BM25matrix, element by element. > > Any suggestions that use actual sparse matrix operations would be > appreciated. I know I can also loop on the non-zero elements and construct > a sparse csr_matrix from that using the.indptr attribute etc. but I'm > hoping there's a way to use matrix operations. > > Sincerely, > Basil Beirouti > > On Fri, Jul 1, 2016 at 11:00 AM, wrote: > >> Send scikit-learn mailing list submissions to >> scikit-learn at python.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> https://mail.python.org/mailman/listinfo/scikit-learn >> or, via email, send a message with subject or body 'help' to >> scikit-learn-request at python.org >> >> You can reach the person managing the list at >> scikit-learn-owner at python.org >> >> When replying, please edit your Subject line so it is more specific >> than "Re: Contents of scikit-learn digest..." >> >> >> Today's Topics: >> >> 1. Adding BM25 to sklearn.feature_extraction.text (Update) >> (Basil Beirouti) >> 2. Re: Adding BM25 to sklearn.feature_extraction.text (Update) >> (Joel Nothman) >> 3. Re: Adding BM25 to sklearn.feature_extraction.text (Update) >> (Sebastian Raschka) >> 4. 
Re: partial_fit implementation for IsolationForest >> (donkey-hotei at cryptolab.net) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Thu, 30 Jun 2016 17:23:18 -0500 >> From: Basil Beirouti >> To: scikit-learn at python.org >> Subject: [scikit-learn] Adding BM25 to sklearn.feature_extraction.text >> (Update) >> Message-ID: >> < >> CAB4mTg8tMwoA0NwsfXmVtYWqS547F2NOmP5vj3LTCaNqXjqeWQ at mail.gmail.com> >> Content-Type: text/plain; charset="utf-8" >> >> Hello everyone, >> >> I have successfully created a few versions of the BM25Transformer. I >> looked >> at TFIDFTransformer for guidance and I noticed that it outputs a sparse >> matrix when given a sparse termcount matrix as an input. >> >> Unfortunately, the fastest implementation of BM25Transformer that I have >> been able to come up with does NOT output a sparse matrix, it will return >> a >> regular numpy matrix. >> >> Benchmarked against the entire 20newsgroups corpus, here is how they >> perform (assuming input is csr_matrix for all): >> >> 1.) finishes in 4 seconds, outputs a regular numpy matrix >> 2.) finishes in 30 seconds, outputs a dok_matrix >> 3.) finishes in 130 seconds, outputs a regular numpy matrix >> >> It's worth noting that using algorithm 1 and converting the output to a >> sparse matrix still takes less time than 3, and takes about as long as 2. >> >> So my question is, how important is it that my BM25Transformer outputs a >> sparse matrix? >> >> I'm going to try another implementation which looks directly at the data, >> indices, and indptr attributes of the inputted csr_matrix. I just wanted >> to >> check in and see what people thought. >> >> Sincerely, >> Basil Beirouti >> -------------- next part -------------- >> An HTML attachment was scrubbed... 
>> URL: < >> http://mail.python.org/pipermail/scikit-learn/attachments/20160630/80852326/attachment-0001.html >> > >> >> ------------------------------ >> >> Message: 2 >> Date: Fri, 1 Jul 2016 08:38:15 +1000 >> From: Joel Nothman >> To: Scikit-learn user and developer mailing list >> >> Subject: Re: [scikit-learn] Adding BM25 to >> sklearn.feature_extraction.text (Update) >> Message-ID: >> < >> CAAkaFLUB+4gu5cHuYYyc8pqBK4Ews4mkXyBKAvMCVENNPUv98Q at mail.gmail.com> >> Content-Type: text/plain; charset="utf-8" >> >> I don't see what about BM25, at least as presented at >> https://en.wikipedia.org/wiki/Okapi_BM25, should prevent using CSR >> operations efficiently. Show us your code. >> >> On 1 July 2016 at 08:23, Basil Beirouti wrote: >> >> > Hello everyone, >> > >> > I have successfully created a few versions of the BM25Transformer. I >> > looked at TFIDFTransformer for guidance and I noticed that it outputs a >> > sparse matrix when given a sparse termcount matrix as an input. >> > >> > Unfortunately, the fastest implementation of BM25Transformer that I have >> > been able to come up with does NOT output a sparse matrix, it will >> return a >> > regular numpy matrix. >> > >> > Benchmarked against the entire 20newsgroups corpus, here is how they >> > perform (assuming input is csr_matrix for all): >> > >> > 1.) finishes in 4 seconds, outputs a regular numpy matrix >> > 2.) finishes in 30 seconds, outputs a dok_matrix >> > 3.) finishes in 130 seconds, outputs a regular numpy matrix >> > >> > It's worth noting that using algorithm 1 and converting the output to a >> > sparse matrix still takes less time than 3, and takes about as long as >> 2. >> > >> > So my question is, how important is it that my BM25Transformer outputs a >> > sparse matrix? >> > >> > I'm going to try another implementation which looks directly at the >> data, >> > indices, and indptr attributes of the inputted csr_matrix. I just >> wanted to >> > check in and see what people thought. 
>> >
>> > Sincerely,
>> > Basil Beirouti
>> >
>> > _______________________________________________
>> > scikit-learn mailing list
>> > scikit-learn at python.org
>> > https://mail.python.org/mailman/listinfo/scikit-learn
>> >
>> >
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>> URL: <http://mail.python.org/pipermail/scikit-learn/attachments/20160701/de28d786/attachment-0001.html>
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Thu, 30 Jun 2016 18:33:49 -0400
>> From: Sebastian Raschka
>> To: Scikit-learn user and developer mailing list
>> Subject: Re: [scikit-learn] Adding BM25 to
>> sklearn.feature_extraction.text (Update)
>> Message-ID:
>> <6411ECB7-BD7C-4960-B847-B3D633DD848A at sebastianraschka.com>
>> Content-Type: text/plain; charset=utf-8
>>
>> Hi, Basil,
>>
>> I'd say runtime may not be the main concern regarding sparse vs. dense.
>> In my opinion, the main reason to use sparse arrays would be memory usage.
>> I.e., text data is typically rather large (esp. high-dimensional, sparse
>> feature vector). So one limitation with scikit-learn is typically memory
>> capacity, especially if you are using multiprocessing via the cv param.
>>
>> PS:
>>
>> > regular numpy matrix
>>
>> I think you mean "numpy array" (since there's a numpy matrix datastruct
>> in numpy as well; however, almost no one uses it)
>>
>> Best,
>> Sebastian
>>
>> > On Jun 30, 2016, at 6:23 PM, Basil Beirouti wrote:
>> >
>> > Hello everyone,
>> >
>> > I have successfully created a few versions of the BM25Transformer. I looked at TFIDFTransformer for guidance and I noticed that it outputs a sparse matrix when given a sparse termcount matrix as an input.
>> >
>> > Unfortunately, the fastest implementation of BM25Transformer that I have been able to come up with does NOT output a sparse matrix, it will return a regular numpy matrix.
>> > >> > Benchmarked against the entire 20newsgroups corpus, here is how they >> perform (assuming input is csr_matrix for all): >> > >> > 1.) finishes in 4 seconds, outputs a regular numpy matrix >> > 2.) finishes in 30 seconds, outputs a dok_matrix >> > 3.) finishes in 130 seconds, outputs a regular numpy matrix >> > >> > It's worth noting that using algorithm 1 and converting the output to a >> sparse matrix still takes less time than 3, and takes about as long as 2. >> > >> > So my question is, how important is it that my BM25Transformer outputs >> a sparse matrix? >> > >> > I'm going to try another implementation which looks directly at the >> data, indices, and indptr attributes of the inputted csr_matrix. I just >> wanted to check in and see what people thought. >> > >> > Sincerely, >> > Basil Beirouti >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> >> ------------------------------ >> >> Message: 4 >> Date: Fri, 01 Jul 2016 09:48:55 +0200 >> From: donkey-hotei at cryptolab.net >> To: Scikit-learn user and developer mailing list >> >> Subject: Re: [scikit-learn] partial_fit implementation for >> IsolationForest >> Message-ID: <89984291c81434d8be46cdc4c4527b44 at cryptolab.net> >> Content-Type: text/plain; charset=US-ASCII; format=flowed >> >> hi Olivier, >> >> thanks for your response. >> >> > What you describe is quite different from what sklearn models >> > typically do with partial_fit. partial_fit is more about out-of-core / >> > streaming fitting rather than true online learning with explicit >> > forgetting. >> > >> > In particular what you suggest would not accept calling partial_fit >> > with very small chunks (e.g. from tens to a hundred samples at a time) >> > because that would not be enough to develop deep isolation trees and >> > would harm the performance of the resulting isolation forest. 
>> >> I see, suppose I should check to see how the depth of these trees >> changes when fitting on small chunks as opposed to large chunks -. >> either way, refreshing on at least 1000 samples has proven to work O.K >> here in the face of concept drift >> >> > If the problem is true online learning (tracking a stream of training >> > data with expected shifts in its distribution) I think it's better to >> > devise a dedicated API that does not try to mimic the scikit-learn API >> > (for this specific part). There will typically have to be an >> > additional hyperparameter to control how much the model should >> > remember about old samples. >> >> ok, i've been using a parameter called 'n_more_estimators' that decides >> how many trees are dropped/added. maybe it is not the best way >> >> > If the problem is more about out-of-core, then partial_fit is suitable >> > but the trees should grow and get reorganized progressively (as >> > pointed by others in previous comments). >> >> maybe a name like "online_fit" would be more appropriate? it would be >> nice to know what exactly is meant by "reorganized" , so far ive been >> merely dropping the oldest trees >> >> > BTW, I would be curious to know more about the kind of anomaly >> > detection problem where you found IsolationForests to work well. 
>> >> The problem is intrusion detection at the application layer, features >> are parsed from http audit logs >> >> ty >> >> >> ------------------------------ >> >> Subject: Digest Footer >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> ------------------------------ >> >> End of scikit-learn Digest, Vol 4, Issue 1 >> ****************************************** >> > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From deshpande.jaidev at gmail.com Sat Jul 2 09:40:47 2016 From: deshpande.jaidev at gmail.com (Jaidev Deshpande) Date: Sat, 02 Jul 2016 13:40:47 +0000 Subject: [scikit-learn] Using fit_intercept with sparse matrices Message-ID: Hi, I usually encounter many cases when I've forgotten that my input to the `AnyEstimator.fit` method is a sparse matrix, and I've set `fit_intercept=False`. To avoid this, I could of course make a habit of not tampering with the default `fit_intercept=True`, but I think it would be better and more idiot-proof if the method raises a warning saying something like, "fit_intercept=False works best when the data is centered, and sparse matrices can't be centered. Please set fit_intercept=True for better results." What do you think? If doing this is worthwhile, I'll send a PR. Thanks! -------------- next part -------------- An HTML attachment was scrubbed... 
From alexandre.gramfort at telecom-paristech.fr Sat Jul 2 09:46:50 2016
From: alexandre.gramfort at telecom-paristech.fr (Alexandre Gramfort)
Date: Sat, 2 Jul 2016 15:46:50 +0200
Subject: [scikit-learn] Using fit_intercept with sparse matrices
In-Reply-To:
References:
Message-ID:

note:

the Lasso and ElasticNet code do fit the intercept without breaking
sparsity.

Alex

From tom.duprelatour at orange.fr Mon Jul 4 06:00:51 2016
From: tom.duprelatour at orange.fr (Tom DLT)
Date: Mon, 4 Jul 2016 12:00:51 +0200
Subject: [scikit-learn] Using fit_intercept with sparse matrices
In-Reply-To:
References:
Message-ID:

note2:

The LogisticRegression and Ridge(solver='sag') code do fit the intercept
without breaking sparsity.

For other solvers in Ridge, in the case of a sparse X input, the solver
will automatically be changed to 'sag' and raise a warning.

Tom
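The two notes above say these estimators fit the intercept without breaking sparsity. One general way this can work is implicit centering: since (X - 1 mu^T) w = X w - (mu^T w) 1, the centered product needs only the sparse product and one scalar correction. The following is a minimal plain-Python sketch of that identity, not scikit-learn's actual implementation; the hand-rolled CSR arrays and the helper names (`csr_matvec`, `column_means`, `centered_matvec`) are assumptions made for illustration.

```python
# Hypothetical illustration: multiply a column-centered sparse matrix by a
# vector without ever storing the (dense) centered matrix.

def csr_matvec(data, indices, indptr, w):
    """Return X @ w for X stored in CSR form (data, indices, indptr)."""
    out = []
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        out.append(sum(data[k] * w[indices[k]] for k in range(start, end)))
    return out


def column_means(data, indices, indptr, n_cols):
    """Column means computed from the stored non-zeros only."""
    n_rows = len(indptr) - 1
    sums = [0.0] * n_cols
    for value, col in zip(data, indices):
        sums[col] += value
    return [s / n_rows for s in sums]


def centered_matvec(data, indices, indptr, n_cols, w):
    """Return (X - 1 mu^T) @ w using only sparse operations."""
    mu = column_means(data, indices, indptr, n_cols)
    shift = sum(m * wj for m, wj in zip(mu, w))  # the scalar mu^T w
    return [v - shift for v in csr_matvec(data, indices, indptr, w)]


# The 3 x 3 matrix [[1, 0, 2], [0, 0, 3], [4, 0, 0]] in CSR form.
data = [1.0, 2.0, 3.0, 4.0]
indices = [0, 2, 2, 0]
indptr = [0, 2, 3, 4]
w = [1.0, 1.0, 1.0]

result = centered_matvec(data, indices, indptr, 3, w)
```

Only the four stored non-zeros are touched; the zeros that centering would turn into non-zeros are accounted for by the single `shift` term.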
From deshpande.jaidev at gmail.com Mon Jul 4 06:13:29 2016
From: deshpande.jaidev at gmail.com (Jaidev Deshpande)
Date: Mon, 04 Jul 2016 10:13:29 +0000
Subject: [scikit-learn] Using fit_intercept with sparse matrices
In-Reply-To:
References:
Message-ID:

On Mon, 4 Jul 2016 at 15:33 Tom DLT wrote:

> note2:
>
> The LogisticRegression and Ridge(solver='sag') code do fit the intercept
> without breaking sparsity.
>
> For other solvers in Ridge, in the case of a sparse X input, the solver
> will automatically be changed to 'sag' and raise a warning.
>
> Tom

Thanks. I understand that these estimators can fit the intercept without
breaking the sparsity.

My point was, would it not be useful to raise a warning when the input is
sparse and the user does _not_ want to fit the intercept?
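To make the concern behind the proposed warning concrete, here is a small plain-Python sketch. It assumes a single mostly-zero feature and a target with a non-zero offset (y = 2x + 3, values invented for illustration); the helper `fit_line` is hypothetical and is not part of scikit-learn. Whether a warning is the right response is a separate question; the sketch only shows what happens when an offset exists and no intercept is fitted.

```python
# Hypothetical 1-D example: least squares with and without an intercept
# on data whose true relationship is y = 2 * x + 3.

def fit_line(xs, ys, fit_intercept=True):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    if fit_intercept:
        mx, my = sum(xs) / n, sum(ys) / n
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        slope = sxy / sxx
        return slope, my - slope * mx
    # Without an intercept the line is forced through the origin.
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return slope, 0.0


xs = [0.0, 0.0, 1.0, 0.0, 2.0]  # mostly zeros, like a sparse column
ys = [2 * x + 3 for x in xs]

with_b = fit_line(xs, ys, fit_intercept=True)      # recovers slope 2, offset 3
without_b = fit_line(xs, ys, fit_intercept=False)  # slope absorbs the offset
```

If the target had no offset (for example `ys = [2 * x for x in xs]`), both calls would return the same slope, which is why the outcome depends on the data rather than on sparsity alone.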
From alexandre.gramfort at telecom-paristech.fr Mon Jul 4 08:11:39 2016
From: alexandre.gramfort at telecom-paristech.fr (Alexandre Gramfort)
Date: Mon, 4 Jul 2016 14:11:39 +0200
Subject: [scikit-learn] Using fit_intercept with sparse matrices
In-Reply-To:
References:
Message-ID:

On Mon, Jul 4, 2016 at 12:13 PM, Jaidev Deshpande wrote:
> My point was, would it not be useful to raise a warning when the input is
> sparse and the user does _not_ want to fit the intercept?

I don't get it. Just fit_intercept=False should do it. Why a warning?

A

From luizfgoncalves at dcc.ufmg.br Mon Jul 4 16:09:20 2016
From: luizfgoncalves at dcc.ufmg.br (luizfgoncalves at dcc.ufmg.br)
Date: Mon, 4 Jul 2016 17:09:20 -0300
Subject: [scikit-learn] Create a "Feature_Weight" Parameter at RandomForestRegressor
Message-ID: <0538c1d0a140e3c1bb7b83573baf5ea8.squirrel@webmail.dcc.ufmg.br>

I would like to give different weights to the features in the feature set
for the split task of Random Forest. Right now, only the MSE metric is
used to select the best split, and I want to do something like
feature[i] = MSE[i] * feature_weight[i]. This way, I'll be able to give
more importance to the features I already know are better.

In my mind, this change would be exposed through the fit function,
something like this:

    def fit(self, X, y, sample_weight, feature_weight):

Here feature_weight would be a vector with customized weights for all
features present in the dataset.

What is the best way to do that? I'm having a really hard time figuring
out how to make these changes in the code.
Thanks a lot for your attention.

Luiz Felipe

From joel.nothman at gmail.com Mon Jul 4 18:49:36 2016
From: joel.nothman at gmail.com (Joel Nothman)
Date: Tue, 5 Jul 2016 08:49:36 +1000
Subject: [scikit-learn] 0.18?
Message-ID:

Has there been talk about a release?

We've long-since merged the big changes to CV.
Among the things seeming unfinished there is that where `cv` is available,
`fit` should also support a `labels` parameter. That's not available in
RFECV etc.

There are some other nice features in the next release too, and I think we
should move towards putting them out there. Should we start tagging issues
in and out of the 0.18 milestone?

From joel.nothman at gmail.com Tue Jul 5 02:30:07 2016
From: joel.nothman at gmail.com (Joel Nothman)
Date: Tue, 5 Jul 2016 16:30:07 +1000
Subject: [scikit-learn] Using fit_intercept with sparse matrices
In-Reply-To:
References:
Message-ID:

Jaidev is suggesting that fit_intercept=False makes no sense if the data is
sparse. But I think that depends on your target variable.

From michael.eickenberg at gmail.com Tue Jul 5 02:39:16 2016
From: michael.eickenberg at gmail.com (Michael Eickenberg)
Date: Tue, 5 Jul 2016 08:39:16 +0200
Subject: [scikit-learn] Using fit_intercept with sparse matrices
In-Reply-To:
References:
Message-ID:

On Tuesday, July 5, 2016, Joel Nothman wrote:

> Jaidev is suggesting that fit_intercept=False makes no sense if the data
> is sparse.

+1

> But I think that depends on your target variable.

+1

From alexandre.gramfort at telecom-paristech.fr Tue Jul 5 03:08:53 2016
From: alexandre.gramfort at telecom-paristech.fr (Alexandre Gramfort)
Date: Tue, 5 Jul 2016 09:08:53 +0200
Subject: [scikit-learn] Using fit_intercept with sparse matrices
In-Reply-To:
References:
Message-ID:

> Jaidev is suggesting that fit_intercept=False makes no sense if the data is
> sparse. But I think that depends on your target variable.

It can make sense **not** to fit the intercept: e.g., if it has no impact
on performance, it is faster to optimize without one.

From gael.varoquaux at normalesup.org Tue Jul 5 03:46:48 2016
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Tue, 5 Jul 2016 09:46:48 +0200
Subject: [scikit-learn] Using fit_intercept with sparse matrices
In-Reply-To:
References:
Message-ID: <20160705074648.GD3180701@phare.normalesup.org>

> > Jaidev is suggesting that fit_intercept=False makes no sense if the
> > data is sparse. But I think that depends on your target variable.

> It can make sense **not** to fit intercept e.g. if it has no impact on
> perf it is faster to optimize without one

+1

From zephyr14 at gmail.com Tue Jul 5 09:52:09 2016
From: zephyr14 at gmail.com (Vlad Niculae)
Date: Tue, 5 Jul 2016 09:52:09 -0400
Subject: [scikit-learn] Using fit_intercept with sparse matrices
In-Reply-To: <20160705074648.GD3180701@phare.normalesup.org>
References: <20160705074648.GD3180701@phare.normalesup.org>
Message-ID:

For example, I use fit_intercept=False when training SVMRank-style models
where inputs are pairwise differences (x_i - x_j), I[y_i > y_j]. In
this setting it's actually incorrect to learn an intercept.

From t3kcit at gmail.com Wed Jul 6 14:26:09 2016
From: t3kcit at gmail.com (Andreas Mueller)
Date: Wed, 6 Jul 2016 14:26:09 -0400
Subject: [scikit-learn] 0.18?
In-Reply-To:
References:
Message-ID: <577D4D41.60102@gmail.com>

Hi Joel.
I totally agree.
I've still been busy with the book and haven't caught up with all the
developments yet.
There is probably a whole bunch of bug fixes that we need to do, though,
and maybe some other API changes / deprecations.

I'll be more helpful in two weeks, when my book is submitted and scipy is
over.

Best,
Andy

From joel.nothman at gmail.com Wed Jul 6 18:05:26 2016
From: joel.nothman at gmail.com (Joel Nothman)
Date: Thu, 7 Jul 2016 08:05:26 +1000
Subject: [scikit-learn] 0.18?
In-Reply-To: <577D4D41.60102@gmail.com>
References: <577D4D41.60102@gmail.com>
Message-ID:

I knew the book was holding things up, but I thought I'd check in.
From o.lyashevskaya at gmail.com Thu Jul 7 04:56:04 2016
From: o.lyashevskaya at gmail.com (Olga Lyashevska)
Date: Thu, 7 Jul 2016 09:56:04 +0100
Subject: [scikit-learn] GradientBoostingRegressor with training, validation, and test set.
Message-ID: <577E1924.70506@gmail.com>

Hi,

I am working with the GradientBoostingRegressor algorithm. I randomly
divide the dataset into three parts: a training set (50%), a validation
set (25%), and a test set (25%).

I understand that the training set is used for model fitting (1); the
validation set is used for estimation of prediction error for model
selection (2); and, finally, the test set is used for assessment of the
final chosen model (3).

However, I am not sure how to implement this. Can anyone give any examples?

Many thanks,
Olga

    X_train, X_test, y_train, y_test = cv.train_test_split(X, y, test_size=0.5)
    X_test, X_val, y_test, y_val = cv.train_test_split(X_test, y_test,
                                                       test_size=0.5)

    params = {'n_estimators': 2000, 'max_depth': 4, 'min_samples_leaf': 4,
              'learning_rate': 0.01, 'min_samples_split': 1,
              'subsample': 0.75, 'random_state': 42, 'loss': 'ls'}
    est = ensemble.GradientBoostingRegressor(**params)
    est.fit(X_train, y_train)  # 1
    mean_squared_error(y_test, est.predict(X_test))  # 3

From olivier.grisel at ensta.org Thu Jul 7 08:35:30 2016
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Thu, 7 Jul 2016 14:35:30 +0200
Subject: [scikit-learn] GradientBoostingRegressor with training, validation, and test set.
In-Reply-To: <577E1924.70506@gmail.com>
References: <577E1924.70506@gmail.com>
Message-ID:

It means that in your script you should print the score on the
validation set instead of the test set. Then you are allowed to tweak
the values in your params dict to see if you can find values that
improve that score.

Once you are confident that you can no longer improve the validation
score via parameter tweaking (or feature engineering) you can evaluate
your best model on the final test set (only once). It can be the case
that the final test score is a bit worse than the validation score. If
that's the case you should trust the test score as the most realistic
evaluation of the true generalization performance of your final model.

You might also be interested in implementing early stopping with warm
started models to adjust the value of n_estimators. For instance see
(towards the last third of the notebook):

https://github.com/ogrisel/notebooks/blob/master/sklearn_demos/Gradient%20Boosting.ipynb

--
Olivier

From o.lyashevskaya at gmail.com Fri Jul 8 04:44:08 2016
From: o.lyashevskaya at gmail.com (Olga Lyashevska)
Date: Fri, 8 Jul 2016 09:44:08 +0100
Subject: [scikit-learn] GradientBoostingRegressor with training, validation, and test set.
In-Reply-To:
References: <577E1924.70506@gmail.com>
Message-ID: <577F67D8.4020605@gmail.com>

Thanks for your help, Olivier.

I am doing parameter tweaking manually, right? Or can I implement
GridSearchCV? Then I wonder how I could specify that 'scoring' should be
done for validation set and not training.

Many thanks,
Olga

From olivier.grisel at ensta.org Fri Jul 8 07:49:49 2016
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Fri, 8 Jul 2016 13:49:49 +0200
Subject: [scikit-learn] GradientBoostingRegressor with training, validation, and test set.
In-Reply-To: <577F67D8.4020605@gmail.com>
References: <577E1924.70506@gmail.com> <577F67D8.4020605@gmail.com>
Message-ID:

GridSearchCV will automatically generate the validation sets
internally (this is where the "CV" comes from). So you don't have to
generate a validation set if you decide to use GridSearchCV to select
the best model.

More details here:

http://scikit-learn.org/stable/model_selection.html

--
Olivier

From o.lyashevskaya at gmail.com Fri Jul 8 10:02:54 2016
From: o.lyashevskaya at gmail.com (Olga Lyashevska)
Date: Fri, 8 Jul 2016 15:02:54 +0100
Subject: [scikit-learn] GradientBoostingRegressor with training, validation, and test set.
In-Reply-To:
References: <577E1924.70506@gmail.com> <577F67D8.4020605@gmail.com>
Message-ID: <577FB28E.5030706@gmail.com>

Thanks for this clarification.

Cheers,
Olga

From mmmnow at gmail.com Fri Jul 8 11:22:05 2016
From: mmmnow at gmail.com (Michał Nowotka)
Date: Fri, 8 Jul 2016 16:22:05 +0100
Subject: [scikit-learn] Scikit learn GridSearchCV fit method ValueError Found array with 0 sample
Message-ID:

Hi,

Sorry for cross posting
(http://stackoverflow.com/questions/38263933/scikit-learn-gridsearchcv-fit-method-valueerror-found-array-with-0-sample)
but I don't know where it is better to get help with my problem.
I'm working on a VM with Jupyter notebook server installed.
From time to time I add new notebooks and reevaluate old ones to see
if they still work.

This notebook stopped working due to some changes in the scikit-learn API
and some parameters becoming obsolete:

https://github.com/chembl/mychembl/blob/master/ipython_notebooks/10_myChEMBL_machine_learning.ipynb

I've created a corrected version of the notebook here:

https://gist.github.com/anonymous/676c55cc501ffa48fecfcc1e1252d433

But I'm stuck in cell 36 on this code:

    from sklearn.cross_validation import KFold
    from sklearn.grid_search import GridSearchCV

    X_traina, X_testa, y_traina, y_testa = cross_validation.train_test_split(
        x, y, test_size=0.95, random_state=23)

    params = {'min_samples_split': [8], 'max_depth': [20],
              'min_samples_leaf': [1], 'n_estimators': [200]}
    cv = KFold(n=len(X_traina), n_folds=10, shuffle=True)
    cv_stratified = StratifiedKFold(y_traina, n_folds=5)
    gs = GridSearchCV(custom_forest, params, cv=cv_stratified,
                      verbose=1, refit=True)
    gs.fit(X_traina, y_traina)

This gives me:

    ValueError: Found array with 0 sample(s) (shape=(0, 491)) while a
    minimum of 1 is required.

Now I don't understand this because when I print shapes of the samples:

    print (X_traina.shape, X_testa.shape, y_traina.shape, y_testa.shape)

I'm getting:

    ((78, 491), (1489, 491), (78,), (1489,))

Interestingly, if I change the test_size parameter to 0.88 (like in
the example corrected notebook) it works and this is the highest value
where it works. For this value, the shapes are:

    ((188, 491), (1379, 491), (188,), (1379,))

So the question is - what should I change in my code to make it work
for test_size set to 0.95 as well?

Kind regards,

Michal Nowotka

From t3kcit at gmail.com Fri Jul 8 17:00:42 2016
From: t3kcit at gmail.com (Andreas Mueller)
Date: Fri, 8 Jul 2016 17:00:42 -0400
Subject: [scikit-learn] Create a "Feature_Weight" Parameter at RandomForestRegressor
In-Reply-To: <0538c1d0a140e3c1bb7b83573baf5ea8.squirrel@webmail.dcc.ufmg.br>
References: <0538c1d0a140e3c1bb7b83573baf5ea8.squirrel@webmail.dcc.ufmg.br>
Message-ID: <5780147A.5040901@gmail.com>

You would need to implement a custom splitter, I think.

From maciek at wojcikowski.pl Fri Jul 8 17:42:06 2016
From: maciek at wojcikowski.pl (Maciek Wójcikowski)
Date: Fri, 8 Jul 2016 23:42:06 +0200
Subject: [scikit-learn] Scikit learn GridSearchCV fit method ValueError Found array with 0 sample
In-Reply-To:
References:
Message-ID:

Hi Michał,

What are the class counts in that set? Maybe there is a problem with
generating stratified subsamples (e.g. some classes get below 1 sample)?
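Maciek's question about class counts can be checked directly. The following is a minimal plain-Python sketch, assuming that stratification treats each distinct target value as its own class and needs at least `n_folds` samples per class. The helper names and the sample values are hypothetical and are not part of scikit-learn.

```python
from collections import Counter


def smallest_class_count(y):
    """Size of the rarest 'class' when each distinct value in y is a label."""
    return min(Counter(y).values())


def can_stratify(y, n_folds):
    """True if every distinct label has at least n_folds samples."""
    return smallest_class_count(y) >= n_folds


# Hypothetical continuous regression targets: every value is unique.
y_regression = [5.43, 8.74, 8.10, 6.55, 7.66, 6.52, 8.60, 7.10, 6.40, 8.05]
# Hypothetical class labels with enough samples per class.
y_classes = ["active"] * 6 + ["inactive"] * 6

regression_ok = can_stratify(y_regression, n_folds=5)
classes_ok = can_stratify(y_classes, n_folds=5)
```

When the check fails for a regression target, a non-stratified splitter such as KFold avoids the per-class requirement altogether.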

----
Best regards,
Maciek Wójcikowski
maciek at wojcikowski.pl

From basilbeirouti at gmail.com Sun Jul 10 17:44:30 2016
From: basilbeirouti at gmail.com (Basil Beirouti)
Date: Sun, 10 Jul 2016 16:44:30 -0500
Subject: [scikit-learn] Added BM25Transformer and BM25Vectorizer to sklearn.feature_extraction.text
Message-ID:

Hi all,

I have submitted a pull request to the main branch. I added BM25Transformer
and BM25Vectorizer, which are very similar to TfidfTransformer and
TfidfVectorizer, except they implement the BM25 algorithm instead.

Would really appreciate feedback on the quality of my work and how I can
improve.

Sincerely,
Basil Beirouti

From mmmnow at gmail.com Mon Jul 11 07:16:05 2016
From: mmmnow at gmail.com (Michał Nowotka)
Date: Mon, 11 Jul 2016 12:16:05 +0100
Subject: [scikit-learn] Scikit learn GridSearchCV fit method ValueError Found array with 0 sample
In-Reply-To:
References:
Message-ID:

Hi Maciek,

Thanks for the suggestion. I think the problem is indeed related to
StratifiedKFold, because if I use KFold instead the code works fine.
However, if I print the StratifiedKFold object it looks fine to me:

    sklearn.cross_validation.StratifiedKFold(labels=[
      5.43  8.74  8.1   6.55  7.66  6.52  8.6   7.1   6.4   8.05  7.89  6.68
      8.06  6.17  5.5   7.96  5.78  6.    7.74  5.83  6.51  6.31  6.68  9.22
      6.07  7.06  7.12  8.64  5.72  6.4   7.64  5.74  7.41  6.49  6.81  7.1
      7.66  6.68  7.05  6.28  5.49  6.35  6.9   6.2   7.51  5.65  9.3   5.84
      6.92  5.75  6.92  8.8   7.04  5.81  5.73  5.31  7.13  7.66  6.98  5.93
      8.24  6.96  8.22  7.27  7.34  5.91  5.57  6.5   7.28  6.74  4.92  6.88
      5.8   9.15  6.63  6.37  8.66  6.4 ],
      n_folds=5, shuffle=False, random_state=None)

On Fri, Jul 8, 2016 at 10:42 PM, Maciek Wójcikowski wrote:
> Hi Michał,
>
> What are the class counts in that set?
Maybe there is a problem with
> generating stratified subsamples (eg some classes get below 1 sample)?
>
> ----
> Pozdrawiam, | Best regards,
> Maciek Wójcikowski
> maciek at wojcikowski.pl

From maciek at wojcikowski.pl Mon Jul 11 07:33:28 2016
From: maciek at wojcikowski.pl (Maciek Wójcikowski)
Date: Mon, 11 Jul 2016 13:33:28 +0200
Subject: [scikit-learn] Scikit learn GridSearchCV fit method ValueError Found array with 0 sample
In-Reply-To: References: Message-ID:

Shouldn't you pass labels (binary) instead of continuous data? If you wish to stick to logK's and keep the distribution unchanged then you'd better reduce the number of classes (eg round the values to nearest integer?).

It might be the case that the counts per class are floored and you get 0 for some cases.

----
Pozdrawiam, | Best regards,
Maciek Wójcikowski
maciek at wojcikowski.pl

2016-07-11 13:16 GMT+02:00 Michał Nowotka :
> Hi Maciek,
>
> Thanks for suggestion, I think the problem indeed is related to the
> StratifiedKFold because if I use KFold instead the code works fine.
> However, if I print StratifiedKFold object it looks fine to me:
>
> sklearn.cross_validation.StratifiedKFold(labels=[ 5.43 8.74 8.1
> 6.55 7.66 6.52 8.6 7.1 6.4 8.05 7.89 6.68
> 8.06 6.17 5.5 7.96 5.78 6. 7.74 5.83 6.51 6.31 6.68 9.22
> 6.07 7.06 7.12 8.64 5.72 6.4 7.64 5.74 7.41 6.49 6.81 7.1
> 7.66 6.68 7.05 6.28 5.49 6.35 6.9 6.2 7.51 5.65 9.3 5.84
> 6.92 5.75 6.92 8.8 7.04 5.81 5.73 5.31 7.13 7.66 6.98 5.93
> 8.24 6.96 8.22 7.27 7.34 5.91 5.57 6.5 7.28 6.74 4.92 6.88
> 5.8 9.15 6.63 6.37 8.66 6.4 ], n_folds=5, shuffle=False,
> random_state=None)

From basilbeirouti at gmail.com Mon Jul 11 18:11:18 2016
From: basilbeirouti at gmail.com (Basil Beirouti)
Date: Mon, 11 Jul 2016 17:11:18 -0500
Subject: [scikit-learn] Bm25 pull request
Message-ID:

Hi,

Joel thanks for pointing out the indentation issue. I have fixed it.

Can someone explain what the 3 tests that were automatically run on my code are? And why did the Appveyor and Travis ones fail?

Sincerely,
Basil Beirouti
Sent from my iPhone

> On Jul 11, 2016, at 11:00 AM, scikit-learn-request at python.org wrote:
>
> Send scikit-learn mailing list submissions to
> scikit-learn at python.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://mail.python.org/mailman/listinfo/scikit-learn
> or, via email, send a message with subject or body 'help' to
> scikit-learn-request at python.org
>
> You can reach the person managing the list at
> scikit-learn-owner at python.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of scikit-learn digest..."
>
>
> Today's Topics:
>
> 1.
Re: Scikit learn GridSearchCV fit method ValueError Found
> array with 0 sample (Maciek Wójcikowski)
>
> ------------------------------
>
> End of scikit-learn Digest, Vol 4, Issue 15
> *******************************************

From joel.nothman at gmail.com Mon Jul 11 20:26:54 2016
From: joel.nothman at gmail.com (Joel Nothman)
Date: Tue, 12 Jul 2016 10:26:54 +1000
Subject: [scikit-learn] Bm25 pull request
In-Reply-To: References: Message-ID:

CircleCI checks the documentation build (although apparently it ignores changes only to docstrings). Travis runs all tests on a linux system.
AppVeyor tests on Windows.

On 12 July 2016 at 08:11, Basil Beirouti wrote:
>
> Hi,
>
> Joel thanks for pointing out the indentation issue. I have fixed it.
>
> Can someone explain what the 3 tests that were automatically run on my
> code are? And why did the Appveyor and Travis ones fail?
>
> Sincerely,
> Basil Beirouti
> Sent from my iPhone

From pgollakota at gmail.com Tue Jul 12 01:31:52 2016
From: pgollakota at gmail.com (Praveen Gollakota)
Date: Mon, 11 Jul 2016 22:31:52 -0700
Subject: [scikit-learn] Original source for DecisionTreeClassifier Implementation
Message-ID:

Hello,

I was curious if anyone has an original source or paper from which the decision trees were implemented in scikit learn. I see general references for Elements of Statistical Learning and other references but no specific mention of which version of algorithm is actually implemented.
I couldn't find any references in https://github.com/scikit-learn/scikit-learn/tree/master/sklearn/tree.

Thanks,
Praveen.

From maniteja.modesty067 at gmail.com Tue Jul 12 02:17:00 2016
From: maniteja.modesty067 at gmail.com (Maniteja Nandana)
Date: Tue, 12 Jul 2016 11:47:00 +0530
Subject: [scikit-learn] Original source for DecisionTreeClassifier Implementation
In-Reply-To: References: Message-ID:

Hi,

I am a novice here and am not aware of the exact source for implementation. Probably one of the core devs can answer it. But to my knowledge, it implements an optimised version of CART. The information regarding the algorithms and complexity can be found in http://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart . Hope it helps to some extent.

Cheers,
Maniteja.

From joel.nothman at gmail.com Thu Jul 14 04:17:52 2016
From: joel.nothman at gmail.com (Joel Nothman)
Date: Thu, 14 Jul 2016 18:17:52 +1000
Subject: [scikit-learn] [Scikit-learn-general] Estimator serialisability
In-Reply-To: References: Message-ID:

This has been discussed numerous times. I suppose no one thinks supporting pickle only is great, but a custom dict is unmaintainable. The best we've got AFAIK (and it looks like it's getting better all the time) is a tool to convert one-way to PMML, which is portable to production environments. See https://github.com/jpmml/sklearn2pmml (python interface) and https://github.com/jpmml/jpmml-sklearn (command-line interface and guts of the thing). I hope that helps; and thanks to Villu Ruusmann: that list of supported estimators is awesome!
PS: please write to the new list at scikit-learn at python.org On 14 July 2016 at 17:24, Miroslav Zori??k wrote: > Hi everybody, > > I have been using scikit-learn for a while, but I have run into a problem > that does not seem to have any good solutions. > > Basically I would like to: > - build my pipeline in a Jupyter Notebook > - persist it (to json or hdf5) > - load it in production and execute the prediction there > > The problem is that for persisting estimators such as the RobustScaler for > example, the recommended way is to pickle them. Now I don't want to do > this, for three reasons: > > - Security, pickle is potentially dangerous > - Portability, I can't unpickle it in scala for example > - Pickle stores a lot of details and information which is not strictly > necessary to reconstruct the RobustScaler and therefore might prevent it > from being reconstructed correctly if a different version is used. > > Another option I would seem to have is to access the private members of > each serialiser that I want to use and store them on my own, but this is > inconvenient, because: > > - It forces me as a user to understand how the robust scaler works and how > it stores its internal state, which is generally bad for usability > - The internal implementation could change, leaving me to fix my > serialisers (see #1) > - I would need to do this for each new Estimator I decide to use > > Now, to me it seems the solution is quite obvious: > Write a Mixin or update the BaseEstimator class to include two additional > methods: > > to_dict() - will return a dictionary such, that when passed to > from_dict(dictionary) - it will reconstruct the original object > > these dictionaries could be passed to the JSON module or the YAML module > or stored elsewhere. We could provide more convenience methods to do this > for the user. 
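A minimal sketch of the to_dict()/from_dict() idea described in the quoted proposal above, in plain Python. `DictSerializableMixin`, `_serial_fields` and `ToyScaler` are hypothetical names used only for illustration; none of this is scikit-learn API, and the reply in this thread points out that keeping such a format maintainable across all estimators is the hard part.

```python
import json


class DictSerializableMixin:
    """Hypothetical mixin for illustration; not part of scikit-learn."""

    # Subclasses list the attribute names that make up their fitted state.
    _serial_fields = ()

    def to_dict(self):
        # Collect only the declared fields, looked up by name.
        return {name: getattr(self, name) for name in self._serial_fields}

    @classmethod
    def from_dict(cls, state):
        # Rebuild without calling __init__, then restore the declared fields.
        obj = cls.__new__(cls)
        for name in cls._serial_fields:
            setattr(obj, name, state[name])
        return obj


class ToyScaler(DictSerializableMixin):
    """Stand-in for an estimator such as RobustScaler (illustration only)."""

    _serial_fields = ("center_", "scale_")

    def fit(self, values):
        ordered = sorted(values)
        n = len(ordered)
        mid = n // 2
        # Median as the centre.
        self.center_ = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
        # Full range as a crude scale; fall back to 1.0 for constant input.
        self.scale_ = (ordered[-1] - ordered[0]) or 1.0
        return self

    def transform(self, values):
        return [(v - self.center_) / self.scale_ for v in values]


scaler = ToyScaler().fit([1.0, 2.0, 3.0, 10.0])
payload = json.dumps(scaler.to_dict())               # plain JSON text, no pickle
restored = ToyScaler.from_dict(json.loads(payload))  # same fitted state
```

The design choice here is that each class names its own state fields, so the generic code never needs to know how a particular estimator works. The cost is that every field name becomes part of a storage format that has to stay stable between versions.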
> > In case of the RobustScaler the dict would look something like: > { "center": "0,0", "scale": "1.0"} > > Now the bulk of the work is writing these serialisers and deserialisers > for all of the estimators, but that can be simplified by adding a method > that could do that automatically via reflection and the estimator would > only need to specify which fields to serialise. > > I am happy to start working on this and create a pull request on Github, > but before I do that I wanted to get some initial thoughts and reactions > from the community, so please let me know what you think. > > Best Regards, > Miroslav Zoricak > -- > Best Regards, > Miroslav Zoricak > > > ------------------------------------------------------------------------------ > What NetFlow Analyzer can do for you? Monitors network bandwidth and > traffic > patterns at an interface-level. Reveals which users, apps, and protocols > are > consuming the most bandwidth. Provides multi-vendor support for NetFlow, > J-Flow, sFlow and other flows. Make informed decisions using capacity > planning > reports.http://sdm.link/zohodev2dev > _______________________________________________ > Scikit-learn-general mailing list > Scikit-learn-general at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Dale.T.Smith at macys.com Thu Jul 14 08:35:27 2016 From: Dale.T.Smith at macys.com (Dale T Smith) Date: Thu, 14 Jul 2016 12:35:27 +0000 Subject: [scikit-learn] [Scikit-learn-general] Estimator serialisability In-Reply-To: References: Message-ID: Hello, I investigated this subject last year, and have tried to keep up, so I can perhaps offer some alternatives. ? The only packages I know that read PMML in Python are proprietary. There are several alternatives for writing to PMML, as you can easily find. 
I also found https://code.google.com/archive/p/augustus/ and https://github.com/ctrl-alt-d/lightpmmlpredictor Depending on your project, sklearn-compiledtrees may be an option. https://github.com/ajtulloch/sklearn-compiledtrees Py2PMML (https://support.zementis.com/entries/37092748-Introducing-Py2PMML) is by Zemantis and it?s a commercial product, meaning you pay for a license. ? Another option is what we planned to do at an old job of mine ? read the model characteristics out of the scikit-learn object after fit, and produce C code ourselves. This is a viable option for decision trees. Adapt print_decision_trees() from this Stackoverflow answer. http://stackoverflow.com/questions/20224526/how-to-extract-the-decision-rules-from-scikit-learn-decision-tree ? You can also reconsider your use of joblib.dump again. I?m aware that it has problems, but you can include enough versioning information in the objects you dump in order to apply checks in your code to make sure scikit-learn versions are compatible, etc. I know this is a pain in the neck, but it?s a viable alternative to creating your own PMML reader, writing a code generator of some kind, or buying a license. __________________________________________________________________________________________ Dale Smith | Macy's Systems and Technology | IFS eCommerce | Data Science and Capacity Planning | 5985 State Bridge Road, Johns Creek, GA 30097 | dale.t.smith at macys.com From: scikit-learn [mailto:scikit-learn-bounces+dale.t.smith=macys.com at python.org] On Behalf Of Joel Nothman Sent: Thursday, July 14, 2016 4:18 AM To: Scikit-learn user and developer mailing list Subject: Re: [scikit-learn] [Scikit-learn-general] Estimator serialisability ? EXT MSG: This has been discussed numerous times. I suppose no one thinks supporting pickle only is great, but a custom dict is unmaintainable. 
The best we've got AFAIK (and it looks like it's getting better all the time) is a tool to convert one-way to PMML, which is portable to production environments. See https://github.com/jpmml/sklearn2pmml (python interface) and https://github.com/jpmml/jpmml-sklearn(command-line interface and guts of the thing). I hope that helps; and thanks to Villu Ruusmann: that list of supported estimators is awesome! PS: please write to the new list at scikit-learn at python.org On 14 July 2016 at 17:24, Miroslav Zori??k > wrote: Hi everybody, I have been using scikit-learn for a while, but I have run into a problem that does not seem to have any good solutions. Basically I would like to: - build my pipeline in a Jupyter Notebook - persist it (to json or hdf5) - load it in production and execute the prediction there The problem is that for persisting estimators such as the RobustScaler for example, the recommended way is to pickle them. Now I don't want to do this, for three reasons: - Security, pickle is potentially dangerous - Portability, I can't unpickle it in scala for example - Pickle stores a lot of details and information which is not strictly necessary to reconstruct the RobustScaler and therefore might prevent it from being reconstructed correctly if a different version is used. 
Another option I would seem to have is to access the private members of each estimator that I want to use and store them on my own, but this is inconvenient, because:

- It forces me as a user to understand how the robust scaler works and how it stores its internal state, which is generally bad for usability
- The internal implementation could change, leaving me to fix my serialisers (see #1)
- I would need to do this for each new Estimator I decide to use

Now, to me the solution seems quite obvious: write a Mixin or update the BaseEstimator class to include two additional methods:

to_dict() - will return a dictionary such that, when passed to
from_dict(dictionary) - it will reconstruct the original object

These dictionaries could be passed to the JSON module or the YAML module or stored elsewhere. We could provide more convenience methods to do this for the user.

In case of the RobustScaler the dict would look something like:

{ "center": "0,0", "scale": "1.0"}

Now the bulk of the work is writing these serialisers and deserialisers for all of the estimators, but that can be simplified by adding a method that could do that automatically via reflection, so that the estimator would only need to specify which fields to serialise.

I am happy to start working on this and create a pull request on GitHub, but before I do that I wanted to get some initial thoughts and reactions from the community, so please let me know what you think.

Best Regards,
Miroslav Zoricak
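A minimal sketch of the to_dict()/from_dict() idea from the message above, in plain Python. Everything named here is hypothetical: `DictSerialisableMixin`, the `_serialised_fields` attribute and the toy `ToyScaler` class are illustrative only and are not part of scikit-learn; the toy class merely stands in for an estimator such as RobustScaler.

```python
import json


class DictSerialisableMixin:
    """Hypothetical mixin (not part of scikit-learn)."""

    # Each class lists the attributes that fully describe its fitted state.
    _serialised_fields = ()

    def to_dict(self):
        # Collect the declared fields by reflection.
        state = {name: getattr(self, name) for name in self._serialised_fields}
        return {"class": type(self).__name__, "state": state}

    @classmethod
    def from_dict(cls, data):
        if data["class"] != cls.__name__:
            raise ValueError(f"expected {cls.__name__}, got {data['class']}")
        obj = cls.__new__(cls)  # skip __init__; state is restored below
        for name in cls._serialised_fields:
            setattr(obj, name, data["state"][name])
        return obj


class ToyScaler(DictSerialisableMixin):
    """Toy stand-in for a fitted scaler; the real RobustScaler differs."""

    _serialised_fields = ("center_", "scale_")

    def fit(self, values):
        ordered = sorted(values)
        n = len(ordered)
        self.center_ = ordered[n // 2]  # (upper) median
        # Crude interquartile range, falling back to 1.0 if it is zero.
        self.scale_ = (ordered[(3 * n) // 4] - ordered[n // 4]) or 1.0
        return self

    def transform(self, values):
        return [(v - self.center_) / self.scale_ for v in values]


scaler = ToyScaler().fit([1.0, 2.0, 3.0, 4.0, 100.0])
payload = json.dumps(scaler.to_dict())  # plain JSON text, no pickle
restored = ToyScaler.from_dict(json.loads(payload))
```

The round trip goes through JSON text only, so the payload could in principle be read from another language. The sketch does not remove the maintenance burden: each class still has to keep its field list in sync with its implementation.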
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

From wkomp at smarterhq.com Thu Jul 14 10:05:32 2016
From: wkomp at smarterhq.com (William Komp)
Date: Thu, 14 Jul 2016 10:05:32 -0400
Subject: [scikit-learn] [Scikit-learn-general] Estimator serialisability
In-Reply-To:
References:
Message-ID:

Hi,

Interesting conversation. I have captured model parameters in SQL and use SQL for scoring in massively parallel setups. You can score billion-record sets in seconds. This works really well with logistic regression and other function-based models. Trees would be a bit more difficult.

Has there been any discussion of incorporating PFA (Portable Format for Analytics, http://dmg.org/pfa/index.html) in scikit-learn? Bob Grossman is the driving force behind it. Here is a link to a deck from a Predictive Analytics World talk he gave in Chicago a few months ago.

http://www.slideshare.net/rgrossman/how-to-lower-the-cost-of-deploying-analytics-an-introduction-to-the-portable-format-for-analytics

William
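A hedged sketch of the kind of SQL scoring William describes, assuming a logistic regression whose intercept and coefficients have already been read out of the fitted model. The coefficient values, the table name `records` and the column names are made up for illustration, and SQLite's in-memory database is only a stand-in for a real parallel database.

```python
import math
import sqlite3

# Hypothetical parameters, as if copied from a fitted logistic regression.
intercept = -1.0
coefficients = {"age": 0.5, "income": -0.25}


def scoring_sql(table, intercept, coefficients):
    """Build a SELECT that computes the logistic probability for each row.

    Table and column names are pasted into the SQL text as-is, so they
    must come from trusted configuration, never from user input.
    """
    terms = " + ".join(
        f"({weight!r} * {column})" for column, weight in sorted(coefficients.items())
    )
    linear = f"{intercept!r} + {terms}"
    return f"SELECT id, 1.0 / (1.0 + exp(-({linear}))) AS score FROM {table}"


conn = sqlite3.connect(":memory:")
# Not every SQLite build ships exp(), so register Python's for this demo.
conn.create_function("exp", 1, math.exp)
conn.execute("CREATE TABLE records (id INTEGER, age REAL, income REAL)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [(1, 2.0, 0.0), (2, 0.0, 4.0)],
)

query = scoring_sql("records", intercept, coefficients)
scores = dict(conn.execute(query).fetchall())  # maps id to probability
```

Because the model is reduced to one arithmetic expression per row, the database can evaluate it wherever the data already lives. A tree model would need nested CASE expressions instead, which may be why trees are described as more difficult.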
_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn

From nick.pentreath at gmail.com Thu Jul 14 10:18:58 2016
From: nick.pentreath at gmail.com (Nick Pentreath)
Date: Thu, 14 Jul 2016 14:18:58 +0000
Subject: [scikit-learn] [Scikit-learn-general] Estimator serialisability
In-Reply-To:
References:
Message-ID:

For PFA, you may wish to check out https://github.com/opendatagroup/hadrian/ (the "titus" subproject is a full Python implementation of PFA, with a focus on some "model producing" hooks such as PrettyPFA, a higher-level text-based DSL for PFA document construction).
From wkomp at smarterhq.com Thu Jul 14 10:20:15 2016
From: wkomp at smarterhq.com (William Komp)
Date: Thu, 14 Jul 2016 10:20:15 -0400
Subject: [scikit-learn] [Scikit-learn-general] Estimator serialisability
In-Reply-To:
References:
Message-ID:

Thanks Nick!
From Dale.T.Smith at macys.com Thu Jul 14 10:11:50 2016
From: Dale.T.Smith at macys.com (Dale T Smith)
Date: Thu, 14 Jul 2016 14:11:50 +0000
Subject: [scikit-learn] [Scikit-learn-general] Estimator serialisability
In-Reply-To:
References:
Message-ID:

Spark has a project PMML for Pipelines.
https://issues.apache.org/jira/browse/SPARK-11171

__________________________________________________________________________________________
Dale Smith | Macy's Systems and Technology | IFS eCommerce | Data Science and Capacity Planning
| 5985 State Bridge Road, Johns Creek, GA 30097 | dale.t.smith at macys.com
Another option is what we planned to do at an old job of mine ? read the model characteristics out of the scikit-learn object after fit, and produce C code ourselves. This is a viable option for decision trees. Adapt print_decision_trees() from this Stackoverflow answer. http://stackoverflow.com/questions/20224526/how-to-extract-the-decision-rules-from-scikit-learn-decision-tree ? You can also reconsider your use of joblib.dump again. I?m aware that it has problems, but you can include enough versioning information in the objects you dump in order to apply checks in your code to make sure scikit-learn versions are compatible, etc. I know this is a pain in the neck, but it?s a viable alternative to creating your own PMML reader, writing a code generator of some kind, or buying a license. __________________________________________________________________________________________ Dale Smith | Macy's Systems and Technology | IFS eCommerce | Data Science and Capacity Planning | 5985 State Bridge Road, Johns Creek, GA 30097 | dale.t.smith at macys.com From: scikit-learn [mailto:scikit-learn-bounces+dale.t.smith=macys.com at python.org] On Behalf Of Joel Nothman Sent: Thursday, July 14, 2016 4:18 AM To: Scikit-learn user and developer mailing list Subject: Re: [scikit-learn] [Scikit-learn-general] Estimator serialisability ? EXT MSG: This has been discussed numerous times. I suppose no one thinks supporting pickle only is great, but a custom dict is unmaintainable. The best we've got AFAIK (and it looks like it's getting better all the time) is a tool to convert one-way to PMML, which is portable to production environments. See https://github.com/jpmml/sklearn2pmml (python interface) and https://github.com/jpmml/jpmml-sklearn(command-line interface and guts of the thing). I hope that helps; and thanks to Villu Ruusmann: that list of supported estimators is awesome! 
PS: please write to the new list at scikit-learn at python.org On 14 July 2016 at 17:24, Miroslav Zori??k > wrote: Hi everybody, I have been using scikit-learn for a while, but I have run into a problem that does not seem to have any good solutions. Basically I would like to: - build my pipeline in a Jupyter Notebook - persist it (to json or hdf5) - load it in production and execute the prediction there The problem is that for persisting estimators such as the RobustScaler for example, the recommended way is to pickle them. Now I don't want to do this, for three reasons: - Security, pickle is potentially dangerous - Portability, I can't unpickle it in scala for example - Pickle stores a lot of details and information which is not strictly necessary to reconstruct the RobustScaler and therefore might prevent it from being reconstructed correctly if a different version is used. Another option I would seem to have is to access the private members of each serialiser that I want to use and store them on my own, but this is inconvenient, because: - It forces me as a user to understand how the robust scaler works and how it stores its internal state, which is generally bad for usability - The internal implementation could change, leaving me to fix my serialisers (see #1) - I would need to do this for each new Estimator I decide to use Now, to me it seems the solution is quite obvious: Write a Mixin or update the BaseEstimator class to include two additional methods: to_dict() - will return a dictionary such, that when passed to from_dict(dictionary) - it will reconstruct the original object these dictionaries could be passed to the JSON module or the YAML module or stored elsewhere. We could provide more convenience methods to do this for the user. 
In case of the RobustScaler the dict would look something like: { "center": "0,0", "scale": "1.0"} Now the bulk of the work is writing these serialisers and deserialisers for all of the estimators, but that can be simplified by adding a method that could do that automatically via reflection and the estimator would only need to specify which fields to serialise. I am happy to start working on this and create a pull request on Github, but before I do that I wanted to get some initial thoughts and reactions from the community, so please let me know what you think. Best Regards, Miroslav Zoricak _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn From mathieu at mblondel.org Fri Jul 15 21:38:50 2016 From: mathieu at mblondel.org (Mathieu Blondel) Date: Sat, 16 Jul 2016 10:38:50 +0900 Subject: [scikit-learn] Question regarding kernel PCA implementation in scikit-learn In-Reply-To: References: Message-ID: Forwarding your question to the mailing-list. 
On Thu, Jul 14, 2016 at 10:33 PM, Christos Lataniotis < lataniotis at ibk.baug.ethz.ch> wrote: > Dear Mathieu Blondel, > > I am a PhD student working on some machine-learning aspects related to > dimensionality reduction. One of the methods that is of interest to me is > kernel PCA, so I tested the implementation offered by scikit-learn, > which I think is the most complete of the ones I could find on the web. > > I would like to ask for some clarification regarding the way you > implemented the inverse transform, i.e. solving the pre-image problem. > > Although the paper by Bakir et al., 2004 is cited, I think there is some > difference between your implementation and the methodology discussed in > that paper. Bakir suggests 'learning' the pre-image map by solving a kernel > ridge regression problem with some kernel function, say l, that is > different from the kernel function, say k, that is used in kernel PCA. > However, by going through the source code of your implementation I think > that kernel functions l and k coincide. Is that correct? If yes, is there > some justification (e.g. empirical) for making such an assumption? I am asking > this because, as far as I have read in the literature, selecting the kernel > function l is still something of an open question, so I would expect it to be a > parameter that can be selected by the user on top of selecting the kernel > function for kernel PCA. > > Thank you for your time in advance. > > Best Regards, > Christos > > > -- > Christos Lataniotis > Institute of Structural Engineering > Chair of Risk, Safety and Uncertainty Quantification ETH Zürich - HIL E > 35.1 > Wolfgang-Pauli-Str. 15 > CH-8093 Zürich, Switzerland > Tel: +41 44 633 06 70 > E-Mail: lataniotis at ibk.baug.ethz.ch > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From t3kcit at gmail.com Sat Jul 16 10:31:55 2016 From: t3kcit at gmail.com (Andreas Mueller) Date: Sat, 16 Jul 2016 09:31:55 -0500 Subject: [scikit-learn] SciPy Sprints Message-ID: <578A455B.1080707@gmail.com> Hey all. Just a ping that the scipy sprints are starting now. Review helpers welcome ;) We have about 20 people that said they might be interested in contributing! We'll take over the gitter channel https://gitter.im/scikit-learn/scikit-learn I tagged some issues as "sprint", if you know anything that might be suitable, please add the tag. Best, Andy From t3kcit at gmail.com Sun Jul 17 12:07:02 2016 From: t3kcit at gmail.com (Andreas Mueller) Date: Sun, 17 Jul 2016 11:07:02 -0500 Subject: [scikit-learn] Original source for DecisionTreeClassifier Implementation In-Reply-To: References: Message-ID: <578BAD26.9060402@gmail.com> On 07/12/2016 12:31 AM, Praveen Gollakota wrote: > Hello, > > I was curious if anyone has an original source or paper from which the > decision trees were implemented in scikit learn. I see general > references for Elements of Statistical Learning and other references > but no specific mention of which version of algorithm is actually > implemented. > > I couldn't find any references in > https://github.com/scikit-learn/scikit-learn/tree/master/sklearn/tree. They are in the docstring: http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/tree.py#L652 From mathieu at mblondel.org Tue Jul 19 12:03:55 2016 From: mathieu at mblondel.org (Mathieu Blondel) Date: Wed, 20 Jul 2016 01:03:55 +0900 Subject: [scikit-learn] Three new scikit-learn-contrib projects Message-ID: Hi everyone, We are pleased to announce that three new projects recently joined scikit-learn-contrib! 
* imbalanced-learn: https://github.com/scikit-learn-contrib/imbalanced-learn Python module to perform under sampling and over sampling with various techniques. * polylearn: https://github.com/scikit-learn-contrib/polylearn Factorization machines and polynomial networks for classification and regression in Python. * forest-confidence-interval: https://github.com/scikit-learn-contrib/forest-confidence-interval Confidence intervals for scikit-learn forest algorithms. We thank the respective authors for their neat contribution to the scikit-learn ecosystem! Cheers, Mathieu -------------- next part -------------- An HTML attachment was scrubbed... URL: From nfliu at uw.edu Tue Jul 19 12:14:15 2016 From: nfliu at uw.edu (Nelson Liu) Date: Tue, 19 Jul 2016 16:14:15 +0000 Subject: [scikit-learn] Three new scikit-learn-contrib projects In-Reply-To: References: Message-ID: Congrats! These look great, thanks to both the authors and the scikit-learn-contrib organizers for putting this together. Nelson On Tue, Jul 19, 2016 at 9:09 AM Mathieu Blondel wrote: > Hi everyone, > > We are pleased to announce that three new projects recently joined > scikit-learn-contrib! > > * imbalanced-learn: > https://github.com/scikit-learn-contrib/imbalanced-learn > > Python module to perform under sampling and over sampling with various > techniques. > > * polylearn: https://github.com/scikit-learn-contrib/polylearn > > Factorization machines and polynomial networks for classification and > regression in Python. > > * forest-confidence-interval: > https://github.com/scikit-learn-contrib/forest-confidence-interval > > Confidence intervals for scikit-learn forest algorithms. > > We thank the respective authors for their neat contribution to the > scikit-learn ecosystem! 
> > Cheers, > Mathieu > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From blrstartuphire at gmail.com Tue Jul 19 23:57:15 2016 From: blrstartuphire at gmail.com (Startup Hire) Date: Wed, 20 Jul 2016 09:27:15 +0530 Subject: [scikit-learn] Three new scikit-learn-contrib projects In-Reply-To: References: Message-ID: Awesome! Thanks to the contributors On Tue, Jul 19, 2016 at 9:44 PM, Nelson Liu wrote: > Congrats! These look great, thanks to both the authors and the > scikit-learn-contrib organizers for putting this together. > > Nelson > > On Tue, Jul 19, 2016 at 9:09 AM Mathieu Blondel > wrote: > >> Hi everyone, >> >> We are pleased to announce that three new projects recently joined >> scikit-learn-contrib! >> >> * imbalanced-learn: >> https://github.com/scikit-learn-contrib/imbalanced-learn >> >> Python module to perform under sampling and over sampling with various >> techniques. >> >> * polylearn: https://github.com/scikit-learn-contrib/polylearn >> >> Factorization machines and polynomial networks for classification and >> regression in Python. >> >> * forest-confidence-interval: >> https://github.com/scikit-learn-contrib/forest-confidence-interval >> >> Confidence intervals for scikit-learn forest algorithms. >> >> We thank the respective authors for their neat contribution to the >> scikit-learn ecosystem! >> >> Cheers, >> Mathieu >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yenchenlin1994 at gmail.com Wed Jul 20 01:05:48 2016 From: yenchenlin1994 at gmail.com (lin yenchen) Date: Wed, 20 Jul 2016 13:05:48 +0800 Subject: [scikit-learn] How to test on PYTHON_ARCH=32 with mac? Message-ID: Hi all, currently the CI tests of my PR is failing only on appveyor when PYTHON_ARCH=32. Are there any ways to build a PYTHON_ARCH=32 version of scikit-learn on a mac, or the only solution is to get a windows computer? Best, Yen-Chen -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail at sebastianraschka.com Wed Jul 20 01:23:36 2016 From: mail at sebastianraschka.com (Sebastian Raschka) Date: Wed, 20 Jul 2016 01:23:36 -0400 Subject: [scikit-learn] How to test on PYTHON_ARCH=32 with mac? In-Reply-To: References: Message-ID: Hi, Yen-Chen, the problem could be just due to 32bit (and floating point impr?) rather than the Windows environment in general? You could try running the tests on 32 bit Python and see if they come up (before you take the more tedious path and set up a virtual environment, e.g., VirtualBox running Windows XP). E.g., via conda you could do # Create set CONDA_FORCE_32BIT=1 conda create -n 32bit_py27 python=2 # Activate set CONDA_FORCE_32BIT=1 activate 32bit_py27 Best, Sebastian > On Jul 20, 2016, at 1:05 AM, lin yenchen wrote: > > Hi all, > > currently the CI tests of my PR is failing only on appveyor when PYTHON_ARCH=32. > > Are there any ways to build a PYTHON_ARCH=32 version of scikit-learn on a mac, > or the only solution is to get a windows computer? > > Best, > Yen-Chen > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From matthew.brett at gmail.com Wed Jul 20 03:41:16 2016 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 20 Jul 2016 08:41:16 +0100 Subject: [scikit-learn] How to test on PYTHON_ARCH=32 with mac? 
In-Reply-To: References: Message-ID: On Wed, Jul 20, 2016 at 6:05 AM, lin yenchen wrote: > Hi all, > > currently the CI tests of my PR is failing only on appveyor when > PYTHON_ARCH=32. > > Are there any ways to build a PYTHON_ARCH=32 version of scikit-learn on a > mac, > or the only solution is to get a windows computer? If you install Python from Python.org installers, then the default Python build flags, which should apply to scikit-learn too, specify to build for both 32- and 64-bit (dual arch build). I just tried: pip install -e . in the scikit-learn directory, for a Python.org Python. That built with the expected flags. Then I checked sklearn imported OK in 32-bit mode with: arch -i386 python >>> import sklearn Then I ran the sklearn tests with: arch -i386 nosetests sklearn I got a few failures. I believe this `arch -i386` only works as a prefix for Python.org Python, but I'm happy to be corrected. Cheers, Matthew From gael.varoquaux at normalesup.org Wed Jul 20 03:48:43 2016 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 20 Jul 2016 09:48:43 +0200 Subject: [scikit-learn] Three new scikit-learn-contrib projects In-Reply-To: References: Message-ID: <20160720074843.GA609220@phare.normalesup.org> Hey, These packages look great! I was interested in the imbalanced learning, which is something that we stumbled upon: > * imbalanced-learn: https://github.com/scikit-learn-contrib/imbalanced-learn > Python module to perform under sampling and over sampling with various > techniques. Interestingly, the fit_sample method is related to the scikit-learn enhancement proposal that we have tried to put together for objects that can modify the y in addition to the X: https://github.com/scikit-learn/enhancement_proposals/pull/2 I think that this enhancement proposal of our API is important for two reasons. 
The first one is that the corresponding objects cannot be put in a pipeline (imbalanced-learn ends up having its own pipeline), and hence cannot benefit from hyper-parameter tuning on the full set of steps, or cool things like DaskLearn. The second one is that different projects are likely to come up with similar but incompatible solutions to this problem, making it harder to combine things. Unfortunately, I haven't had time to push forward this proposal. But comments on it (or a pull request to it) would be awesome. Cheers, Gaël From olivier.grisel at ensta.org Wed Jul 20 04:09:19 2016 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Wed, 20 Jul 2016 10:09:19 +0200 Subject: [scikit-learn] How to test on PYTHON_ARCH=32 with mac? In-Reply-To: References: Message-ID: > I believe this `arch -i386` only works as a prefix for Python.org Python, but I'm happy to be corrected. Then the following should work: arch -i386 python -c "import nose; nose.main()" sklearn From matthew.brett at gmail.com Wed Jul 20 04:16:52 2016 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 20 Jul 2016 09:16:52 +0100 Subject: [scikit-learn] How to test on PYTHON_ARCH=32 with mac? In-Reply-To: References: Message-ID: On Wed, Jul 20, 2016 at 9:09 AM, Olivier Grisel wrote: >> I believe this `arch -i386` only works as a prefix for Python.org Python, but I'm happy to be corrected. > > Then the following should work: > > arch -i386 python -c "import nose; nose.main()" sklearn Sorry - I should have been clear - this does work in selecting 32-bit for the tests, using a nosetests installed into a Python.org Python environment: arch -i386 nosetests sklearn Cheers, Matthew From matthew.brett at gmail.com Wed Jul 20 12:00:49 2016 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 20 Jul 2016 17:00:49 +0100 Subject: [scikit-learn] How to test on PYTHON_ARCH=32 with mac? 
In-Reply-To: References: Message-ID: On Wed, Jul 20, 2016 at 9:16 AM, Matthew Brett wrote: > On Wed, Jul 20, 2016 at 9:09 AM, Olivier Grisel > wrote: >>> I believe this `arch -i386` only works as a prefix for Python.org Python, but I'm happy to be corrected. >> >> Then the following should work: >> >> arch -i386 python -c "import nose; nose.main()" sklearn > > Sorry - I should have been clear - this does work in selecting 32-bit > for the tests, using a nosetests installed into a Python.org Python > environment: > > arch -i386 nosetests sklearn Actually, I took the liberty of adding the OSX 32-bit tests to the wheel build tests, and scikit-learn 0.17.1 has one failure on 32-bit OSX: https://travis-ci.org/MacPython/scikit-learn-wheels/jobs/146127267#L11761 ``` ====================================================================== ERROR: sklearn.tree.tests.test_tree.test_huge_allocations ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.5/site-packages/nose/case.py", line 198, in runTest self.test(*self.arg) File "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.5/site-packages/sklearn/tree/tests/test_tree.py", line 1032, in test_huge_allocations assert_raises(MemoryError, clf.fit, X, y) File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/unittest/case.py", line 727, in assertRaises return context.handle('assertRaises', args, kwargs) File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/unittest/case.py", line 176, in handle callable_obj(*args, **kwargs) File "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.5/site-packages/sklearn/tree/tree.py", line 348, in fit max_leaf_nodes) File "sklearn/tree/_tree.pyx", line 291, in sklearn.tree._tree.BestFirstTreeBuilder.__cinit__ 
(/private/var/folders/gw/_2jq29095y7b__wtby9dg_5h0000gn/T/pip-j_xtpbj9-build/sklearn/tree/_tree.c:4461) OverflowError: Python int too large to convert to C long ``` No failures for 32-bit Linux though. Best, Matthew From yenchenlin1994 at gmail.com Wed Jul 20 12:25:53 2016 From: yenchenlin1994 at gmail.com (lin yenchen) Date: Thu, 21 Jul 2016 00:25:53 +0800 Subject: [scikit-learn] How to test on PYTHON_ARCH=32 with mac? In-Reply-To: References: Message-ID: Thanks for you guys' precious inputs. I've successfully built a 32-bit python version scikit-learn and check it by printing `sys.maxint`, and all the tests passed on my mac. (I'm running the newest dev version though) Best, Yen-Chen 2016-07-21 0:00 GMT+08:00 Matthew Brett : > On Wed, Jul 20, 2016 at 9:16 AM, Matthew Brett > wrote: > > On Wed, Jul 20, 2016 at 9:09 AM, Olivier Grisel > > wrote: > >>> I believe this `arch -i386` only works as a prefix for Python.org > Python, but I'm happy to be corrected. > >> > >> Then the following should work: > >> > >> arch -i386 python -c "import nose; nose.main()" sklearn > > > > Sorry - I should have been clear - this does work in selecting 32-bit > > for the tests, using a nosetests installed into a Python.org Python > > environment: > > > > arch -i386 nosetests sklearn > > Actually, I took the liberty of adding the OSX 32-bit tests to the > wheel build tests, and scikit-learn 0.17.1 has one failure on 32-bit > OSX: > > https://travis-ci.org/MacPython/scikit-learn-wheels/jobs/146127267#L11761 > > ``` > > ====================================================================== > ERROR: sklearn.tree.tests.test_tree.test_huge_allocations > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.5/site-packages/nose/case.py", > line 198, in runTest > self.test(*self.arg) > File > 
"/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.5/site-packages/sklearn/tree/tests/test_tree.py", > line 1032, in test_huge_allocations > assert_raises(MemoryError, clf.fit, X, y) > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/unittest/case.py", > line 727, in assertRaises > return context.handle('assertRaises', args, kwargs) > File > "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/unittest/case.py", > line 176, in handle > callable_obj(*args, **kwargs) > File > "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.5/site-packages/sklearn/tree/tree.py", > line 348, in fit > max_leaf_nodes) > File "sklearn/tree/_tree.pyx", line 291, in > sklearn.tree._tree.BestFirstTreeBuilder.__cinit__ > > (/private/var/folders/gw/_2jq29095y7b__wtby9dg_5h0000gn/T/pip-j_xtpbj9-build/sklearn/tree/_tree.c:4461) > OverflowError: Python int too large to convert to C long > ``` > > No failures for 32-bit Linux though. > > Best, > > Matthew > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yenchenlin1994 at gmail.com Wed Jul 20 12:34:47 2016 From: yenchenlin1994 at gmail.com (lin yenchen) Date: Thu, 21 Jul 2016 00:34:47 +0800 Subject: [scikit-learn] How to test on PYTHON_ARCH=32 with mac? In-Reply-To: References: Message-ID: Well, but the tests failed on CI of my PR passed on my local 32 bit scikit-learn. Do this mean that the problem exists on the Windows? (I assumed appveyor is running Window because of C:\ in the console) Sorry if this is a nonsense assumption. Best, Yen-Chen 2016-07-21 0:25 GMT+08:00 lin yenchen : > Thanks for you guys' precious inputs. > > I've successfully built a 32-bit python version scikit-learn and check it > by printing `sys.maxint`, > and all the tests passed on my mac. 
(I'm running the newest dev version > though) > > Best, > Yen-Chen > > > 2016-07-21 0:00 GMT+08:00 Matthew Brett : > >> On Wed, Jul 20, 2016 at 9:16 AM, Matthew Brett >> wrote: >> > On Wed, Jul 20, 2016 at 9:09 AM, Olivier Grisel >> > wrote: >> >>> I believe this `arch -i386` only works as a prefix for Python.org >> Python, but I'm happy to be corrected. >> >> >> >> Then the following should work: >> >> >> >> arch -i386 python -c "import nose; nose.main()" sklearn >> > >> > Sorry - I should have been clear - this does work in selecting 32-bit >> > for the tests, using a nosetests installed into a Python.org Python >> > environment: >> > >> > arch -i386 nosetests sklearn >> >> Actually, I took the liberty of adding the OSX 32-bit tests to the >> wheel build tests, and scikit-learn 0.17.1 has one failure on 32-bit >> OSX: >> >> https://travis-ci.org/MacPython/scikit-learn-wheels/jobs/146127267#L11761 >> >> ``` >> >> ====================================================================== >> ERROR: sklearn.tree.tests.test_tree.test_huge_allocations >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File >> "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.5/site-packages/nose/case.py", >> line 198, in runTest >> self.test(*self.arg) >> File >> "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.5/site-packages/sklearn/tree/tests/test_tree.py", >> line 1032, in test_huge_allocations >> assert_raises(MemoryError, clf.fit, X, y) >> File >> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/unittest/case.py", >> line 727, in assertRaises >> return context.handle('assertRaises', args, kwargs) >> File >> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/unittest/case.py", >> line 176, in handle >> callable_obj(*args, **kwargs) >> File >> "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.5/site-packages/sklearn/tree/tree.py", >> line 
348, in fit >> max_leaf_nodes) >> File "sklearn/tree/_tree.pyx", line 291, in >> sklearn.tree._tree.BestFirstTreeBuilder.__cinit__ >> >> (/private/var/folders/gw/_2jq29095y7b__wtby9dg_5h0000gn/T/pip-j_xtpbj9-build/sklearn/tree/_tree.c:4461) >> OverflowError: Python int too large to convert to C long >> ``` >> >> No failures for 32-bit Linux though. >> >> Best, >> >> Matthew >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nfliu at uw.edu Wed Jul 20 12:46:45 2016 From: nfliu at uw.edu (Nelson Liu) Date: Wed, 20 Jul 2016 16:46:45 +0000 Subject: [scikit-learn] How to test on PYTHON_ARCH=32 with mac? In-Reply-To: References: Message-ID: Your assumption is correct, appveyor does run windows. Nelson On Wed, Jul 20, 2016 at 9:36 AM lin yenchen wrote: > Well, but the tests failed on CI > of > my PR passed on > my local 32 bit scikit-learn. > Do this mean that the problem exists on the Windows? (I > assumed appveyor is running Window because of C:\ in the console) > > Sorry if this is a nonsense assumption. > > Best, > Yen-Chen > > 2016-07-21 0:25 GMT+08:00 lin yenchen : > >> Thanks for you guys' precious inputs. >> >> I've successfully built a 32-bit python version scikit-learn and check it >> by printing `sys.maxint`, >> and all the tests passed on my mac. (I'm running the newest dev version >> though) >> >> Best, >> Yen-Chen >> >> >> 2016-07-21 0:00 GMT+08:00 Matthew Brett : >> >>> On Wed, Jul 20, 2016 at 9:16 AM, Matthew Brett >>> wrote: >>> > On Wed, Jul 20, 2016 at 9:09 AM, Olivier Grisel >>> > wrote: >>> >>> I believe this `arch -i386` only works as a prefix for Python.org >>> Python, but I'm happy to be corrected. 
>>> >> >>> >> Then the following should work: >>> >> >>> >> arch -i386 python -c "import nose; nose.main()" sklearn >>> > >>> > Sorry - I should have been clear - this does work in selecting 32-bit >>> > for the tests, using a nosetests installed into a Python.org Python >>> > environment: >>> > >>> > arch -i386 nosetests sklearn >>> >>> Actually, I took the liberty of adding the OSX 32-bit tests to the >>> wheel build tests, and scikit-learn 0.17.1 has one failure on 32-bit >>> OSX: >>> >>> https://travis-ci.org/MacPython/scikit-learn-wheels/jobs/146127267#L11761 >>> >>> ``` >>> >>> ====================================================================== >>> ERROR: sklearn.tree.tests.test_tree.test_huge_allocations >>> ---------------------------------------------------------------------- >>> Traceback (most recent call last): >>> File >>> "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.5/site-packages/nose/case.py", >>> line 198, in runTest >>> self.test(*self.arg) >>> File >>> "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.5/site-packages/sklearn/tree/tests/test_tree.py", >>> line 1032, in test_huge_allocations >>> assert_raises(MemoryError, clf.fit, X, y) >>> File >>> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/unittest/case.py", >>> line 727, in assertRaises >>> return context.handle('assertRaises', args, kwargs) >>> File >>> "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/unittest/case.py", >>> line 176, in handle >>> callable_obj(*args, **kwargs) >>> File >>> "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.5/site-packages/sklearn/tree/tree.py", >>> line 348, in fit >>> max_leaf_nodes) >>> File "sklearn/tree/_tree.pyx", line 291, in >>> sklearn.tree._tree.BestFirstTreeBuilder.__cinit__ >>> >>> (/private/var/folders/gw/_2jq29095y7b__wtby9dg_5h0000gn/T/pip-j_xtpbj9-build/sklearn/tree/_tree.c:4461) >>> OverflowError: Python int too large to convert to C long >>> ``` 
>>> >>> No failures for 32-bit Linux though. >>> >>> Best, >>> >>> Matthew >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Jul 20 12:55:54 2016 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 20 Jul 2016 17:55:54 +0100 Subject: [scikit-learn] How to test on PYTHON_ARCH=32 with mac? In-Reply-To: References: Message-ID: On Wed, Jul 20, 2016 at 5:25 PM, lin yenchen wrote: > Thanks for you guys' precious inputs. > > I've successfully built a 32-bit python version scikit-learn and check it by > printing `sys.maxint`, > and all the tests passed on my mac. (I'm running the newest dev version > though) On current master I get the following failures from: arch -i386 nosetests sklearn ``` ====================================================================== ERROR: sklearn.decomposition.tests.test_nmf.test_non_negative_factorization_checking ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/Users/mb312/dev_trees/scikit-learn/sklearn/utils/testing.py", line 342, in wrapper return fn(*args, **kwargs) File "/Users/mb312/dev_trees/scikit-learn/sklearn/decomposition/tests/test_nmf.py", line 237, in test_non_negative_factorization_checking assert_no_warnings(nnmf, A, A, A, np.int64(1)) File "/Users/mb312/dev_trees/scikit-learn/sklearn/utils/testing.py", line 272, in assert_no_warnings result = func(*args, **kw) File 
"/Users/mb312/dev_trees/scikit-learn/sklearn/decomposition/nmf.py", line 751, in non_negative_factorization " got (n_components=%r)" % n_components) ValueError: Number of components must be a positive integer; got (n_components=1) ====================================================================== ERROR: Test that it gives proper exception on deficient input. ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/Users/mb312/dev_trees/scikit-learn/sklearn/ensemble/tests/test_iforest.py", line 107, in test_iforest_error assert_no_warnings(IsolationForest(max_samples=np.int64(2)).fit, X) File "/Users/mb312/dev_trees/scikit-learn/sklearn/utils/testing.py", line 272, in assert_no_warnings result = func(*args, **kw) File "/Users/mb312/dev_trees/scikit-learn/sklearn/ensemble/iforest.py", line 182, in fit raise ValueError("max_samples must be in (0, 1]") ValueError: max_samples must be in (0, 1] ====================================================================== ERROR: sklearn.linear_model.tests.test_huber.test_huber_better_r2_score ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/Users/mb312/dev_trees/scikit-learn/sklearn/linear_model/tests/test_huber.py", line 170, in test_huber_better_r2_score huber_score = huber.score(X[mask], y[mask]) File "/Users/mb312/dev_trees/scikit-learn/sklearn/base.py", line 363, in score return r2_score(y, self.predict(X), sample_weight=sample_weight, File "/Users/mb312/dev_trees/scikit-learn/sklearn/linear_model/base.py", line 268, in predict return self._decision_function(X) File "/Users/mb312/dev_trees/scikit-learn/sklearn/linear_model/base.py", line 251, in 
_decision_function X = check_array(X, accept_sparse=['csr', 'csc', 'coo']) File "/Users/mb312/dev_trees/scikit-learn/sklearn/utils/validation.py", line 415, in check_array context)) ValueError: Found array with 0 sample(s) (shape=(0, 20)) while a minimum of 1 is required. ====================================================================== ERROR: sklearn.tree.tests.test_tree.test_huge_allocations ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/Users/mb312/dev_trees/scikit-learn/sklearn/tree/tests/test_tree.py", line 1089, in test_huge_allocations assert_raises(MemoryError, clf.fit, X, y) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/case.py", line 473, in assertRaises callableObj(*args, **kwargs) File "/Users/mb312/dev_trees/scikit-learn/sklearn/tree/tree.py", line 366, in fit max_leaf_nodes) File "sklearn/tree/_tree.pyx", line 292, in sklearn.tree._tree.BestFirstTreeBuilder.__cinit__ (sklearn/tree/_tree.c:4728) SIZE_t max_depth, SIZE_t max_leaf_nodes): OverflowError: Python int too large to convert to C long ====================================================================== FAIL: Test that outliers filtering is scaling independent. 
---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/Users/mb312/dev_trees/scikit-learn/sklearn/linear_model/tests/test_huber.py", line 120, in test_huber_scaling_invariant assert_array_equal(n_outliers_mask_3, n_outliers_mask_1) File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/testing/utils.py", line 719, in assert_array_equal verbose=verbose, header='Arrays are not equal') File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/testing/utils.py", line 645, in assert_array_compare raise AssertionError(msg) AssertionError: Arrays are not equal (mismatch 66.0%) x: array([ True, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, True, True, False, False, False, True, True, False, True, True, False,... y: array([ True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True,... ====================================================================== FAIL: Test they should converge to same coefficients for same parameters ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/Users/mb312/dev_trees/scikit-learn/sklearn/linear_model/tests/test_huber.py", line 136, in test_huber_and_sgd_same_results assert_almost_equal(huber.scale_, 1.0, 3) File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/testing/utils.py", line 468, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal to 3 decimals ACTUAL: 3.6103567932800094e-11 DESIRED: 1.0 ``` I wonder why our results are different? 
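A note on the first two errors in the output above: both tests pass a NumPy 64-bit integer (np.int64(1) for n_components, np.int64(2) for max_samples) where an integer is expected, and both are rejected. One plausible explanation, which is an assumption and not something confirmed in this thread, is that the validation relies on an isinstance check against the built-in integer types, which a 64-bit NumPy scalar does not satisfy on a 32-bit Python 2 build. A minimal sketch of a check that does not depend on the platform word size, using the standard-library numbers.Integral; the function name is hypothetical and this is not scikit-learn's actual code:

```python
import numbers


def is_positive_integer(value):
    """Return True for any positive integral value, whatever its concrete type."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool):
        return False
    return isinstance(value, numbers.Integral) and value > 0


print(is_positive_integer(1))    # True
print(is_positive_integer(0))    # False
print(is_positive_integer(1.0))  # False
```

NumPy registers its integer scalar types with numbers.Integral, so a value such as np.int64(1) should pass this check on both 32-bit and 64-bit interpreters.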
Cheers, Matthew From g.lemaitre58 at gmail.com Wed Jul 20 13:31:34 2016 From: g.lemaitre58 at gmail.com (Guillaume Lemaître) Date: Wed, 20 Jul 2016 19:31:34 +0200 Subject: [scikit-learn] Three new scikit-learn-contrib projects In-Reply-To: <20160720074843.GA609220@phare.normalesup.org> References: <20160720074843.GA609220@phare.normalesup.org> Message-ID: Hi Gael, I was wondering if you could elaborate on the problem of hyper-parameter tuning and why imbalanced-learn would not benefit from it. Since we use the same pipeline as scikit-learn and only add the part that handles the sampler, I would have thought that we could use it. However, it is true that I did not play too much with this part of the API, so I have probably missed something. Cheers, On 20 July 2016 at 09:48, Gael Varoquaux wrote: > Hey, > > These packages look great! I was interested in the imbalanced learning, > which is something that we stumbled upon: > > > * imbalanced-learn: > https://github.com/scikit-learn-contrib/imbalanced-learn > > > Python module to perform under sampling and over sampling with various > > techniques. > > Interestingly, the fit_sample method is related to the scikit-learn > enhancement proposal that we have tried to put together objects that can > modify the y in addition to the X: > https://github.com/scikit-learn/enhancement_proposals/pull/2 > > I think that this enhancement proposal of our API is important for two > reasons. The first one is that the corresponding objects cannot be put in > a pipeline (imbalanced-learn ends up having it's own pipeline), and hence > cannot benefit from hyper-parameter tuning on the full set of steps, or > cool things like DaskLearn. The second one is that different projects are > likely to come up with similar but incompatible solutions to this > problem, making it harder to combine things. > > Unfortunately, I haven't had time to push forward this proposal.
But > comments on it (or a pull request to it) would be awesome. > > Cheers, > > Gaël > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -- *LEMAÎTRE Guillaume* *PhD Candidate* *MSc Erasmus Mundus ViBOT (Vision-roBOTic)* *MSc Business Innovation and Technology Management* g.lemaitre58 at gmail.com *ViCOROB - Computer Vision and Robotic Team* Universitat de Girona, Campus Montilivi, Edifici P-IV 17071 Girona Tel. +34 972 41 98 12 - Fax. +34 972 41 82 59 http://vicorob.udg.es/ *LE2I - Le Creusot* IUT Le Creusot, Laboratoire LE2I, 12 rue de la Fonderie, 71200 Le Creusot Tel. +33 3 85 73 10 90 - Fax. +33 3 85 73 10 97 http://le2i.cnrs.fr https://sites.google.com/site/glemaitre58/ Vice - Chairman of A.S.C. Fours UFOLEP Chairman of A.S.C. Fours FFC Webmaster of http://ascfours.free.fr -------------- next part -------------- An HTML attachment was scrubbed... URL: From yenchenlin1994 at gmail.com Wed Jul 20 13:37:37 2016 From: yenchenlin1994 at gmail.com (lin yenchen) Date: Thu, 21 Jul 2016 01:37:37 +0800 Subject: [scikit-learn] How to test on PYTHON_ARCH=32 with mac? In-Reply-To: References: Message-ID: It's probably because I built it in a different way. Here are the steps I used to build it: 1. Type arch -32 /System/Library/Frameworks/Python.framework/Versions/2.7/bin/python -c "import sys; print sys.maxint" and make sure it outputs 2147483647. 2. Modify line 5 of the Makefile in the root directory of scikit-learn to: PYTHON ?= arch -32 /System/Library/Frameworks/Python.framework/Versions/2.7/bin/python and modify line 11 to: BITS := $(shell PYTHON -c 'import struct; print(8 * struct.calcsize("P"))') 3. Type sudo make in the root directory of scikit-learn to build a 32-bit version. It reports OK and no test failures after sudo make completes. BTW, could you please run this branch and see if there are any errors related to enet? Thanks a lot for helping me.
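The check in step 1 can also be run from inside whichever interpreter ends up executing the build or the tests. A minimal sketch, assuming only the standard library: `interpreter_bits` and `describe_interpreter` are hypothetical helper names, and `sys.maxsize` is used in place of the Python 2 only `sys.maxint`.

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer width, in bits, of the running interpreter."""
    # "P" is the struct format code for a C pointer (void *).
    return 8 * struct.calcsize("P")


def describe_interpreter():
    """Summarise which executable is running and how wide its pointers are."""
    return {
        "executable": sys.executable,
        "bits": interpreter_bits(),
        # sys.maxsize is 2**31 - 1 on a 32-bit build, 2**63 - 1 on a 64-bit one.
        "maxsize_is_32bit": sys.maxsize == 2 ** 31 - 1,
    }


if __name__ == "__main__":
    print(describe_interpreter())
```

Printing `sys.executable` next to the bit width helps confirm that the interpreter being inspected is the same one that later runs the test suite.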
Best, Yen-Chen 2016-07-21 0:55 GMT+08:00 Matthew Brett : > On Wed, Jul 20, 2016 at 5:25 PM, lin yenchen > wrote: > > Thanks for you guys' precious inputs. > > > > I've successfully built a 32-bit python version scikit-learn and check > it by > > printing `sys.maxint`, > > and all the tests passed on my mac. (I'm running the newest dev version > > though) > > On current master I get the following failures from: > > arch -i386 nosetests sklearn > > > ``` > ====================================================================== > ERROR: > sklearn.decomposition.tests.test_nmf.test_non_negative_factorization_checking > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/nose/case.py", > line 197, in runTest > self.test(*self.arg) > File "/Users/mb312/dev_trees/scikit-learn/sklearn/utils/testing.py", > line 342, in wrapper > return fn(*args, **kwargs) > File > "/Users/mb312/dev_trees/scikit-learn/sklearn/decomposition/tests/test_nmf.py", > line 237, in test_non_negative_factorization_checking > assert_no_warnings(nnmf, A, A, A, np.int64(1)) > File "/Users/mb312/dev_trees/scikit-learn/sklearn/utils/testing.py", > line 272, in assert_no_warnings > result = func(*args, **kw) > File "/Users/mb312/dev_trees/scikit-learn/sklearn/decomposition/nmf.py", > line 751, in non_negative_factorization > " got (n_components=%r)" % n_components) > ValueError: Number of components must be a positive integer; got > (n_components=1) > > ====================================================================== > ERROR: Test that it gives proper exception on deficient input. 
> ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/nose/case.py", > line 197, in runTest > self.test(*self.arg) > File > "/Users/mb312/dev_trees/scikit-learn/sklearn/ensemble/tests/test_iforest.py", > line 107, in test_iforest_error > assert_no_warnings(IsolationForest(max_samples=np.int64(2)).fit, X) > File "/Users/mb312/dev_trees/scikit-learn/sklearn/utils/testing.py", > line 272, in assert_no_warnings > result = func(*args, **kw) > File "/Users/mb312/dev_trees/scikit-learn/sklearn/ensemble/iforest.py", > line 182, in fit > raise ValueError("max_samples must be in (0, 1]") > ValueError: max_samples must be in (0, 1] > > ====================================================================== > ERROR: sklearn.linear_model.tests.test_huber.test_huber_better_r2_score > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/nose/case.py", > line 197, in runTest > self.test(*self.arg) > File > "/Users/mb312/dev_trees/scikit-learn/sklearn/linear_model/tests/test_huber.py", > line 170, in test_huber_better_r2_score > huber_score = huber.score(X[mask], y[mask]) > File "/Users/mb312/dev_trees/scikit-learn/sklearn/base.py", line 363, in > score > return r2_score(y, self.predict(X), sample_weight=sample_weight, > File "/Users/mb312/dev_trees/scikit-learn/sklearn/linear_model/base.py", > line 268, in predict > return self._decision_function(X) > File "/Users/mb312/dev_trees/scikit-learn/sklearn/linear_model/base.py", > line 251, in _decision_function > X = check_array(X, accept_sparse=['csr', 'csc', 'coo']) > File "/Users/mb312/dev_trees/scikit-learn/sklearn/utils/validation.py", > line 415, in check_array > context)) > ValueError: Found array with 0 sample(s) (shape=(0, 20)) while a > minimum of 1 is required. 
> > ====================================================================== > ERROR: sklearn.tree.tests.test_tree.test_huge_allocations > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/nose/case.py", > line 197, in runTest > self.test(*self.arg) > File > "/Users/mb312/dev_trees/scikit-learn/sklearn/tree/tests/test_tree.py", > line 1089, in test_huge_allocations > assert_raises(MemoryError, clf.fit, X, y) > File > "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/case.py", > line 473, in assertRaises > callableObj(*args, **kwargs) > File "/Users/mb312/dev_trees/scikit-learn/sklearn/tree/tree.py", > line 366, in fit > max_leaf_nodes) > File "sklearn/tree/_tree.pyx", line 292, in > sklearn.tree._tree.BestFirstTreeBuilder.__cinit__ > (sklearn/tree/_tree.c:4728) > SIZE_t max_depth, SIZE_t max_leaf_nodes): > OverflowError: Python int too large to convert to C long > > ====================================================================== > FAIL: Test that outliers filtering is scaling independent. 
> ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/nose/case.py", > line 197, in runTest > self.test(*self.arg) > File > "/Users/mb312/dev_trees/scikit-learn/sklearn/linear_model/tests/test_huber.py", > line 120, in test_huber_scaling_invariant > assert_array_equal(n_outliers_mask_3, n_outliers_mask_1) > File > "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/testing/utils.py", > line 719, in assert_array_equal > verbose=verbose, header='Arrays are not equal') > File > "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/testing/utils.py", > line 645, in assert_array_compare > raise AssertionError(msg) > AssertionError: > Arrays are not equal > > (mismatch 66.0%) > x: array([ True, False, False, True, False, False, False, False, False, > False, False, False, False, False, False, False, True, True, > False, False, False, True, True, False, True, True, False,... > y: array([ True, True, True, True, True, True, True, True, True, > True, True, True, True, True, True, True, True, True, > True, True, True, True, True, True, True, True, True,... 
> > ====================================================================== > FAIL: Test they should converge to same coefficients for same parameters > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/nose/case.py", > line 197, in runTest > self.test(*self.arg) > File > "/Users/mb312/dev_trees/scikit-learn/sklearn/linear_model/tests/test_huber.py", > line 136, in test_huber_and_sgd_same_results > assert_almost_equal(huber.scale_, 1.0, 3) > File > "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/testing/utils.py", > line 468, in assert_almost_equal > raise AssertionError(msg) > AssertionError: > Arrays are not almost equal to 3 decimals > ACTUAL: 3.6103567932800094e-11 > DESIRED: 1.0 > ``` > > I wonder why our results are different? > > Cheers, > > Matthew > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Jul 20 13:55:49 2016 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 20 Jul 2016 18:55:49 +0100 Subject: [scikit-learn] How to test on PYTHON_ARCH=32 with mac? In-Reply-To: References: Message-ID: Hi, On Wed, Jul 20, 2016 at 6:37 PM, lin yenchen wrote: > It's probably because I built it in a different way. > > Here is the steps for how I built it: > > Type arch -32 > /System/Library/Frameworks/Python.framework/Versions/2.7/bin/python -c > "import sys; print sys.maxint" and make sure it outputs 2147483647. 
> > Modify line 5 of Makefile exists in root directory of scikit-learn become: > PYTHON ?= arch -32 > /System/Library/Frameworks/Python.framework/Versions/2.7/bin/python > > and modify line 11 to: > BITS := $(shell PYTHON -c 'import struct; print(8 * struct.calcsize("P"))') > > Type sudo make in the root directory of scikit-learn to build a 32 bit > version. > > It reports OK and no test failures after sudo make complete. I think you are still testing the 64-bit code there. When I do: arch -32 python setup.py build then the build flags still ask for a dual arch build: clang -bundle -undefined dynamic_lookup -arch i386 -arch x86_64 ... Later, when you run `nosetests` via the Makefile, it will load the default 64-bit Python to do the tests. Specifically, I predict that you'll get a test error if you force nosetests to use 32-bit Python: arch -32 nosetests sklearn.tree.tests.test_tree Cheers, Matthew From yenchenlin1994 at gmail.com Wed Jul 20 14:14:53 2016 From: yenchenlin1994 at gmail.com (lin yenchen) Date: Thu, 21 Jul 2016 02:14:53 +0800 Subject: [scikit-learn] How to test on PYTHON_ARCH=32 with mac? In-Reply-To: References: Message-ID: Sorry I can't run arch -32 nosetests on my mac, but I execute the function test_huge_allocations existing in test_tree.py solely using 32bit python (checked by sys.maxint), it still works. It's pretty weird ... Best, Yen-Chen 2016-07-21 1:55 GMT+08:00 Matthew Brett : > Hi, > > On Wed, Jul 20, 2016 at 6:37 PM, lin yenchen > wrote: > > It's probably because I built it in a different way. > > > > Here is the steps for how I built it: > > > > Type arch -32 > > /System/Library/Frameworks/Python.framework/Versions/2.7/bin/python -c > > "import sys; print sys.maxint" and make sure it outputs 2147483647. 
> > > > Modify line 5 of Makefile exists in root directory of scikit-learn > become: > > PYTHON ?= arch -32 > > /System/Library/Frameworks/Python.framework/Versions/2.7/bin/python > > > > and modify line 11 to: > > BITS := $(shell PYTHON -c 'import struct; print(8 * > struct.calcsize("P"))') > > > > Type sudo make in the root directory of scikit-learn to build a 32 bit > > version. > > > > It reports OK and no test failures after sudo make complete. > > I think you are still testing the 64-bit code there. When I do: > > arch -32 python setup.py build > > then the build flags still ask for a dual arch build: > > clang -bundle -undefined dynamic_lookup -arch i386 -arch x86_64 ... > > Later, when you run `nosetests` via the Makefile, it will load the > default 64-bit Python to do the tests. > > Specifically, I predict that you'll get a test error if you force > nosetests to use 32-bit Python: > > arch -32 nosetests sklearn.tree.tests.test_tree > > Cheers, > > Matthew > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Jul 20 14:24:05 2016 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 20 Jul 2016 19:24:05 +0100 Subject: [scikit-learn] How to test on PYTHON_ARCH=32 with mac? In-Reply-To: References: Message-ID: On Wed, Jul 20, 2016 at 7:14 PM, lin yenchen wrote: > Sorry I can't run arch -32 nosetests on my mac, > but I execute the function test_huge_allocations existing in test_tree.py > solely using 32bit python (checked by sys.maxint), > it still works. What error do you get for `arch -32 nosetests` ? Matthew From yenchenlin1994 at gmail.com Wed Jul 20 14:26:48 2016 From: yenchenlin1994 at gmail.com (lin yenchen) Date: Thu, 21 Jul 2016 02:26:48 +0800 Subject: [scikit-learn] How to test on PYTHON_ARCH=32 with mac? 
In-Reply-To: References: Message-ID: arch: posix_spawnp: nosetests: Bad CPU type in executable 2016-07-21 2:24 GMT+08:00 Matthew Brett : > On Wed, Jul 20, 2016 at 7:14 PM, lin yenchen > wrote: > > Sorry I can't run arch -32 nosetests on my mac, > > but I execute the function test_huge_allocations existing in test_tree.py > > solely using 32bit python (checked by sys.maxint), > > it still works. > > What error do you get for `arch -32 nosetests` ? > > Matthew > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Jul 20 14:38:36 2016 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 20 Jul 2016 19:38:36 +0100 Subject: [scikit-learn] How to test on PYTHON_ARCH=32 with mac? In-Reply-To: References: Message-ID: On Wed, Jul 20, 2016 at 7:26 PM, lin yenchen wrote: > arch: posix_spawnp: nosetests: Bad CPU type in executable I think your nosetests is pointing to a Python executable other than Python.org Python. I only get that error when trying to execute homebrew Python, not Python.org Python or system Python. Maybe check what nosetests is using with head -1 $(which nosetests) Best, Matthew From yenchenlin1994 at gmail.com Thu Jul 21 01:32:29 2016 From: yenchenlin1994 at gmail.com (lin yenchen) Date: Thu, 21 Jul 2016 13:32:29 +0800 Subject: [scikit-learn] How to test on PYTHON_ARCH=32 with mac? In-Reply-To: References: Message-ID: Thanks Matthew, you are right. I did not use the right Python. Sorry, but I don't have time right now to set up the full 32-bit Python nosetests environment, since my final goal is to fix my CI issues, and I am currently using this feature provided by appveyor to debug them. Anyway, thanks a lot for your help! Could anyone here help verify whether there are test errors for 32-bit scikit-learn on Mac?
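The `head -1` check on the `nosetests` script suggested in this thread can also be done from Python. A minimal sketch, assuming a POSIX system where `nosetests` is a script with a shebang line; `shebang_of` and its `search_path` parameter are hypothetical names introduced here for illustration.

```python
import shutil


def shebang_of(command, search_path=None):
    """Return the shebang line of the script that `command` resolves to.

    `search_path` is passed to shutil.which as its `path` argument; None
    means the current PATH. Returns None if the command is not found or
    its first line does not start with "#!".
    """
    path = shutil.which(command, path=search_path)
    if path is None:
        return None
    with open(path, "rb") as handle:
        first_line = handle.readline()
    if not first_line.startswith(b"#!"):
        return None
    return first_line.decode("utf-8", errors="replace").strip()


if __name__ == "__main__":
    # Shows which interpreter the nosetests entry point would launch.
    print(shebang_of("nosetests"))
```

If the reported interpreter is not the one that was verified as 32-bit, the tests are running under a different Python than the one that was checked.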
2016-07-21 2:38 GMT+08:00 Matthew Brett : > On Wed, Jul 20, 2016 at 7:26 PM, lin yenchen > wrote: > > arch: posix_spawnp: nosetests: Bad CPU type in executable > > I think your nosetests is pointing to a Python executable other than > Python.org Python. I only get that error when trying to execute > homebrew Python, not Python.org Python or system Python. Maybe check > what nosetests is using with > > head -1 $(which nosetests > > Best, > > Matthew > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rahul.ahuja at live.com Thu Jul 21 10:50:55 2016 From: rahul.ahuja at live.com (Rahul Ahuja) Date: Thu, 21 Jul 2016 14:50:55 +0000 Subject: [scikit-learn] sklearn website down in my country Pakistan Message-ID: Hi there, Sklearn website has been down for couple of days. Please look into it. I reside in Pakistan, Karachi city. Kind regards, Rahul Ahuja -------------- next part -------------- An HTML attachment was scrubbed... URL: From nfliu at uw.edu Thu Jul 21 10:58:04 2016 From: nfliu at uw.edu (Nelson Liu) Date: Thu, 21 Jul 2016 14:58:04 +0000 Subject: [scikit-learn] sklearn website down in my country Pakistan In-Reply-To: References: Message-ID: Hi, If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the maintainers don't have control over downtime and issues like the one you're having). Can you connect to GitHub, or any site on GitHub Pages? Thanks Nelson On Thu, Jul 21, 2016, 07:52 Rahul Ahuja wrote: > Hi there, > > > Sklearn website has been down for couple of days. Please look into it. > > > I reside in Pakistan, Karachi city. 
> > > > > > > Kind regards, > Rahul Ahuja > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From drraph at gmail.com Thu Jul 21 11:22:09 2016 From: drraph at gmail.com (Raphael C) Date: Thu, 21 Jul 2016 16:22:09 +0100 Subject: [scikit-learn] How to get the most important features from a RF efficiently Message-ID: I have a set of feature vectors associated with binary class labels, each of which has about 40,000 features. I can train a random forest classifier in sklearn which works well. I would however like to see the most important features. I tried simply printing out forest.feature_importances_ but this takes about 1 second per feature making about 40,000 seconds overall. This is much much longer than the time needed to train the classifier in the first place? Is there a more efficient way to find out which features are most important? Raphael On 21 July 2016 at 15:58, Nelson Liu wrote: > Hi, > If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the > maintainers don't have control over downtime and issues like the one you're > having). Can you connect to GitHub, or any site on GitHub Pages? > > Thanks > Nelson > > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja wrote: >> >> Hi there, >> >> >> Sklearn website has been down for couple of days. Please look into it. >> >> >> I reside in Pakistan, Karachi city. 
>> >> >> >> >> >> Kind regards, >> Rahul Ahuja >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > From rahul.ahuja at live.com Thu Jul 21 12:27:54 2016 From: rahul.ahuja at live.com (Rahul Ahuja) Date: Thu, 21 Jul 2016 16:27:54 +0000 Subject: [scikit-learn] scikit-learn Digest, Vol 4, Issue 31 In-Reply-To: References: Message-ID: Yes I can open github pages. Kind regards, Rahul Ahuja -------------- next part -------------- An HTML attachment was scrubbed...
URL: From mail at sebastianraschka.com Thu Jul 21 12:58:48 2016 From: mail at sebastianraschka.com (Sebastian Raschka) Date: Thu, 21 Jul 2016 12:58:48 -0400 Subject: [scikit-learn] scikit-learn Digest, Vol 4, Issue 31 In-Reply-To: References: Message-ID: Hm, the website works fine for me (and I also didn?t have any issues in the last few days). Just to make sure your are using the correct address, it should be http://scikit-learn.org/ (maybe you used https://scikit-learn.org by accident !?) - Alternatively, maybe try http://scikit-learn.org/stable/ - A different browser - clearing the browser cache Hope one of these things work! Best, Sebastian > On Jul 21, 2016, at 12:27 PM, Rahul Ahuja wrote: > > Yes I can open github pages. > > > > > > Kind regards, > Rahul Ahuja > > > From: scikit-learn on behalf of scikit-learn-request at python.org > Sent: Thursday, July 21, 2016 9:00 PM > To: scikit-learn at python.org > Subject: scikit-learn Digest, Vol 4, Issue 31 > > Send scikit-learn mailing list submissions to > scikit-learn at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/scikit-learn > scikit-learn Info Page - Python > mail.python.org > To see the collection of prior postings to the list, visit the scikit-learn Archives. Using scikit-learn: To post a message to all the list members ... > > > or, via email, send a message with subject or body 'help' to > scikit-learn-request at python.org > > You can reach the person managing the list at > scikit-learn-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of scikit-learn digest..." > > > Today's Topics: > > 1. sklearn website down in my country Pakistan (Rahul Ahuja) > 2. Re: sklearn website down in my country Pakistan (Nelson Liu) > 3. 
How to get the most important features from a RF efficiently > (Raphael C) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 21 Jul 2016 14:50:55 +0000 > From: Rahul Ahuja > To: "scikit-learn at python.org" > Subject: [scikit-learn] sklearn website down in my country Pakistan > Message-ID: > > > Content-Type: text/plain; charset="iso-8859-1" > > Hi there, > > > Sklearn website has been down for couple of days. Please look into it. > > > I reside in Pakistan, Karachi city. > > > > > > Kind regards, > Rahul Ahuja > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 2 > Date: Thu, 21 Jul 2016 14:58:04 +0000 > From: Nelson Liu > To: Scikit-learn user and developer mailing list > > Subject: Re: [scikit-learn] sklearn website down in my country > Pakistan > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > Hi, > If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the > maintainers don't have control over downtime and issues like the one you're > having). Can you connect to GitHub, or any site on GitHub Pages? > > Thanks > Nelson > > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja wrote: > > > Hi there, > > > > > > Sklearn website has been down for couple of days. Please look into it. > > > > > > I reside in Pakistan, Karachi city. > > > > > > > > > > > > > > Kind regards, > > Rahul Ahuja > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > Message: 3 > Date: Thu, 21 Jul 2016 16:22:09 +0100 > From: Raphael C > To: Scikit-learn user and developer mailing list > > Subject: [scikit-learn] How to get the most important features from a > RF efficiently > Message-ID: > > Content-Type: text/plain; charset=UTF-8 > > I have a set of feature vectors associated with binary class labels, > each of which has about 40,000 features. I can train a random forest > classifier in sklearn which works well. I would however like to see > the most important features. > > I tried simply printing out forest.feature_importances_ but this takes > about 1 second per feature making about 40,000 seconds overall. This > is much much longer than the time needed to train the classifier in > the first place? > > Is there a more efficient way to find out which features are most important? > > Raphael > > On 21 July 2016 at 15:58, Nelson Liu wrote: > > Hi, > > If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the > > maintainers don't have control over downtime and issues like the one you're > > having). Can you connect to GitHub, or any site on GitHub Pages? > > > > Thanks > > Nelson > > > > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja wrote: > >> > >> Hi there, > >> > >> > >> Sklearn website has been down for couple of days. Please look into it. > >> > >> > >> I reside in Pakistan, Karachi city. 
From rahul.ahuja at live.com Thu Jul 21 13:18:47 2016
From: rahul.ahuja at live.com (Rahul Ahuja)
Date: Thu, 21 Jul 2016 17:18:47 +0000
Subject: [scikit-learn] Sklearn website is down in my place
In-Reply-To:
References:
Message-ID:

Hi there,

Sklearn is down in my place (location). I have tried to access with multiple devices and internet connections but still can't. I can open github websites though. Is there any way to access sklearn website?

Kind regards,
Rahul Ahuja

________________________________
From: scikit-learn on behalf of scikit-learn-request at python.org
Sent: Thursday, July 21, 2016 9:59 PM
To: scikit-learn at python.org
Subject: scikit-learn Digest, Vol 4, Issue 32

Send scikit-learn mailing list submissions to
scikit-learn at python.org

To subscribe or unsubscribe via the World Wide Web, visit
https://mail.python.org/mailman/listinfo/scikit-learn
or, via email, send a message with subject or body 'help' to
scikit-learn-request at python.org

You can reach the person managing the list at
scikit-learn-owner at python.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of scikit-learn digest..."

Today's Topics:

1. Re: scikit-learn Digest, Vol 4, Issue 31 (Rahul Ahuja)
2. Re: scikit-learn Digest, Vol 4, Issue 31 (Sebastian Raschka)

----------------------------------------------------------------------

Message: 1
Date: Thu, 21 Jul 2016 16:27:54 +0000
From: Rahul Ahuja
To: "scikit-learn at python.org"
Subject: Re: [scikit-learn] scikit-learn Digest, Vol 4, Issue 31
Message-ID:
Content-Type: text/plain; charset="iso-8859-1"

Yes I can open github pages.

Kind regards,
Rahul Ahuja

________________________________
From: scikit-learn on behalf of scikit-learn-request at python.org
Sent: Thursday, July 21, 2016 9:00 PM
To: scikit-learn at python.org
Subject: scikit-learn Digest, Vol 4, Issue 31

Today's Topics:

1. sklearn website down in my country Pakistan (Rahul Ahuja)
2. Re: sklearn website down in my country Pakistan (Nelson Liu)
3. How to get the most important features from a RF efficiently (Raphael C)

----------------------------------------------------------------------

Message: 1
Date: Thu, 21 Jul 2016 14:50:55 +0000
From: Rahul Ahuja
To: "scikit-learn at python.org"
Subject: [scikit-learn] sklearn website down in my country Pakistan
Message-ID:
Content-Type: text/plain; charset="iso-8859-1"

Hi there,

Sklearn website has been down for couple of days. Please look into it.

I reside in Pakistan, Karachi city.

Kind regards,
Rahul Ahuja

------------------------------

Message: 2
Date: Thu, 21 Jul 2016 14:58:04 +0000
From: Nelson Liu
To: Scikit-learn user and developer mailing list
Subject: Re: [scikit-learn] sklearn website down in my country Pakistan
Message-ID:
Content-Type: text/plain; charset="utf-8"

Hi,
If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the maintainers don't have control over downtime and issues like the one you're having). Can you connect to GitHub, or any site on GitHub Pages?

Thanks
Nelson

------------------------------

Message: 3
Date: Thu, 21 Jul 2016 16:22:09 +0100
From: Raphael C
To: Scikit-learn user and developer mailing list
Subject: [scikit-learn] How to get the most important features from a RF efficiently
Message-ID:
Content-Type: text/plain; charset=UTF-8

I have a set of feature vectors associated with binary class labels, each of which has about 40,000 features. I can train a random forest classifier in sklearn which works well. I would however like to see the most important features.

I tried simply printing out forest.feature_importances_ but this takes about 1 second per feature making about 40,000 seconds overall. This is much much longer than the time needed to train the classifier in the first place?

Is there a more efficient way to find out which features are most important?

Raphael
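A note on Raphael's question about `forest.feature_importances_`: in scikit-learn's forest estimators this attribute is computed on access by aggregating over the trees, so one plausible cause of the "1 second per feature" timing is indexing it inside a per-feature loop. The sketch below reads it once and ranks features with NumPy. The synthetic data, the forest size, and `k` are illustrative assumptions, not details taken from the thread.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic stand-in for the real data (the thread mentions ~40,000
# features; the sizes here are assumptions chosen to run quickly).
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=5, random_state=0
)

forest = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# Read the attribute ONCE. It is recomputed on every access by aggregating
# over the trees, so indexing it inside a loop repeats that work each time.
importances = forest.feature_importances_

# Indices of the k largest importances, most important first.
k = 10
top_k = np.argsort(importances)[::-1][:k]

for idx in top_k:
    print(f"feature {idx}: {importances[idx]:.4f}")
```

Whether this accounts for the slowdown Raphael saw cannot be confirmed from the thread; it is only the cheapest thing to rule out first.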
------------------------------

Message: 2
Date: Thu, 21 Jul 2016 12:58:48 -0400
From: Sebastian Raschka
To: Scikit-learn user and developer mailing list
Subject: Re: [scikit-learn] scikit-learn Digest, Vol 4, Issue 31
Message-ID:
Content-Type: text/plain; charset=utf-8

Hm, the website works fine for me (and I also didn't have any issues in the last few days).
Just to make sure you are using the correct address, it should be http://scikit-learn.org/ (maybe you used https://scikit-learn.org by accident !?)

- Alternatively, maybe try http://scikit-learn.org/stable/
- A different browser
- clearing the browser cache

Hope one of these things works!

Best,
Sebastian

> On Jul 21, 2016, at 12:27 PM, Rahul Ahuja wrote:
>
> Yes I can open github pages.
>
> Kind regards,
> Rahul Ahuja

------------------------------

Subject: Digest Footer

_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn

------------------------------

End of scikit-learn Digest, Vol 4, Issue 32
*******************************************

From mail at sebastianraschka.com Thu Jul 21 13:25:00 2016
From: mail at sebastianraschka.com (Sebastian Raschka)
Date: Thu, 21 Jul 2016 13:25:00 -0400
Subject: [scikit-learn] Sklearn website is down in my place
In-Reply-To:
References:
Message-ID: <0A1A773E-9A48-4A12-ACD2-0E3F25FA21BE@sebastianraschka.com>

Hm, does the problem persist if you call it directly via http://scikit-learn.github.io ?

> On Jul 21, 2016, at 1:18 PM, Rahul Ahuja wrote:
>
> Hi there,
>
> Sklearn is down in my place (location). I have tried to access with multiple devices and internet connections but still can't. I can open github websites though.
> Is there any way to access sklearn website?
>
> Kind regards,
> Rahul Ahuja

From mail at sebastianraschka.com Thu Jul 21 13:32:34 2016
From: mail at sebastianraschka.com (Sebastian Raschka)
Date: Thu, 21 Jul 2016 13:32:34 -0400
Subject: [scikit-learn] Sklearn website is down in my place
In-Reply-To:
References:
Message-ID: <186E2B4F-6EDD-427C-B37E-326C4402EF8F@sebastianraschka.com>

Hm, just read that this may be yet another weird censorship regulation; I think your best option would be to download the scikit-learn website from https://github.com/scikit-learn/scikit-learn.github.io and open it locally (via index.html).

> On Jul 21, 2016, at 1:18 PM, Rahul Ahuja wrote:
>
> Hi there,
>
> Sklearn is down in my place (location). I have tried to access with multiple devices and internet connections but still can't. I can open github websites though. Is there any way to access sklearn website?
>
> Kind regards,
> Rahul Ahuja
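Sebastian's suggestion to download the site repository and open `index.html` locally could be scripted roughly as follows. This is only a sketch: the helper names, the default destination directory, and the `--depth 1` shallow clone (used here to keep the download small) are assumptions, and the script needs `git` on the PATH plus network access to GitHub.

```python
import subprocess
import webbrowser
from pathlib import Path

# Repository named in the message above.
REPO_URL = "https://github.com/scikit-learn/scikit-learn.github.io"


def clone_command(dest):
    """Build the git command for a shallow clone (--depth 1 is an assumption)."""
    return ["git", "clone", "--depth", "1", REPO_URL, str(dest)]


def local_index_uri(dest):
    """Return a file:// URI for index.html inside the cloned directory."""
    return (Path(dest).resolve() / "index.html").as_uri()


def fetch_and_open(dest="scikit-learn.github.io"):
    """Clone the site (hypothetical helper) and open it in the default browser."""
    subprocess.run(clone_command(dest), check=True)
    webbrowser.open(local_index_uri(dest))
```

Calling `fetch_and_open()` performs the clone and opens the local copy; nothing is downloaded until that function is called.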
> > > or, via email, send a message with subject or body 'help' to > scikit-learn-request at python.org > > You can reach the person managing the list at > scikit-learn-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of scikit-learn digest..." > > > Today's Topics: > > 1. Re: scikit-learn Digest, Vol 4, Issue 31 (Rahul Ahuja) > 2. Re: scikit-learn Digest, Vol 4, Issue 31 (Sebastian Raschka) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 21 Jul 2016 16:27:54 +0000 > From: Rahul Ahuja > To: "scikit-learn at python.org" > Subject: Re: [scikit-learn] scikit-learn Digest, Vol 4, Issue 31 > Message-ID: > > > Content-Type: text/plain; charset="iso-8859-1" > > Yes I can open github pages. > > > > > > Kind regards, > Rahul Ahuja > > > ________________________________ > From: scikit-learn on behalf of scikit-learn-request at python.org > Sent: Thursday, July 21, 2016 9:00 PM > To: scikit-learn at python.org > Subject: scikit-learn Digest, Vol 4, Issue 31 > > Send scikit-learn mailing list submissions to > scikit-learn at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/scikit-learn > scikit-learn Info Page - Python > mail.python.org > To see the collection of prior postings to the list, visit the scikit-learn Archives. Using scikit-learn: To post a message to all the list members ... > > > > or, via email, send a message with subject or body 'help' to > scikit-learn-request at python.org > > You can reach the person managing the list at > scikit-learn-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of scikit-learn digest..." > > > Today's Topics: > > 1. sklearn website down in my country Pakistan (Rahul Ahuja) > 2. Re: sklearn website down in my country Pakistan (Nelson Liu) > 3. 
How to get the most important features from a RF efficiently
>       (Raphael C)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 21 Jul 2016 14:50:55 +0000
> From: Rahul Ahuja
> To: "scikit-learn at python.org"
> Subject: [scikit-learn] sklearn website down in my country Pakistan
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi there,
>
> Sklearn website has been down for a couple of days. Please look into it.
>
> I reside in Pakistan, Karachi city.
>
> Kind regards,
> Rahul Ahuja
>
> ------------------------------
>
> Message: 2
> Date: Thu, 21 Jul 2016 14:58:04 +0000
> From: Nelson Liu
> To: Scikit-learn user and developer mailing list
> Subject: Re: [scikit-learn] sklearn website down in my country
>       Pakistan
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
> If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the
> maintainers don't have control over downtime and issues like the one you're
> having). Can you connect to GitHub, or any site on GitHub Pages?
>
> Thanks
> Nelson
>
> On Thu, Jul 21, 2016, 07:52 Rahul Ahuja wrote:
>
> > Hi there,
> >
> > Sklearn website has been down for a couple of days. Please look into it.
> >
> > I reside in Pakistan, Karachi city.
> >
> > Kind regards,
> > Rahul Ahuja
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn
> >
>
> ------------------------------
>
> Message: 3
> Date: Thu, 21 Jul 2016 16:22:09 +0100
> From: Raphael C
> To: Scikit-learn user and developer mailing list
> Subject: [scikit-learn] How to get the most important features from a
>       RF efficiently
> Content-Type: text/plain; charset=UTF-8
>
> I have a set of feature vectors associated with binary class labels,
> each of which has about 40,000 features. I can train a random forest
> classifier in sklearn which works well. I would however like to see
> the most important features.
>
> I tried simply printing out forest.feature_importances_ but this takes
> about 1 second per feature, making about 40,000 seconds overall. This
> is much, much longer than the time needed to train the classifier in
> the first place.
>
> Is there a more efficient way to find out which features are most important?
>
> Raphael
>
> On 21 July 2016 at 15:58, Nelson Liu wrote:
> > Hi,
> > If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the
> > maintainers don't have control over downtime and issues like the one you're
> > having). Can you connect to GitHub, or any site on GitHub Pages?
> >
> > Thanks
> > Nelson
> >
> > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja wrote:
> >>
> >> Hi there,
> >>
> >> Sklearn website has been down for a couple of days. Please look into it.
> >>
> >> I reside in Pakistan, Karachi city.
> >>
> >> Kind regards,
> >> Rahul Ahuja
> >> _______________________________________________
> >> scikit-learn mailing list
> >> scikit-learn at python.org
> >> https://mail.python.org/mailman/listinfo/scikit-learn
> >
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn
> >
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
> ------------------------------
>
> End of scikit-learn Digest, Vol 4, Issue 31
> *******************************************
>
> ------------------------------
>
> Message: 2
> Date: Thu, 21 Jul 2016 12:58:48 -0400
> From: Sebastian Raschka
> To: Scikit-learn user and developer mailing list
> Subject: Re: [scikit-learn] scikit-learn Digest, Vol 4, Issue 31
> Content-Type: text/plain; charset=utf-8
>
> Hm, the website works fine for me (and I also didn't have any issues in the last few days).
> Just to make sure you are using the correct address, it should be http://scikit-learn.org/ (maybe you used https://scikit-learn.org by accident!?)
>
> - Alternatively, maybe try http://scikit-learn.org/stable/
> - A different browser
> - clearing the browser cache
>
> Hope one of these things works!
>
> Best,
> Sebastian
>
> > On Jul 21, 2016, at 12:27 PM, Rahul Ahuja wrote:
> >
> > Yes I can open github pages.
> >
> > Kind regards,
> > Rahul Ahuja
> >
> > From: scikit-learn on behalf of scikit-learn-request at python.org
> > Sent: Thursday, July 21, 2016 9:00 PM
> > To: scikit-learn at python.org
> > Subject: scikit-learn Digest, Vol 4, Issue 31
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
>
https://mail.python.org/mailman/listinfo/scikit-learn
>
>
> ------------------------------
>
> End of scikit-learn Digest, Vol 4, Issue 32
> *******************************************
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

From rahul.ahuja at live.com Thu Jul 21 14:18:17 2016
From: rahul.ahuja at live.com (Rahul Ahuja)
Date: Thu, 21 Jul 2016 18:18:17 +0000
Subject: [scikit-learn] Sklearn website is down in my place
In-Reply-To:
References:
Message-ID:

Yes, it does via that link as well. The name of the tab becomes "Unicorn! Github".

Is there any way that it can be resolved?

Kind regards,
Rahul Ahuja

________________________________
From: scikit-learn on behalf of scikit-learn-request at python.org
Sent: Thursday, July 21, 2016 10:39 PM
To: scikit-learn at python.org
Subject: scikit-learn Digest, Vol 4, Issue 34

Today's Topics:

   1. Re: Sklearn website is down in my place (Sebastian Raschka)
   2.
Re: Sklearn website is down in my place (Sebastian Raschka)

----------------------------------------------------------------------

Message: 1
Date: Thu, 21 Jul 2016 13:25:00 -0400
From: Sebastian Raschka
To: Scikit-learn user and developer mailing list
Subject: Re: [scikit-learn] Sklearn website is down in my place
Message-ID: <0A1A773E-9A48-4A12-ACD2-0E3F25FA21BE at sebastianraschka.com>
Content-Type: text/plain; charset=iso-8859-1

Hm, the problem persists if you call it directly via http://scikit-learn.github.io ?

> On Jul 21, 2016, at 1:18 PM, Rahul Ahuja wrote:
>
> Hi there,
>
> Sklearn is down in my place (location). I have tried to access it with multiple devices and internet connections but still can't. I can open GitHub websites though. Is there any way to access the sklearn website?
>
> Kind regards,
> Rahul Ahuja
>
> From: scikit-learn on behalf of scikit-learn-request at python.org
> Sent: Thursday, July 21, 2016 9:59 PM
> To: scikit-learn at python.org
> Subject: scikit-learn Digest, Vol 4, Issue 32
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
>
https://mail.python.org/mailman/listinfo/scikit-learn
>
>
> ------------------------------
>
> End of scikit-learn Digest, Vol 4, Issue 32
> *******************************************
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

------------------------------

Message: 2
Date: Thu, 21 Jul 2016 13:32:34 -0400
From: Sebastian Raschka
To: Scikit-learn user and developer mailing list
Subject: Re: [scikit-learn] Sklearn website is down in my place
Message-ID: <186E2B4F-6EDD-427C-B37E-326C4402EF8F at sebastianraschka.com>
Content-Type: text/plain; charset=iso-8859-1

Hm, just read that this may be yet another weird censorship regulation; I think your best option would be to download the scikit-learn website from https://github.com/scikit-learn/scikit-learn.github.io and open it locally (via index.html).

> On Jul 21, 2016, at 1:18 PM, Rahul Ahuja wrote:
>
> Hi there,
>
> Sklearn is down in my place (location). I have tried to access it with multiple devices and internet connections but still can't. I can open GitHub websites though. Is there any way to access the sklearn website?
>
> Kind regards,
> Rahul Ahuja
>
> From: scikit-learn on behalf of scikit-learn-request at python.org
> Sent: Thursday, July 21, 2016 9:59 PM
> To: scikit-learn at python.org
> Subject: scikit-learn Digest, Vol 4, Issue 32
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
>
https://mail.python.org/mailman/listinfo/scikit-learn > > > ------------------------------ > > End of scikit-learn Digest, Vol 4, Issue 32 > ******************************************* > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn ------------------------------ Subject: Digest Footer _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn ------------------------------ End of scikit-learn Digest, Vol 4, Issue 34 ******************************************* -------------- next part -------------- An HTML attachment was scrubbed... URL:
From nadim.farhat at gmail.com Thu Jul 21 14:32:17 2016
From: nadim.farhat at gmail.com (Nadim Farhat)
Date: Thu, 21 Jul 2016 18:32:17 +0000
Subject: [scikit-learn] Sklearn website is down in my place
In-Reply-To: References: Message-ID:

Hi,

1- Are you able to ping the website? If yes, it might be a problem with your ISP's DNS. If no, your ISP may have blocked the site; try asking a friend with a different service provider to access the website.
2- Try clearing the cache in your web browser, though I don't expect this to work since you said you tried accessing the site from multiple devices.
3- If you must access the site, you can use a cached version.
4- If the ISP has blocked the site, try using a proxy.

Best,

Nadim

On Thu, Jul 21, 2016 at 1:21 PM Rahul Ahuja wrote:

> Hi there,
>
>
> Sklearn is down in my place (location). I have tried to access with
> multiple devices and internet connections but still can't. I can open
> github websites though. Is there any way to access sklearn website?
>
> Kind regards,
> Rahul Ahuja

--
Nadim Farhat
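A rough way to tell apart the first two cases in the list of checks above (a DNS problem versus a site that resolves but cannot be reached) is to run the name lookup and the HTTP request as separate steps. The sketch below is only an illustration and was not posted in this thread: `diagnose` and its `resolve` and `fetch` parameters are made-up names, and it relies only on the Python standard library. A failed request does not prove that an ISP is blocking the site; it only narrows down where the failure happens.

```python
import socket
import urllib.request


def diagnose(host, resolve=socket.gethostbyname, fetch=None, timeout=5.0):
    """Return a short diagnosis string for reaching http://<host>/.

    `resolve` and `fetch` are hypothetical hooks (not part of any library)
    so the two steps can be swapped out, e.g. for offline testing.
    """
    if fetch is None:
        def fetch(url):
            # Plain HTTP GET using the standard library; returns the status code.
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.status

    # Step 1: DNS lookup. socket.gaierror is a subclass of OSError.
    try:
        address = resolve(host)
    except OSError as exc:
        return f"DNS lookup failed for {host}: {exc}"

    # Step 2: HTTP request. URLError and socket timeouts are OSError subclasses.
    try:
        status = fetch(f"http://{host}/")
    except OSError as exc:
        return f"{host} resolves to {address}, but the HTTP request failed: {exc}"

    return f"{host} resolves to {address} and answered with HTTP {status}"
```

Calling `diagnose("scikit-learn.org")` from the affected connection and again from a different one (or through a proxy) should show whether the two networks disagree at the DNS step or at the HTTP step.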
From nfliu at uw.edu Thu Jul 21 15:04:40 2016
From: nfliu at uw.edu (Nelson Liu)
Date: Thu, 21 Jul 2016 19:04:40 +0000
Subject: [scikit-learn] Sklearn website is down in my place
In-Reply-To: References: Message-ID:

Alternatively, if none of those work, I'd be willing to put up a copy for you on my non-GitHub-hosted site. Email me separately if that's something you'd be interested in.

On Thu, Jul 21, 2016, 11:36 Nadim Farhat wrote:

> Hi,
>
> 1-are you able to ping the website ?
> if yes, it might be a problem with your ISP DNS. if no ISP must have
> blocked the site. try to ask a friend with different server provider to
> access the website.
> 2- Try to clear the cache in your web browser though i don't expect this
> to work since you said you tried accessing with multiple devices.
> 3- if you must access the site you can use the cached version
> 4- if the ISP have blocked the site try to use a proxy.
>
> Bests
>
> Nadim
>
> On Thu, Jul 21, 2016 at 1:21 PM Rahul Ahuja wrote:
>
>> Hi there,
>>
>>
>> Sklearn is down in my place (location). I have tried to access with
>> multiple devices and internet connections but still can't. I can open
>> github websites though. Is there any way to access sklearn website?
>>
>> Kind regards,
>> Rahul Ahuja
>>
> --
> Nadim Farhat
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From mail at sebastianraschka.com Thu Jul 21 15:08:58 2016 From: mail at sebastianraschka.com (Sebastian Raschka) Date: Thu, 21 Jul 2016 15:08:58 -0400 Subject: [scikit-learn] Sklearn website is down in my place In-Reply-To: References: Message-ID: Hm, typically the unicorn indicates that there's a GitHub-related issue; however, it still works for me, which is weird. Intuitively, I would say that it may have something to do with a cached version of your browser, yet you mentioned that it also doesn't work on other devices either ... Hm, sounds tricky ... Another thing you could try is visiting the site via a proxy. E.g., try to go to https://hide.me/en/proxy and type "scikit-learn.org" into the form field. Best, Sebastian > On Jul 21, 2016, at 2:18 PM, Rahul Ahuja wrote: > > > > yes it does via that link as well. the name of the tab becomes Unicorn! Github > > Is there any way that it can be resolved? > > > > Kind regards, > Rahul Ahuja > > > From: scikit-learn on behalf of scikit-learn-request at python.org > Sent: Thursday, July 21, 2016 10:39 PM > To: scikit-learn at python.org > Subject: scikit-learn Digest, Vol 4, Issue 34 > > Send scikit-learn mailing list submissions to > scikit-learn at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/scikit-learn > scikit-learn Info Page - Python > mail.python.org > To see the collection of prior postings to the list, visit the scikit-learn Archives. Using scikit-learn: To post a message to all the list members ... > > > or, via email, send a message with subject or body 'help' to > scikit-learn-request at python.org > > You can reach the person managing the list at > scikit-learn-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of scikit-learn digest..." > > > Today's Topics: > > 1. Re: Sklearn website is down in my place (Sebastian Raschka) > 2. 
Re: Sklearn website is down in my place (Sebastian Raschka) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 21 Jul 2016 13:25:00 -0400 > From: Sebastian Raschka > To: Scikit-learn user and developer mailing list > > Subject: Re: [scikit-learn] Sklearn website is down in my place > Message-ID: > <0A1A773E-9A48-4A12-ACD2-0E3F25FA21BE at sebastianraschka.com> > Content-Type: text/plain; charset=iso-8859-1 > > Hm, the problem persists if you call it directly via? > > http://scikit-learn.github.io > > > On Jul 21, 2016, at 1:18 PM, Rahul Ahuja wrote: > > > > Hi there, > > > > Sklearn is down in my place (location). I have tried to access with multiple devices and internet connections but still can't. I can open github websites though. Is there any way to access sklearn website? > > > > > > > > > > > > Kind regards, > > Rahul Ahuja > > > > > > From: scikit-learn on behalf of scikit-learn-request at python.org > > Sent: Thursday, July 21, 2016 9:59 PM > > To: scikit-learn at python.org > > Subject: scikit-learn Digest, Vol 4, Issue 32 > > > > Send scikit-learn mailing list submissions to > > scikit-learn at python.org > > > > To subscribe or unsubscribe via the World Wide Web, visit > > https://mail.python.org/mailman/listinfo/scikit-learn > > scikit-learn Info Page - Python > > mail.python.org > > To see the collection of prior postings to the list, visit the scikit-learn Archives. Using scikit-learn: To post a message to all the list members ... > > > > > > or, via email, send a message with subject or body 'help' to > > scikit-learn-request at python.org > > > > You can reach the person managing the list at > > scikit-learn-owner at python.org > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of scikit-learn digest..." > > > > > > Today's Topics: > > > > 1. Re: scikit-learn Digest, Vol 4, Issue 31 (Rahul Ahuja) > > 2. 
Re: scikit-learn Digest, Vol 4, Issue 31 (Sebastian Raschka) > > > > > > ---------------------------------------------------------------------- > > > > Message: 1 > > Date: Thu, 21 Jul 2016 16:27:54 +0000 > > From: Rahul Ahuja > > To: "scikit-learn at python.org" > > Subject: Re: [scikit-learn] scikit-learn Digest, Vol 4, Issue 31 > > Message-ID: > > > > > > Content-Type: text/plain; charset="iso-8859-1" > > > > Yes I can open github pages. > > > > > > > > > > > > Kind regards, > > Rahul Ahuja > > > > > > ________________________________ > > From: scikit-learn on behalf of scikit-learn-request at python.org > > Sent: Thursday, July 21, 2016 9:00 PM > > To: scikit-learn at python.org > > Subject: scikit-learn Digest, Vol 4, Issue 31 > > > > Send scikit-learn mailing list submissions to > > scikit-learn at python.org > > > > To subscribe or unsubscribe via the World Wide Web, visit > > https://mail.python.org/mailman/listinfo/scikit-learn > > scikit-learn Info Page - Python > > mail.python.org > > To see the collection of prior postings to the list, visit the scikit-learn Archives. Using scikit-learn: To post a message to all the list members ... > > > > > > > > or, via email, send a message with subject or body 'help' to > > scikit-learn-request at python.org > > > > You can reach the person managing the list at > > scikit-learn-owner at python.org > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of scikit-learn digest..." > > > > > > Today's Topics: > > > > 1. sklearn website down in my country Pakistan (Rahul Ahuja) > > 2. Re: sklearn website down in my country Pakistan (Nelson Liu) > > 3. 
How to get the most important features from a RF efficiently > > (Raphael C) > > > > > > ---------------------------------------------------------------------- > > > > Message: 1 > > Date: Thu, 21 Jul 2016 14:50:55 +0000 > > From: Rahul Ahuja > > To: "scikit-learn at python.org" > > Subject: [scikit-learn] sklearn website down in my country Pakistan > > Message-ID: > > > > > > Content-Type: text/plain; charset="iso-8859-1" > > > > Hi there, > > > > > > Sklearn website has been down for couple of days. Please look into it. > > > > > > I reside in Pakistan, Karachi city. > > > > > > > > > > > > Kind regards, > > Rahul Ahuja > > -------------- next part -------------- > > An HTML attachment was scrubbed... > > URL: > > > > ------------------------------ > > > > Message: 2 > > Date: Thu, 21 Jul 2016 14:58:04 +0000 > > From: Nelson Liu > > To: Scikit-learn user and developer mailing list > > > > Subject: Re: [scikit-learn] sklearn website down in my country > > Pakistan > > Message-ID: > > > > Content-Type: text/plain; charset="utf-8" > > > > Hi, > > If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the > > maintainers don't have control over downtime and issues like the one you're > > having). Can you connect to GitHub, or any site on GitHub Pages? > > > > Thanks > > Nelson > > > > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja wrote: > > > > > Hi there, > > > > > > > > > Sklearn website has been down for couple of days. Please look into it. > > > > > > > > > I reside in Pakistan, Karachi city. > > > > > > > > > > > > > > > > > > > > > Kind regards, > > > Rahul Ahuja > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > -------------- next part -------------- > > An HTML attachment was scrubbed... 
> > URL: > > > > ------------------------------ > > > > Message: 3 > > Date: Thu, 21 Jul 2016 16:22:09 +0100 > > From: Raphael C > > To: Scikit-learn user and developer mailing list > > > > Subject: [scikit-learn] How to get the most important features from a > > RF efficiently > > Message-ID: > > > > Content-Type: text/plain; charset=UTF-8 > > > > I have a set of feature vectors associated with binary class labels, > > each of which has about 40,000 features. I can train a random forest > > classifier in sklearn which works well. I would however like to see > > the most important features. > > > > I tried simply printing out forest.feature_importances_ but this takes > > about 1 second per feature making about 40,000 seconds overall. This > > is much much longer than the time needed to train the classifier in > > the first place? > > > > Is there a more efficient way to find out which features are most important? > > > > Raphael > > > > On 21 July 2016 at 15:58, Nelson Liu wrote: > > > Hi, > > > If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the > > > maintainers don't have control over downtime and issues like the one you're > > > having). Can you connect to GitHub, or any site on GitHub Pages? > > > > > > Thanks > > > Nelson > > > > > > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja wrote: > > >> > > >> Hi there, > > >> > > >> > > >> Sklearn website has been down for couple of days. Please look into it. > > >> > > >> > > >> I reside in Pakistan, Karachi city. 
> > >> > > >> > > >> > > >> > > >> > > >> Kind regards, > > >> Rahul Ahuja > > >> _______________________________________________ > > >> scikit-learn mailing list > > >> scikit-learn at python.org > > >> https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > ------------------------------ > > > > Subject: Digest Footer > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > ------------------------------ > > > > End of scikit-learn Digest, Vol 4, Issue 31 > > ******************************************* > > -------------- next part -------------- > > An HTML attachment was scrubbed... > > URL: > > > > ------------------------------ > > > > Message: 2 > > Date: Thu, 21 Jul 2016 12:58:48 -0400 > > From: Sebastian Raschka > > To: Scikit-learn user and developer mailing list > > > > Subject: Re: [scikit-learn] scikit-learn Digest, Vol 4, Issue 31 > > Message-ID: > > > > Content-Type: text/plain; charset=utf-8 > > > > Hm, the website works fine for me (and I also didn't have any issues in the last few days). > > Just to make sure you are using the correct address, it should be http://scikit-learn.org/ (maybe you used https://scikit-learn.org by accident !?) > > > > - Alternatively, maybe try http://scikit-learn.org/stable/ > > - A different browser > > - clearing the browser cache > > > > Hope one of these things works! > > > > Best, > > Sebastian > > > > > > > On Jul 21, 2016, at 12:27 PM, Rahul Ahuja wrote: > > > > > > Yes I can open github pages. 
> > > > > > > > > > > > > > > > > > Kind regards, > > > Rahul Ahuja > > > > > > > > > From: scikit-learn on behalf of scikit-learn-request at python.org > > > Sent: Thursday, July 21, 2016 9:00 PM > > > To: scikit-learn at python.org > > > Subject: scikit-learn Digest, Vol 4, Issue 31 > > > > > > Send scikit-learn mailing list submissions to > > > scikit-learn at python.org > > > > > > To subscribe or unsubscribe via the World Wide Web, visit > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > scikit-learn Info Page - Python > > > mail.python.org > > > To see the collection of prior postings to the list, visit the scikit-learn Archives. Using scikit-learn: To post a message to all the list members ... > > > > > > > > > or, via email, send a message with subject or body 'help' to > > > scikit-learn-request at python.org > > > > > > You can reach the person managing the list at > > > scikit-learn-owner at python.org > > > > > > When replying, please edit your Subject line so it is more specific > > > than "Re: Contents of scikit-learn digest..." > > > > > > > > > Today's Topics: > > > > > > 1. sklearn website down in my country Pakistan (Rahul Ahuja) > > > 2. Re: sklearn website down in my country Pakistan (Nelson Liu) > > > 3. How to get the most important features from a RF efficiently > > > (Raphael C) > > > > > > > > > ---------------------------------------------------------------------- > > > > > > Message: 1 > > > Date: Thu, 21 Jul 2016 14:50:55 +0000 > > > From: Rahul Ahuja > > > To: "scikit-learn at python.org" > > > Subject: [scikit-learn] sklearn website down in my country Pakistan > > > Message-ID: > > > > > > > > > Content-Type: text/plain; charset="iso-8859-1" > > > > > > Hi there, > > > > > > > > > Sklearn website has been down for couple of days. Please look into it. > > > > > > > > > I reside in Pakistan, Karachi city. 
> > > > > > > > > > > > > > > > > > Kind regards, > > > Rahul Ahuja > > > -------------- next part -------------- > > > An HTML attachment was scrubbed... > > > URL: > > > > > > ------------------------------ > > > > > > Message: 2 > > > Date: Thu, 21 Jul 2016 14:58:04 +0000 > > > From: Nelson Liu > > > To: Scikit-learn user and developer mailing list > > > > > > Subject: Re: [scikit-learn] sklearn website down in my country > > > Pakistan > > > Message-ID: > > > > > > Content-Type: text/plain; charset="utf-8" > > > > > > Hi, > > > If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the > > > maintainers don't have control over downtime and issues like the one you're > > > having). Can you connect to GitHub, or any site on GitHub Pages? > > > > > > Thanks > > > Nelson > > > > > > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja wrote: > > > > > > > Hi there, > > > > > > > > > > > > Sklearn website has been down for couple of days. Please look into it. > > > > > > > > > > > > I reside in Pakistan, Karachi city. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Kind regards, > > > > Rahul Ahuja > > > > _______________________________________________ > > > > scikit-learn mailing list > > > > scikit-learn at python.org > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > -------------- next part -------------- > > > An HTML attachment was scrubbed... > > > URL: > > > > > > ------------------------------ > > > > > > Message: 3 > > > Date: Thu, 21 Jul 2016 16:22:09 +0100 > > > From: Raphael C > > > To: Scikit-learn user and developer mailing list > > > > > > Subject: [scikit-learn] How to get the most important features from a > > > RF efficiently > > > Message-ID: > > > > > > Content-Type: text/plain; charset=UTF-8 > > > > > > I have a set of feature vectors associated with binary class labels, > > > each of which has about 40,000 features. I can train a random forest > > > classifier in sklearn which works well. 
I would however like to see > > > the most important features. > > > > > > I tried simply printing out forest.feature_importances_ but this takes > > > about 1 second per feature making about 40,000 seconds overall. This > > > is much much longer than the time needed to train the classifier in > > > the first place? > > > > > > Is there a more efficient way to find out which features are most important? > > > > > > Raphael > > > > > > On 21 July 2016 at 15:58, Nelson Liu wrote: > > > > Hi, > > > > If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the > > > > maintainers don't have control over downtime and issues like the one you're > > > > having). Can you connect to GitHub, or any site on GitHub Pages? > > > > > > > > Thanks > > > > Nelson > > > > > > > > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja wrote: > > > >> > > > >> Hi there, > > > >> > > > >> > > > >> Sklearn website has been down for couple of days. Please look into it. > > > >> > > > >> > > > >> I reside in Pakistan, Karachi city. 
> > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> Kind regards, > > > >> Rahul Ahuja > > > >> _______________________________________________ > > > >> scikit-learn mailing list > > > >> scikit-learn at python.org > > > >> https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > _______________________________________________ > > > > scikit-learn mailing list > > > > scikit-learn at python.org > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > > ------------------------------ > > > > > > Subject: Digest Footer > > > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > ------------------------------ > > > > > > End of scikit-learn Digest, Vol 4, Issue 31 > > > ******************************************* > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > ------------------------------ > > > > Subject: Digest Footer > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > ------------------------------ > > > > End of scikit-learn Digest, Vol 4, Issue 32 > > ******************************************* > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > ------------------------------ > > Message: 2 > Date: Thu, 21 Jul 2016 13:32:34 -0400 > From: Sebastian Raschka > To: Scikit-learn user and developer mailing list > > Subject: Re: [scikit-learn] Sklearn website is down in my place > Message-ID: > <186E2B4F-6EDD-427C-B37E-326C4402EF8F at sebastianraschka.com> > 
Content-Type: text/plain; charset=iso-8859-1 > > Hm, just read that this may be yet another weird censorship regulation; I think your best option would be to download the scikit-learn website from > > https://github.com/scikit-learn/scikit-learn.github.io > > and open it locally (via index.html) > > > > > On Jul 21, 2016, at 1:18 PM, Rahul Ahuja wrote: > > > > Hi there, > > > > Sklearn is down in my place (location). I have tried to access with multiple devices and internet connections but still can't. I can open github websites though. Is there any way to access sklearn website? > > > > Kind regards, > > Rahul Ahuja
> > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> Kind regards, > > > >> Rahul Ahuja > > > >> _______________________________________________ > > > >> scikit-learn mailing list > > > >> scikit-learn at python.org > > > >> https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > _______________________________________________ > > > > scikit-learn mailing list > > > > scikit-learn at python.org > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > > ------------------------------ > > > > > > Subject: Digest Footer > > > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > ------------------------------ > > > > > > End of scikit-learn Digest, Vol 4, Issue 31 > > > ******************************************* > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > ------------------------------ > > > > Subject: Digest Footer > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > ------------------------------ > > > > End of scikit-learn Digest, Vol 4, Issue 32 > > ******************************************* > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > ------------------------------ > > End of scikit-learn Digest, Vol 4, Issue 34 > 
From rahul.ahuja at live.com Thu Jul 21 15:29:21 2016
From: rahul.ahuja at live.com (Rahul Ahuja)
Date: Thu, 21 Jul 2016 19:29:21 +0000
Subject: [scikit-learn] scikit-learn.org not opening
In-Reply-To:
References:
Message-ID:

Hi Sebastian,

The proxy works for me, thanks. But it may not be a permanent solution?

Kind regards,
Rahul Ahuja

________________________________
From: scikit-learn on behalf of scikit-learn-request at python.org
Sent: Friday, July 22, 2016 12:20 AM
To: scikit-learn at python.org
Subject: scikit-learn Digest, Vol 4, Issue 38

Today's Topics:

   1. Re: Sklearn website is down in my place (Sebastian Raschka)

----------------------------------------------------------------------

Message: 1
Date: Thu, 21 Jul 2016 15:08:58 -0400
From: Sebastian Raschka
To: Scikit-learn user and developer mailing list
Subject: Re: [scikit-learn] Sklearn website is down in my place
Message-ID:
Content-Type: text/plain; charset=utf-8

Hm, typically the unicorn indicates that there's a GitHub-related issue; however, it still works for me, which is weird. Intuitively, I would say that it may have something to do with a cached version in your browser, yet you mentioned that it doesn't work on other devices either ... Hm, sounds tricky ...

Another thing you could try is visiting the site via a proxy. E.g., try to go to https://hide.me/en/proxy and type "scikit-learn.org" into the form field.

Best,
Sebastian

> On Jul 21, 2016, at 2:18 PM, Rahul Ahuja wrote:
>
> yes it does via that link as well. the name of the tab becomes Unicorn! Github
>
> Is there any way that it can be resolved?
>
> Kind regards,
> Rahul Ahuja
Re: Sklearn website is down in my place (Sebastian Raschka) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 21 Jul 2016 13:25:00 -0400 > From: Sebastian Raschka > To: Scikit-learn user and developer mailing list > > Subject: Re: [scikit-learn] Sklearn website is down in my place > Message-ID: > <0A1A773E-9A48-4A12-ACD2-0E3F25FA21BE at sebastianraschka.com> > Content-Type: text/plain; charset=iso-8859-1 > > Hm, the problem persists if you call it directly via? > > http://scikit-learn.github.io > > > On Jul 21, 2016, at 1:18 PM, Rahul Ahuja wrote: > > > > Hi there, > > > > Sklearn is down in my place (location). I have tried to access with multiple devices and internet connections but still can't. I can open github websites though. Is there any way to access sklearn website? > > > > > > > > > > > > Kind regards, > > Rahul Ahuja > > > > > > From: scikit-learn on behalf of scikit-learn-request at python.org > > Sent: Thursday, July 21, 2016 9:59 PM > > To: scikit-learn at python.org > > Subject: scikit-learn Digest, Vol 4, Issue 32 > > > > Send scikit-learn mailing list submissions to > > scikit-learn at python.org > > > > To subscribe or unsubscribe via the World Wide Web, visit > > https://mail.python.org/mailman/listinfo/scikit-learn > > scikit-learn Info Page - Python > > mail.python.org > > To see the collection of prior postings to the list, visit the scikit-learn Archives. Using scikit-learn: To post a message to all the list members ... > > > > > > or, via email, send a message with subject or body 'help' to > > scikit-learn-request at python.org > > > > You can reach the person managing the list at > > scikit-learn-owner at python.org > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of scikit-learn digest..." > > > > > > Today's Topics: > > > > 1. Re: scikit-learn Digest, Vol 4, Issue 31 (Rahul Ahuja) > > 2. 
Re: scikit-learn Digest, Vol 4, Issue 31 (Sebastian Raschka) > > > > > > ---------------------------------------------------------------------- > > > > Message: 1 > > Date: Thu, 21 Jul 2016 16:27:54 +0000 > > From: Rahul Ahuja > > To: "scikit-learn at python.org" > > Subject: Re: [scikit-learn] scikit-learn Digest, Vol 4, Issue 31 > > Message-ID: > > > > > > Content-Type: text/plain; charset="iso-8859-1" > > > > Yes I can open github pages. > > > > > > > > > > > > Kind regards, > > Rahul Ahuja > > > > > > ________________________________ > > From: scikit-learn on behalf of scikit-learn-request at python.org > > Sent: Thursday, July 21, 2016 9:00 PM > > To: scikit-learn at python.org > > Subject: scikit-learn Digest, Vol 4, Issue 31 > > > > Send scikit-learn mailing list submissions to > > scikit-learn at python.org > > > > To subscribe or unsubscribe via the World Wide Web, visit > > https://mail.python.org/mailman/listinfo/scikit-learn > > scikit-learn Info Page - Python > > mail.python.org > > To see the collection of prior postings to the list, visit the scikit-learn Archives. Using scikit-learn: To post a message to all the list members ... > > > > > > > > or, via email, send a message with subject or body 'help' to > > scikit-learn-request at python.org > > > > You can reach the person managing the list at > > scikit-learn-owner at python.org > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of scikit-learn digest..." > > > > > > Today's Topics: > > > > 1. sklearn website down in my country Pakistan (Rahul Ahuja) > > 2. Re: sklearn website down in my country Pakistan (Nelson Liu) > > 3. 
How to get the most important features from a RF efficiently > > (Raphael C) > > > > > > ---------------------------------------------------------------------- > > > > Message: 1 > > Date: Thu, 21 Jul 2016 14:50:55 +0000 > > From: Rahul Ahuja > > To: "scikit-learn at python.org" > > Subject: [scikit-learn] sklearn website down in my country Pakistan > > Message-ID: > > > > > > Content-Type: text/plain; charset="iso-8859-1" > > > > Hi there, > > > > > > Sklearn website has been down for couple of days. Please look into it. > > > > > > I reside in Pakistan, Karachi city. > > > > > > > > > > > > Kind regards, > > Rahul Ahuja > > -------------- next part -------------- > > An HTML attachment was scrubbed... > > URL: > > > > ------------------------------ > > > > Message: 2 > > Date: Thu, 21 Jul 2016 14:58:04 +0000 > > From: Nelson Liu > > To: Scikit-learn user and developer mailing list > > > > Subject: Re: [scikit-learn] sklearn website down in my country > > Pakistan > > Message-ID: > > > > Content-Type: text/plain; charset="utf-8" > > > > Hi, > > If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the > > maintainers don't have control over downtime and issues like the one you're > > having). Can you connect to GitHub, or any site on GitHub Pages? > > > > Thanks > > Nelson > > > > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja wrote: > > > > > Hi there, > > > > > > > > > Sklearn website has been down for couple of days. Please look into it. > > > > > > > > > I reside in Pakistan, Karachi city. > > > > > > > > > > > > > > > > > > > > > Kind regards, > > > Rahul Ahuja > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > -------------- next part -------------- > > An HTML attachment was scrubbed... 
> > URL: > > > > ------------------------------ > > > > Message: 3 > > Date: Thu, 21 Jul 2016 16:22:09 +0100 > > From: Raphael C > > To: Scikit-learn user and developer mailing list > > > > Subject: [scikit-learn] How to get the most important features from a > > RF efficiently > > Message-ID: > > > > Content-Type: text/plain; charset=UTF-8 > > > > I have a set of feature vectors associated with binary class labels, > > each of which has about 40,000 features. I can train a random forest > > classifier in sklearn which works well. I would however like to see > > the most important features. > > > > I tried simply printing out forest.feature_importances_ but this takes > > about 1 second per feature making about 40,000 seconds overall. This > > is much much longer than the time needed to train the classifier in > > the first place? > > > > Is there a more efficient way to find out which features are most important? > > > > Raphael > > > > On 21 July 2016 at 15:58, Nelson Liu wrote: > > > Hi, > > > If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the > > > maintainers don't have control over downtime and issues like the one you're > > > having). Can you connect to GitHub, or any site on GitHub Pages? > > > > > > Thanks > > > Nelson > > > > > > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja wrote: > > >> > > >> Hi there, > > >> > > >> > > >> Sklearn website has been down for couple of days. Please look into it. > > >> > > >> > > >> I reside in Pakistan, Karachi city. 
> > >> > > >> > > >> > > >> > > >> > > >> > > >> Kind regards, > > >> Rahul Ahuja > > >> _______________________________________________ > > >> scikit-learn mailing list > > >> scikit-learn at python.org > > >> https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > ------------------------------ > > > > Subject: Digest Footer > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > ------------------------------ > > > > End of scikit-learn Digest, Vol 4, Issue 31 > > ******************************************* > > -------------- next part -------------- > > An HTML attachment was scrubbed... > > URL: > > > > ------------------------------ > > > > Message: 2 > > Date: Thu, 21 Jul 2016 12:58:48 -0400 > > From: Sebastian Raschka > > To: Scikit-learn user and developer mailing list > > > > Subject: Re: [scikit-learn] scikit-learn Digest, Vol 4, Issue 31 > > Message-ID: > > > > Content-Type: text/plain; charset=utf-8 > > > > Hm, the website works fine for me (and I also didn?t have any issues in the last few days). > > Just to make sure your are using the correct address, it should be http://scikit-learn.org/ (maybe you used https://scikit-learn.org by accident !?) > > > > - Alternatively, maybe try http://scikit-learn.org/stable/ > > - A different browser > > - clearing the browser cache > > > > Hope one of these things work! > > > > Best, > > Sebastian > > > > > > > On Jul 21, 2016, at 12:27 PM, Rahul Ahuja wrote: > > > > > > Yes I can open github pages. 
> > > > > > > > > > > > > > > > > > Kind regards, > > > Rahul Ahuja > > > > > > > > > From: scikit-learn on behalf of scikit-learn-request at python.org > > > Sent: Thursday, July 21, 2016 9:00 PM > > > To: scikit-learn at python.org > > > Subject: scikit-learn Digest, Vol 4, Issue 31 > > > > > > Send scikit-learn mailing list submissions to > > > scikit-learn at python.org > > > > > > To subscribe or unsubscribe via the World Wide Web, visit > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > scikit-learn Info Page - Python > > > mail.python.org > > > To see the collection of prior postings to the list, visit the scikit-learn Archives. Using scikit-learn: To post a message to all the list members ... > > > > > > > > > or, via email, send a message with subject or body 'help' to > > > scikit-learn-request at python.org > > > > > > You can reach the person managing the list at > > > scikit-learn-owner at python.org > > > > > > When replying, please edit your Subject line so it is more specific > > > than "Re: Contents of scikit-learn digest..." > > > > > > > > > Today's Topics: > > > > > > 1. sklearn website down in my country Pakistan (Rahul Ahuja) > > > 2. Re: sklearn website down in my country Pakistan (Nelson Liu) > > > 3. How to get the most important features from a RF efficiently > > > (Raphael C) > > > > > > > > > ---------------------------------------------------------------------- > > > > > > Message: 1 > > > Date: Thu, 21 Jul 2016 14:50:55 +0000 > > > From: Rahul Ahuja > > > To: "scikit-learn at python.org" > > > Subject: [scikit-learn] sklearn website down in my country Pakistan > > > Message-ID: > > > > > > > > > Content-Type: text/plain; charset="iso-8859-1" > > > > > > Hi there, > > > > > > > > > Sklearn website has been down for couple of days. Please look into it. > > > > > > > > > I reside in Pakistan, Karachi city. 
> > > > > > > > > > > > > > > > > > Kind regards, > > > Rahul Ahuja > > > -------------- next part -------------- > > > An HTML attachment was scrubbed... > > > URL: > > > > > > ------------------------------ > > > > > > Message: 2 > > > Date: Thu, 21 Jul 2016 14:58:04 +0000 > > > From: Nelson Liu > > > To: Scikit-learn user and developer mailing list > > > > > > Subject: Re: [scikit-learn] sklearn website down in my country > > > Pakistan > > > Message-ID: > > > > > > Content-Type: text/plain; charset="utf-8" > > > > > > Hi, > > > If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the > > > maintainers don't have control over downtime and issues like the one you're > > > having). Can you connect to GitHub, or any site on GitHub Pages? > > > > > > Thanks > > > Nelson > > > > > > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja wrote: > > > > > > > Hi there, > > > > > > > > > > > > Sklearn website has been down for couple of days. Please look into it. > > > > > > > > > > > > I reside in Pakistan, Karachi city. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Kind regards, > > > > Rahul Ahuja > > > > _______________________________________________ > > > > scikit-learn mailing list > > > > scikit-learn at python.org > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > -------------- next part -------------- > > > An HTML attachment was scrubbed... > > > URL: > > > > > > ------------------------------ > > > > > > Message: 3 > > > Date: Thu, 21 Jul 2016 16:22:09 +0100 > > > From: Raphael C > > > To: Scikit-learn user and developer mailing list > > > > > > Subject: [scikit-learn] How to get the most important features from a > > > RF efficiently > > > Message-ID: > > > > > > Content-Type: text/plain; charset=UTF-8 > > > > > > I have a set of feature vectors associated with binary class labels, > > > each of which has about 40,000 features. I can train a random forest > > > classifier in sklearn which works well. 
I would however like to see > > > the most important features. > > > > > > I tried simply printing out forest.feature_importances_ but this takes > > > about 1 second per feature making about 40,000 seconds overall. This > > > is much much longer than the time needed to train the classifier in > > > the first place? > > > > > > Is there a more efficient way to find out which features are most important? > > > > > > Raphael > > > > > > On 21 July 2016 at 15:58, Nelson Liu wrote: > > > > Hi, > > > > If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the > > > > maintainers don't have control over downtime and issues like the one you're > > > > having). Can you connect to GitHub, or any site on GitHub Pages? > > > > > > > > Thanks > > > > Nelson > > > > > > > > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja wrote: > > > >> > > > >> Hi there, > > > >> > > > >> > > > >> Sklearn website has been down for couple of days. Please look into it. > > > >> > > > >> > > > >> I reside in Pakistan, Karachi city. 
> > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> Kind regards, > > > >> Rahul Ahuja > > > >> _______________________________________________ > > > >> scikit-learn mailing list > > > >> scikit-learn at python.org > > > >> https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > _______________________________________________ > > > > scikit-learn mailing list > > > > scikit-learn at python.org > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > > ------------------------------ > > > > > > Subject: Digest Footer > > > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > ------------------------------ > > > > > > End of scikit-learn Digest, Vol 4, Issue 31 > > > ******************************************* > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > ------------------------------ > > > > Subject: Digest Footer > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > ------------------------------ > > > > End of scikit-learn Digest, Vol 4, Issue 32 > > ******************************************* > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > ------------------------------ > > Message: 2 > Date: Thu, 21 Jul 2016 13:32:34 -0400 > From: Sebastian Raschka > To: Scikit-learn user and developer mailing list > > Subject: Re: [scikit-learn] Sklearn website is down in my place > Message-ID: > <186E2B4F-6EDD-427C-B37E-326C4402EF8F at sebastianraschka.com> > 
Content-Type: text/plain; charset=iso-8859-1 > > Hm, just read that this may be yet another weird censorhip regulation; I think your best option would be to download the scikit-learn website, from > > https://github.com/scikit-learn/scikit-learn.github.io > > and open it locally (via index.html) > > > > > On Jul 21, 2016, at 1:18 PM, Rahul Ahuja wrote: > > > > Hi there, > > > > Sklearn is down in my place (location). I have tried to access with multiple devices and internet connections but still can't. I can open github websites though. Is there any way to access sklearn website? > > > > > > > > > > > > Kind regards, > > Rahul Ahuja > > > > > > From: scikit-learn on behalf of scikit-learn-request at python.org > > Sent: Thursday, July 21, 2016 9:59 PM > > To: scikit-learn at python.org > > Subject: scikit-learn Digest, Vol 4, Issue 32 > > > > Send scikit-learn mailing list submissions to > > scikit-learn at python.org > > > > To subscribe or unsubscribe via the World Wide Web, visit > > https://mail.python.org/mailman/listinfo/scikit-learn > > scikit-learn Info Page - Python > > mail.python.org > > To see the collection of prior postings to the list, visit the scikit-learn Archives. Using scikit-learn: To post a message to all the list members ... > > > > > > or, via email, send a message with subject or body 'help' to > > scikit-learn-request at python.org > > > > You can reach the person managing the list at > > scikit-learn-owner at python.org > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of scikit-learn digest..." > > > > > > Today's Topics: > > > > 1. Re: scikit-learn Digest, Vol 4, Issue 31 (Rahul Ahuja) > > 2. 
Re: scikit-learn Digest, Vol 4, Issue 31 (Sebastian Raschka) > > > > > > ---------------------------------------------------------------------- > > > > Message: 1 > > Date: Thu, 21 Jul 2016 16:27:54 +0000 > > From: Rahul Ahuja > > To: "scikit-learn at python.org" > > Subject: Re: [scikit-learn] scikit-learn Digest, Vol 4, Issue 31 > > Message-ID: > > > > > > Content-Type: text/plain; charset="iso-8859-1" > > > > Yes I can open github pages. > > > > > > > > > > > > Kind regards, > > Rahul Ahuja > > > > > > ________________________________ > > From: scikit-learn on behalf of scikit-learn-request at python.org > > Sent: Thursday, July 21, 2016 9:00 PM > > To: scikit-learn at python.org > > Subject: scikit-learn Digest, Vol 4, Issue 31 > > > > Send scikit-learn mailing list submissions to > > scikit-learn at python.org > > > > To subscribe or unsubscribe via the World Wide Web, visit > > https://mail.python.org/mailman/listinfo/scikit-learn > > scikit-learn Info Page - Python > > mail.python.org > > To see the collection of prior postings to the list, visit the scikit-learn Archives. Using scikit-learn: To post a message to all the list members ... > > > > > > > > or, via email, send a message with subject or body 'help' to > > scikit-learn-request at python.org > > > > You can reach the person managing the list at > > scikit-learn-owner at python.org > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of scikit-learn digest..." > > > > > > Today's Topics: > > > > 1. sklearn website down in my country Pakistan (Rahul Ahuja) > > 2. Re: sklearn website down in my country Pakistan (Nelson Liu) > > 3. 
How to get the most important features from a RF efficiently > > (Raphael C) > > > > > > ---------------------------------------------------------------------- > > > > Message: 1 > > Date: Thu, 21 Jul 2016 14:50:55 +0000 > > From: Rahul Ahuja > > To: "scikit-learn at python.org" > > Subject: [scikit-learn] sklearn website down in my country Pakistan > > Message-ID: > > > > > > Content-Type: text/plain; charset="iso-8859-1" > > > > Hi there, > > > > > > Sklearn website has been down for couple of days. Please look into it. > > > > > > I reside in Pakistan, Karachi city. > > > > > > > > > > > > Kind regards, > > Rahul Ahuja > > -------------- next part -------------- > > An HTML attachment was scrubbed... > > URL: > > > > ------------------------------ > > > > Message: 2 > > Date: Thu, 21 Jul 2016 14:58:04 +0000 > > From: Nelson Liu > > To: Scikit-learn user and developer mailing list > > > > Subject: Re: [scikit-learn] sklearn website down in my country > > Pakistan > > Message-ID: > > > > Content-Type: text/plain; charset="utf-8" > > > > Hi, > > If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the > > maintainers don't have control over downtime and issues like the one you're > > having). Can you connect to GitHub, or any site on GitHub Pages? > > > > Thanks > > Nelson > > > > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja wrote: > > > > > Hi there, > > > > > > > > > Sklearn website has been down for couple of days. Please look into it. > > > > > > > > > I reside in Pakistan, Karachi city. > > > > > > > > > > > > > > > > > > > > > Kind regards, > > > Rahul Ahuja > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > -------------- next part -------------- > > An HTML attachment was scrubbed... 
> > URL: > > > > ------------------------------ > > > > Message: 3 > > Date: Thu, 21 Jul 2016 16:22:09 +0100 > > From: Raphael C > > To: Scikit-learn user and developer mailing list > > > > Subject: [scikit-learn] How to get the most important features from a > > RF efficiently > > Message-ID: > > > > Content-Type: text/plain; charset=UTF-8 > > > > I have a set of feature vectors associated with binary class labels, > > each of which has about 40,000 features. I can train a random forest > > classifier in sklearn which works well. I would however like to see > > the most important features. > > > > I tried simply printing out forest.feature_importances_ but this takes > > about 1 second per feature making about 40,000 seconds overall. This > > is much much longer than the time needed to train the classifier in > > the first place? > > > > Is there a more efficient way to find out which features are most important? > > > > Raphael > > > > On 21 July 2016 at 15:58, Nelson Liu wrote: > > > Hi, > > > If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the > > > maintainers don't have control over downtime and issues like the one you're > > > having). Can you connect to GitHub, or any site on GitHub Pages? > > > > > > Thanks > > > Nelson > > > > > > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja wrote: > > >> > > >> Hi there, > > >> > > >> > > >> Sklearn website has been down for couple of days. Please look into it. > > >> > > >> > > >> I reside in Pakistan, Karachi city. 
> > >> > > >> > > >> > > >> > > >> > > >> > > >> Kind regards, > > >> Rahul Ahuja > > >> _______________________________________________ > > >> scikit-learn mailing list > > >> scikit-learn at python.org > > >> https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > ------------------------------ > > > > Subject: Digest Footer > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > ------------------------------ > > > > End of scikit-learn Digest, Vol 4, Issue 31 > > ******************************************* > > -------------- next part -------------- > > An HTML attachment was scrubbed... > > URL: > > > > ------------------------------ > > > > Message: 2 > > Date: Thu, 21 Jul 2016 12:58:48 -0400 > > From: Sebastian Raschka > > To: Scikit-learn user and developer mailing list > > > > Subject: Re: [scikit-learn] scikit-learn Digest, Vol 4, Issue 31 > > Message-ID: > > > > Content-Type: text/plain; charset=utf-8 > > > > Hm, the website works fine for me (and I also didn?t have any issues in the last few days). > > Just to make sure your are using the correct address, it should be http://scikit-learn.org/ (maybe you used https://scikit-learn.org by accident !?) > > > > - Alternatively, maybe try http://scikit-learn.org/stable/ > > - A different browser > > - clearing the browser cache > > > > Hope one of these things work! > > > > Best, > > Sebastian > > > > > > > On Jul 21, 2016, at 12:27 PM, Rahul Ahuja wrote: > > > > > > Yes I can open github pages. 
From mail at sebastianraschka.com Thu Jul 21 15:37:48 2016 From: mail at sebastianraschka.com (Sebastian Raschka) Date: Thu, 21 Jul 2016 15:37:48 -0400 Subject: [scikit-learn] scikit-learn.org not opening In-Reply-To: References: Message-ID: <8C0E4C71-8445-4BD6-ABF6-1431C6E19E42@sebastianraschka.com> Glad to hear that it works at least. > but it may not be permanent solution? Yeah, that's probably not ideal, and I am not sure if there's a better solution if your country's government prohibits the use of GitHub :(. > On Jul 21, 2016, at 3:29 PM, Rahul Ahuja wrote: > > hi sebastian, > > proxy works for me,thanks. > but it may not be permanent solution? > > Kind regards, > Rahul Ahuja
> > Message: 1 > Date: Thu, 21 Jul 2016 15:08:58 -0400 > From: Sebastian Raschka > Subject: Re: [scikit-learn] Sklearn website is down in my place > > Hm, typically the unicorn indicates that there's a GitHub-related issue; however, it still works for me, which is weird. Intuitively, I would say that it may have something to do with a cached version of your browser, yet you mentioned that it also doesn't work on other devices either ... Hm, sounds tricky ... Another thing you could try is visiting the site via a proxy. E.g., try to go to > > https://hide.me/en/proxy > > and type "scikit-learn.org" into the form field. > > Best, > Sebastian > > > On Jul 21, 2016, at 2:18 PM, Rahul Ahuja wrote: > > > > yes it does via that link as well. the name of the tab becomes Unicorn! Github > > > > Is there any way that it can be resolved?
> > > > Kind regards, > > Rahul Ahuja > > > > Message: 1 > > Date: Thu, 21 Jul 2016 13:25:00 -0400 > > From: Sebastian Raschka > > Subject: Re: [scikit-learn] Sklearn website is down in my place > > Message-ID: <0A1A773E-9A48-4A12-ACD2-0E3F25FA21BE at sebastianraschka.com> > > > > Hm, the problem persists if you call it directly via? > > > > http://scikit-learn.github.io > > > > > On Jul 21, 2016, at 1:18 PM, Rahul Ahuja wrote: > > > > > > Hi there, > > > > > > Sklearn is down in my place (location). I have tried to access with multiple devices and internet connections but still can't. I can open github websites though. Is there any way to access sklearn website?
> > > > Message: 2 > > Date: Thu, 21 Jul 2016 13:32:34 -0400 > > From: Sebastian Raschka > > Subject: Re:
[scikit-learn] Sklearn website is down in my place > > Message-ID: <186E2B4F-6EDD-427C-B37E-326C4402EF8F at sebastianraschka.com> > > > > Hm, just read that this may be yet another weird censorship regulation; I think your best option would be to download the scikit-learn website from > > > > https://github.com/scikit-learn/scikit-learn.github.io > > > > and open it locally (via index.html)
Re: scikit-learn Digest, Vol 4, Issue 31 (Sebastian Raschka) > > > > > > > > > ---------------------------------------------------------------------- > > > > > > Message: 1 > > > Date: Thu, 21 Jul 2016 16:27:54 +0000 > > > From: Rahul Ahuja > > > To: "scikit-learn at python.org" > > > Subject: Re: [scikit-learn] scikit-learn Digest, Vol 4, Issue 31 > > > Message-ID: > > > > > > > > > Content-Type: text/plain; charset="iso-8859-1" > > > > > > Yes I can open github pages. > > > > > > > > > > > > > > > > > > Kind regards, > > > Rahul Ahuja > > > > > > > > > ________________________________ > > > From: scikit-learn on behalf of scikit-learn-request at python.org > > > Sent: Thursday, July 21, 2016 9:00 PM > > > To: scikit-learn at python.org > > > Subject: scikit-learn Digest, Vol 4, Issue 31 > > > > > > Send scikit-learn mailing list submissions to > > > scikit-learn at python.org > > > > > > To subscribe or unsubscribe via the World Wide Web, visit > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > scikit-learn Info Page - Python > > > mail.python.org > > > To see the collection of prior postings to the list, visit the scikit-learn Archives. Using scikit-learn: To post a message to all the list members ... > > > > > > > > > > > > or, via email, send a message with subject or body 'help' to > > > scikit-learn-request at python.org > > > > > > You can reach the person managing the list at > > > scikit-learn-owner at python.org > > > > > > When replying, please edit your Subject line so it is more specific > > > than "Re: Contents of scikit-learn digest..." > > > > > > > > > Today's Topics: > > > > > > 1. sklearn website down in my country Pakistan (Rahul Ahuja) > > > 2. Re: sklearn website down in my country Pakistan (Nelson Liu) > > > 3. 
How to get the most important features from a RF efficiently > > > (Raphael C) > > > > > > > > > ---------------------------------------------------------------------- > > > > > > Message: 1 > > > Date: Thu, 21 Jul 2016 14:50:55 +0000 > > > From: Rahul Ahuja > > > To: "scikit-learn at python.org" > > > Subject: [scikit-learn] sklearn website down in my country Pakistan > > > Message-ID: > > > > > > > > > Content-Type: text/plain; charset="iso-8859-1" > > > > > > Hi there, > > > > > > > > > Sklearn website has been down for couple of days. Please look into it. > > > > > > > > > I reside in Pakistan, Karachi city. > > > > > > > > > > > > > > > > > > Kind regards, > > > Rahul Ahuja > > > -------------- next part -------------- > > > An HTML attachment was scrubbed... > > > URL: > > > > > > ------------------------------ > > > > > > Message: 2 > > > Date: Thu, 21 Jul 2016 14:58:04 +0000 > > > From: Nelson Liu > > > To: Scikit-learn user and developer mailing list > > > > > > Subject: Re: [scikit-learn] sklearn website down in my country > > > Pakistan > > > Message-ID: > > > > > > Content-Type: text/plain; charset="utf-8" > > > > > > Hi, > > > If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the > > > maintainers don't have control over downtime and issues like the one you're > > > having). Can you connect to GitHub, or any site on GitHub Pages? > > > > > > Thanks > > > Nelson > > > > > > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja wrote: > > > > > > > Hi there, > > > > > > > > > > > > Sklearn website has been down for couple of days. Please look into it. > > > > > > > > > > > > I reside in Pakistan, Karachi city. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Kind regards, > > > > Rahul Ahuja > > > > _______________________________________________ > > > > scikit-learn mailing list > > > > scikit-learn at python.org > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > -------------- next part -------------- > > > An HTML attachment was scrubbed... > > > URL: > > > > > > ------------------------------ > > > > > > Message: 3 > > > Date: Thu, 21 Jul 2016 16:22:09 +0100 > > > From: Raphael C > > > To: Scikit-learn user and developer mailing list > > > > > > Subject: [scikit-learn] How to get the most important features from a > > > RF efficiently > > > Message-ID: > > > > > > Content-Type: text/plain; charset=UTF-8 > > > > > > I have a set of feature vectors associated with binary class labels, > > > each of which has about 40,000 features. I can train a random forest > > > classifier in sklearn which works well. I would however like to see > > > the most important features. > > > > > > I tried simply printing out forest.feature_importances_ but this takes > > > about 1 second per feature making about 40,000 seconds overall. This > > > is much much longer than the time needed to train the classifier in > > > the first place? > > > > > > Is there a more efficient way to find out which features are most important? > > > > > > Raphael > > > > > > On 21 July 2016 at 15:58, Nelson Liu wrote: > > > > Hi, > > > > If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the > > > > maintainers don't have control over downtime and issues like the one you're > > > > having). Can you connect to GitHub, or any site on GitHub Pages? > > > > > > > > Thanks > > > > Nelson > > > > > > > > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja wrote: > > > >> > > > >> Hi there, > > > >> > > > >> > > > >> Sklearn website has been down for couple of days. Please look into it. > > > >> > > > >> > > > >> I reside in Pakistan, Karachi city. 
> > > >> Kind regards, > > > >> Rahul Ahuja > > > > > > ------------------------------ > > > > > > End of scikit-learn Digest, Vol 4, Issue 31 > > > ******************************************* > > > > > > ------------------------------ > > > > > > Message: 2 > > > Date: Thu, 21 Jul 2016 12:58:48 -0400 > > > From: Sebastian Raschka > > > To: Scikit-learn user and developer mailing list > > > Subject: Re: [scikit-learn] scikit-learn Digest, Vol 4, Issue 31 > > > Message-ID: > > > Content-Type: text/plain; charset=utf-8 > > > > > > Hm, the website works fine for me (and I also didn't have any issues in the last few days). > > > Just to make sure you are using the correct address, it should be http://scikit-learn.org/ (maybe you used https://scikit-learn.org by accident !?) > > > > > > - Alternatively, maybe try http://scikit-learn.org/stable/ > > > - A different browser > > > - clearing the browser cache > > > > > > Hope one of these things works! > > > > > > Best, > > > Sebastian > > > > > > > On Jul 21, 2016, at 12:27 PM, Rahul Ahuja wrote: > > > > > > > > Yes I can open github pages.
From rahul.ahuja at live.com Thu Jul 21 16:14:22 2016 From: rahul.ahuja at live.com (Rahul Ahuja) Date: Thu, 21 Jul 2016 20:14:22 +0000 Subject: [scikit-learn] git hub is working In-Reply-To: References: Message-ID: Well, GitHub is working. I will get back to you on this sklearn topic. On Fri, Jul 22, 2016 at 12:43 AM +0500, "scikit-learn-request at python.org" wrote: Today's Topics: 1.
Re: scikit-learn.org not opening (Sebastian Raschka) ---------------------------------------------------------------------- Message: 1 Date: Thu, 21 Jul 2016 15:37:48 -0400 From: Sebastian Raschka To: Scikit-learn user and developer mailing list Subject: Re: [scikit-learn] scikit-learn.org not opening Message-ID: <8C0E4C71-8445-4BD6-ABF6-1431C6E19E42 at sebastianraschka.com> Content-Type: text/plain; charset=utf-8 Glad to hear that it works at least. > but it may not be permanent solution? Yeah, that's probably not ideal, and I am not sure if there's a better solution if your country's government prohibits the use of github :(. > On Jul 21, 2016, at 3:29 PM, Rahul Ahuja wrote: > > hi sebastian, > > proxy works for me,thanks. > but it may not be permanent solution? > > Kind regards, > Rahul Ahuja > > From: scikit-learn on behalf of scikit-learn-request at python.org > Sent: Friday, July 22, 2016 12:20 AM > To: scikit-learn at python.org > Subject: scikit-learn Digest, Vol 4, Issue 38 > > Today's Topics: > > 1.
Re: Sklearn website is down in my place (Sebastian Raschka) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 21 Jul 2016 15:08:58 -0400 > From: Sebastian Raschka > To: Scikit-learn user and developer mailing list > Subject: Re: [scikit-learn] Sklearn website is down in my place > Message-ID: > Content-Type: text/plain; charset=utf-8 > > Hm, typically the unicorn indicates that there's a GitHub-related issue; however, it still works for me, which is weird. Intuitively, I would say that it may have something to do with a cached version in your browser, yet you mentioned that it doesn't work on other devices either ... Hm, sounds tricky ... Another thing you could try is visiting the site via a proxy. E.g., try to go to > > https://hide.me/en/proxy > > and type "scikit-learn.org" into the form field. > > Best, > Sebastian > > > On Jul 21, 2016, at 2:18 PM, Rahul Ahuja wrote: > > > > yes it does via that link as well. the name of the tab becomes Unicorn! Github > > > > Is there any way that it can be resolved? > > > > Kind regards, > > Rahul Ahuja > > > > From: scikit-learn on behalf of scikit-learn-request at python.org > > Sent: Thursday, July 21, 2016 10:39 PM > > To: scikit-learn at python.org > > Subject: scikit-learn Digest, Vol 4, Issue 34 > > > > Today's Topics: > > > > 1. Re: Sklearn website is down in my place (Sebastian Raschka) > > 2. Re: Sklearn website is down in my place (Sebastian Raschka) > > > > ---------------------------------------------------------------------- > > > > Message: 1 > > Date: Thu, 21 Jul 2016 13:25:00 -0400 > > From: Sebastian Raschka > > To: Scikit-learn user and developer mailing list > > Subject: Re: [scikit-learn] Sklearn website is down in my place > > Message-ID: > > <0A1A773E-9A48-4A12-ACD2-0E3F25FA21BE at sebastianraschka.com> > > Content-Type: text/plain; charset=iso-8859-1 > > > > Hm, the problem persists if you call it directly via? > > > > http://scikit-learn.github.io > > > > > On Jul 21, 2016, at 1:18 PM, Rahul Ahuja wrote: > > > > > > Hi there, > > > > > > Sklearn is down in my place (location). I have tried to access with multiple devices and internet connections but still can't. I can open github websites though. Is there any way to access sklearn website? > > > > > > Kind regards, > > > Rahul Ahuja > > > > > > From: scikit-learn on behalf of scikit-learn-request at python.org > > > Sent: Thursday, July 21, 2016 9:59 PM > > > To: scikit-learn at python.org > > > Subject: scikit-learn Digest, Vol 4, Issue 32 > > > > > > Send scikit-learn mailing list submissions to > > > scikit-learn at python.org > > > > > > To subscribe or unsubscribe via the World Wide Web, visit > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > scikit-learn Info Page - Python > > > mail.python.org > > > To see the collection of prior postings to the list, visit the scikit-learn Archives.
> > ------------------------------ > > > > Message: 2 > > Date: Thu, 21 Jul 2016 13:32:34 -0400 > > From: Sebastian Raschka > > To: Scikit-learn user and developer mailing list > > > > Subject: Re: [scikit-learn] Sklearn website is down in my place > > Message-ID: > > <186E2B4F-6EDD-427C-B37E-326C4402EF8F at sebastianraschka.com> > > Content-Type: text/plain; charset=iso-8859-1 > > > > Hm, just read that this may be yet another weird censorship regulation; I think your best option would be to download the scikit-learn website from > > > > https://github.com/scikit-learn/scikit-learn.github.io > > > > and open it locally (via index.html) > > > > > > > > > On Jul 21, 2016, at 1:18 PM, Rahul Ahuja wrote: > > > > > > Hi there, > > > > > > Sklearn is down in my place (location). I have tried to access with multiple devices and internet connections but still can't. I can open github websites though. Is there any way to access sklearn website? > > > > > > > > > > > > > > > > > > Kind regards, > > > Rahul Ahuja > > > > > > > > > From: scikit-learn on behalf of scikit-learn-request at python.org > > > Sent: Thursday, July 21, 2016 9:59 PM > > > To: scikit-learn at python.org > > > Subject: scikit-learn Digest, Vol 4, Issue 32 > > > > > > Send scikit-learn mailing list submissions to > > > scikit-learn at python.org > > > > > > To subscribe or unsubscribe via the World Wide Web, visit > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > scikit-learn Info Page - Python > > > mail.python.org > > > To see the collection of prior postings to the list, visit the scikit-learn Archives. Using scikit-learn: To post a message to all the list members ...
> > > > > > > > > or, via email, send a message with subject or body 'help' to > > > scikit-learn-request at python.org > > > > > > You can reach the person managing the list at > > > scikit-learn-owner at python.org > > > > > > When replying, please edit your Subject line so it is more specific > > > than "Re: Contents of scikit-learn digest..." > > > > > > > > > Today's Topics: > > > > > > 1. Re: scikit-learn Digest, Vol 4, Issue 31 (Rahul Ahuja) > > > 2. Re: scikit-learn Digest, Vol 4, Issue 31 (Sebastian Raschka) > > > > > > > > > ---------------------------------------------------------------------- > > > > > > Message: 1 > > > Date: Thu, 21 Jul 2016 16:27:54 +0000 > > > From: Rahul Ahuja > > > To: "scikit-learn at python.org" > > > Subject: Re: [scikit-learn] scikit-learn Digest, Vol 4, Issue 31 > > > Message-ID: > > > > > > > > > Content-Type: text/plain; charset="iso-8859-1" > > > > > > Yes I can open github pages. > > > > > > > > > > > > > > > > > > Kind regards, > > > Rahul Ahuja > > > > > > > > > ________________________________ > > > From: scikit-learn on behalf of scikit-learn-request at python.org > > > Sent: Thursday, July 21, 2016 9:00 PM > > > To: scikit-learn at python.org > > > Subject: scikit-learn Digest, Vol 4, Issue 31 > > > > > > Send scikit-learn mailing list submissions to > > > scikit-learn at python.org > > > > > > To subscribe or unsubscribe via the World Wide Web, visit > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > scikit-learn Info Page - Python > > > mail.python.org > > > To see the collection of prior postings to the list, visit the scikit-learn Archives. Using scikit-learn: To post a message to all the list members ... 
> > > > > > > > > > > > or, via email, send a message with subject or body 'help' to > > > scikit-learn-request at python.org > > > > > > You can reach the person managing the list at > > > scikit-learn-owner at python.org > > > > > > When replying, please edit your Subject line so it is more specific > > > than "Re: Contents of scikit-learn digest..." > > > > > > > > > Today's Topics: > > > > > > 1. sklearn website down in my country Pakistan (Rahul Ahuja) > > > 2. Re: sklearn website down in my country Pakistan (Nelson Liu) > > > 3. How to get the most important features from a RF efficiently > > > (Raphael C) > > > > > > > > > ---------------------------------------------------------------------- > > > > > > Message: 1 > > > Date: Thu, 21 Jul 2016 14:50:55 +0000 > > > From: Rahul Ahuja > > > To: "scikit-learn at python.org" > > > Subject: [scikit-learn] sklearn website down in my country Pakistan > > > Message-ID: > > > > > > > > > Content-Type: text/plain; charset="iso-8859-1" > > > > > > Hi there, > > > > > > > > > Sklearn website has been down for couple of days. Please look into it. > > > > > > > > > I reside in Pakistan, Karachi city. > > > > > > > > > > > > > > > > > > Kind regards, > > > Rahul Ahuja > > > -------------- next part -------------- > > > An HTML attachment was scrubbed... > > > URL: > > > > > > ------------------------------ > > > > > > Message: 2 > > > Date: Thu, 21 Jul 2016 14:58:04 +0000 > > > From: Nelson Liu > > > To: Scikit-learn user and developer mailing list > > > > > > Subject: Re: [scikit-learn] sklearn website down in my country > > > Pakistan > > > Message-ID: > > > > > > Content-Type: text/plain; charset="utf-8" > > > > > > Hi, > > > If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the > > > maintainers don't have control over downtime and issues like the one you're > > > having). Can you connect to GitHub, or any site on GitHub Pages? 
> > > > > > Thanks > > > Nelson > > > > > > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja wrote: > > > > > > > Hi there, > > > > > > > > > > > > Sklearn website has been down for couple of days. Please look into it. > > > > > > > > > > > > I reside in Pakistan, Karachi city. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Kind regards, > > > > Rahul Ahuja > > > > _______________________________________________ > > > > scikit-learn mailing list > > > > scikit-learn at python.org > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > -------------- next part -------------- > > > An HTML attachment was scrubbed... > > > URL: > > > > > > ------------------------------ > > > > > > Message: 3 > > > Date: Thu, 21 Jul 2016 16:22:09 +0100 > > > From: Raphael C > > > To: Scikit-learn user and developer mailing list > > > > > > Subject: [scikit-learn] How to get the most important features from a > > > RF efficiently > > > Message-ID: > > > > > > Content-Type: text/plain; charset=UTF-8 > > > > > > I have a set of feature vectors associated with binary class labels, > > > each of which has about 40,000 features. I can train a random forest > > > classifier in sklearn which works well. I would however like to see > > > the most important features. > > > > > > I tried simply printing out forest.feature_importances_ but this takes > > > about 1 second per feature making about 40,000 seconds overall. This > > > is much much longer than the time needed to train the classifier in > > > the first place? > > > > > > Is there a more efficient way to find out which features are most important? > > > > > > Raphael > > > > > > On 21 July 2016 at 15:58, Nelson Liu wrote: > > > > Hi, > > > > If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the > > > > maintainers don't have control over downtime and issues like the one you're > > > > having). Can you connect to GitHub, or any site on GitHub Pages? 
> > > > > > > > Thanks > > > > Nelson > > > > > > > > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja wrote: > > > >> > > > >> Hi there, > > > >> > > > >> > > > >> Sklearn website has been down for couple of days. Please look into it. > > > >> > > > >> > > > >> I reside in Pakistan, Karachi city. > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> Kind regards, > > > >> Rahul Ahuja > > > >> _______________________________________________ > > > >> scikit-learn mailing list > > > >> scikit-learn at python.org > > > >> https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > _______________________________________________ > > > > scikit-learn mailing list > > > > scikit-learn at python.org > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > > ------------------------------ > > > > > > Subject: Digest Footer > > > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > ------------------------------ > > > > > > End of scikit-learn Digest, Vol 4, Issue 31 > > > ******************************************* > > > -------------- next part -------------- > > > An HTML attachment was scrubbed... > > > URL: > > > > > > ------------------------------ > > > > > > Message: 2 > > > Date: Thu, 21 Jul 2016 12:58:48 -0400 > > > From: Sebastian Raschka > > > To: Scikit-learn user and developer mailing list > > > > > > Subject: Re: [scikit-learn] scikit-learn Digest, Vol 4, Issue 31 > > > Message-ID: > > > > > > Content-Type: text/plain; charset=utf-8 > > > > > > Hm, the website works fine for me (and I also didn?t have any issues in the last few days). > > > Just to make sure your are using the correct address, it should be http://scikit-learn.org/ (maybe you used https://scikit-learn.org by accident !?) 
> > > > > > - Alternatively, maybe try http://scikit-learn.org/stable/ > > > - A different browser > > > - clearing the browser cache > > > > > > Hope one of these things work! > > > > > > Best, > > > Sebastian > > > > > > > > > > On Jul 21, 2016, at 12:27 PM, Rahul Ahuja wrote: > > > > > > > > Yes I can open github pages. > > > > > > > > > > > > > > > > > > > > > > > > Kind regards, > > > > Rahul Ahuja > > > > > > > > > > > > From: scikit-learn on behalf of scikit-learn-request at python.org > > > > Sent: Thursday, July 21, 2016 9:00 PM > > > > To: scikit-learn at python.org > > > > Subject: scikit-learn Digest, Vol 4, Issue 31 > > > > > > > > Send scikit-learn mailing list submissions to > > > > scikit-learn at python.org > > > > > > > > To subscribe or unsubscribe via the World Wide Web, visit > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > scikit-learn Info Page - Python > > > > mail.python.org > > > > To see the collection of prior postings to the list, visit the scikit-learn Archives. Using scikit-learn: To post a message to all the list members ... > > > > > > > > > > > > or, via email, send a message with subject or body 'help' to > > > > scikit-learn-request at python.org > > > > > > > > You can reach the person managing the list at > > > > scikit-learn-owner at python.org > > > > > > > > When replying, please edit your Subject line so it is more specific > > > > than "Re: Contents of scikit-learn digest..." > > > > > > > > > > > > Today's Topics: > > > > > > > > 1. sklearn website down in my country Pakistan (Rahul Ahuja) > > > > 2. Re: sklearn website down in my country Pakistan (Nelson Liu) > > > > 3. 
How to get the most important features from a RF efficiently > > > > (Raphael C) > > > > > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > Message: 1 > > > > Date: Thu, 21 Jul 2016 14:50:55 +0000 > > > > From: Rahul Ahuja > > > > To: "scikit-learn at python.org" > > > > Subject: [scikit-learn] sklearn website down in my country Pakistan > > > > Message-ID: > > > > > > > > > > > > Content-Type: text/plain; charset="iso-8859-1" > > > > > > > > Hi there, > > > > > > > > > > > > Sklearn website has been down for couple of days. Please look into it. > > > > > > > > > > > > I reside in Pakistan, Karachi city. > > > > > > > > > > > > > > > > > > > > > > > > Kind regards, > > > > Rahul Ahuja > > > > -------------- next part -------------- > > > > An HTML attachment was scrubbed... > > > > URL: > > > > > > > > ------------------------------ > > > > > > > > Message: 2 > > > > Date: Thu, 21 Jul 2016 14:58:04 +0000 > > > > From: Nelson Liu > > > > To: Scikit-learn user and developer mailing list > > > > > > > > Subject: Re: [scikit-learn] sklearn website down in my country > > > > Pakistan > > > > Message-ID: > > > > > > > > Content-Type: text/plain; charset="utf-8" > > > > > > > > Hi, > > > > If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the > > > > maintainers don't have control over downtime and issues like the one you're > > > > having). Can you connect to GitHub, or any site on GitHub Pages? > > > > > > > > Thanks > > > > Nelson > > > > > > > > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja wrote: > > > > > > > > > Hi there, > > > > > > > > > > > > > > > Sklearn website has been down for couple of days. Please look into it. > > > > > > > > > > > > > > > I reside in Pakistan, Karachi city. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Kind regards, > > > > > Rahul Ahuja > > > > > _______________________________________________ > > > > > scikit-learn mailing list > > > > > scikit-learn at python.org > > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > -------------- next part -------------- > > > > An HTML attachment was scrubbed... > > > > URL: > > > > > > > > ------------------------------ > > > > > > > > Message: 3 > > > > Date: Thu, 21 Jul 2016 16:22:09 +0100 > > > > From: Raphael C > > > > To: Scikit-learn user and developer mailing list > > > > > > > > Subject: [scikit-learn] How to get the most important features from a > > > > RF efficiently > > > > Message-ID: > > > > > > > > Content-Type: text/plain; charset=UTF-8 > > > > > > > > I have a set of feature vectors associated with binary class labels, > > > > each of which has about 40,000 features. I can train a random forest > > > > classifier in sklearn which works well. I would however like to see > > > > the most important features. > > > > > > > > I tried simply printing out forest.feature_importances_ but this takes > > > > about 1 second per feature making about 40,000 seconds overall. This > > > > is much much longer than the time needed to train the classifier in > > > > the first place? > > > > > > > > Is there a more efficient way to find out which features are most important? > > > > > > > > Raphael > > > > > > > > On 21 July 2016 at 15:58, Nelson Liu wrote: > > > > > Hi, > > > > > If I remember correctly, scikit-learn.org is hosted on GitHub Pages (so the > > > > > maintainers don't have control over downtime and issues like the one you're > > > > > having). Can you connect to GitHub, or any site on GitHub Pages? 
> > > > > > > > > > Thanks > > > > > Nelson > > > > > > > > > > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja wrote: > > > > >> > > > > >> Hi there, > > > > >> > > > > >> > > > > >> Sklearn website has been down for couple of days. Please look into it. > > > > >> > > > > >> > > > > >> I reside in Pakistan, Karachi city. > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> Kind regards, > > > > >> Rahul Ahuja > > > > >> _______________________________________________ > > > > >> scikit-learn mailing list > > > > >> scikit-learn at python.org > > > > >> https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > > > > _______________________________________________ > > > > > scikit-learn mailing list > > > > > scikit-learn at python.org > > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > > > > > > ------------------------------ > > > > > > > > Subject: Digest Footer > > > > > > > > _______________________________________________ > > > > scikit-learn mailing list > > > > scikit-learn at python.org > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > ------------------------------ > > > > > > > > End of scikit-learn Digest, Vol 4, Issue 31 > > > > ******************************************* > > > > _______________________________________________ > > > > scikit-learn mailing list > > > > scikit-learn at python.org > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > ------------------------------ > > > > > > Subject: Digest Footer > > > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > ------------------------------ > > > > > > End of scikit-learn Digest, Vol 4, Issue 32 > > > ******************************************* > > > _______________________________________________ > > > scikit-learn 
mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > ------------------------------ > > > > Subject: Digest Footer > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > ------------------------------ > > > > End of scikit-learn Digest, Vol 4, Issue 34 > > ******************************************* > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > > ------------------------------ > > End of scikit-learn Digest, Vol 4, Issue 38 > ******************************************* > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn ------------------------------ Subject: Digest Footer _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn ------------------------------ End of scikit-learn Digest, Vol 4, Issue 40 ******************************************* -------------- next part -------------- An HTML attachment was scrubbed... URL: From ross at cgl.ucsf.edu Thu Jul 21 16:17:55 2016 From: ross at cgl.ucsf.edu (Bill Ross) Date: Thu, 21 Jul 2016 13:17:55 -0700 Subject: [scikit-learn] scikit-learn.org not opening In-Reply-To: References: Message-ID: <7fb06fd8-72b5-2927-9248-171a76843eac@cgl.ucsf.edu> If your block was political/moral in motivation, social change may be required for a more permanent solution. 
Bill

On 7/21/16 12:29 PM, Rahul Ahuja wrote:
>
> hi sebastian,
>
> proxy works for me, thanks.
>
> but it may not be permanent solution?
>
> Kind regards,
> Rahul Ahuja
>
> Message: 1
> Date: Thu, 21 Jul 2016 15:08:58 -0400
> From: Sebastian Raschka
> To: Scikit-learn user and developer mailing list
> Subject: Re: [scikit-learn] Sklearn website is down in my place
>
> Hm, typically the unicorn indicates that there's a GitHub-related
> issue; however, it still works for me, which is weird. Intuitively, I
> would say that it may have something to do with a cached version of
> your browser, yet you mentioned that it also doesn't work on other
> devices either... Hm, sounds tricky... Another thing you could try is
> visiting the site via a proxy. E.g., try to go to
>
> https://hide.me/en/proxy
>
> and type "scikit-learn.org" into the form field.
>
> Best,
> Sebastian
>
> > On Jul 21, 2016, at 2:18 PM, Rahul Ahuja wrote:
> >
> > yes it does via that link as well. the name of the tab becomes
> > Unicorn! Github
> >
> > Is there any way that it can be resolved?
> >
> > Kind regards,
> > Rahul Ahuja
> > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> Kind regards, > > > > >> Rahul Ahuja > > > > >> _______________________________________________ > > > > >> scikit-learn mailing list > > > > >> scikit-learn at python.org > > > > >> https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > > > > _______________________________________________ > > > > > scikit-learn mailing list > > > > > scikit-learn at python.org > > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > > > > > > ------------------------------ > > > > > > > > Subject: Digest Footer > > > > > > > > _______________________________________________ > > > > scikit-learn mailing list > > > > scikit-learn at python.org > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > ------------------------------ > > > > > > > > End of scikit-learn Digest, Vol 4, Issue 31 > > > > ******************************************* > > > > _______________________________________________ > > > > scikit-learn mailing list > > > > scikit-learn at python.org > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > ------------------------------ > > > > > > Subject: Digest Footer > > > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > ------------------------------ > > > > > > End of scikit-learn Digest, Vol 4, Issue 32 > > > ******************************************* > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > ------------------------------ > > > > Message: 2 > > Date: Thu, 21 Jul 2016 13:32:34 -0400 > > From: Sebastian Raschka > > To: Scikit-learn user and developer mailing list > > > > Subject: Re: 
[scikit-learn] Sklearn website is down in my place > > Message-ID: > > <186E2B4F-6EDD-427C-B37E-326C4402EF8F at sebastianraschka.com> > > Content-Type: text/plain; charset=iso-8859-1 > > > > Hm, just read that this may be yet another weird censorhip > regulation; I think your best option would be to download the > scikit-learn website, from > > > > https://github.com/scikit-learn/scikit-learn.github.io > > > > and open it locally (via index.html) > > > > > > > > > On Jul 21, 2016, at 1:18 PM, Rahul Ahuja wrote: > > > > > > Hi there, > > > > > > Sklearn is down in my place (location). I have tried to access > with multiple devices and internet connections but still can't. I can > open github websites though. Is there any way to access sklearn website? > > > > > > > > > > > > > > > > > > Kind regards, > > > Rahul Ahuja > > > > > > > > > From: scikit-learn > on behalf of > scikit-learn-request at python.org > > > Sent: Thursday, July 21, 2016 9:59 PM > > > To: scikit-learn at python.org > > > Subject: scikit-learn Digest, Vol 4, Issue 32 > > > > > > Send scikit-learn mailing list submissions to > > > scikit-learn at python.org > > > > > > To subscribe or unsubscribe via the World Wide Web, visit > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > scikit-learn Info Page - Python > > > mail.python.org > > > To see the collection of prior postings to the list, visit the > scikit-learn Archives. Using scikit-learn: To post a message to all > the list members ... > > > > > > > > > or, via email, send a message with subject or body 'help' to > > > scikit-learn-request at python.org > > > > > > You can reach the person managing the list at > > > scikit-learn-owner at python.org > > > > > > When replying, please edit your Subject line so it is more specific > > > than "Re: Contents of scikit-learn digest..." > > > > > > > > > Today's Topics: > > > > > > 1. Re: scikit-learn Digest, Vol 4, Issue 31 (Rahul Ahuja) > > > 2. 
Re: scikit-learn Digest, Vol 4, Issue 31 (Sebastian Raschka) > > > > > > > > > ---------------------------------------------------------------------- > > > > > > Message: 1 > > > Date: Thu, 21 Jul 2016 16:27:54 +0000 > > > From: Rahul Ahuja > > > To: "scikit-learn at python.org" > > > Subject: Re: [scikit-learn] scikit-learn Digest, Vol 4, Issue 31 > > > Message-ID: > > > > > > > > > > Content-Type: text/plain; charset="iso-8859-1" > > > > > > Yes I can open github pages. > > > > > > > > > > > > > > > > > > Kind regards, > > > Rahul Ahuja > > > > > > > > > ________________________________ > > > From: scikit-learn > on behalf of > scikit-learn-request at python.org > > > Sent: Thursday, July 21, 2016 9:00 PM > > > To: scikit-learn at python.org > > > Subject: scikit-learn Digest, Vol 4, Issue 31 > > > > > > Send scikit-learn mailing list submissions to > > > scikit-learn at python.org > > > > > > To subscribe or unsubscribe via the World Wide Web, visit > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > scikit-learn Info Page - > Python > > > mail.python.org > > > To see the collection of prior postings to the list, visit the > scikit-learn Archives. Using scikit-learn: To post a message to all > the list members ... > > > > > > > > > > > > or, via email, send a message with subject or body 'help' to > > > scikit-learn-request at python.org > > > > > > You can reach the person managing the list at > > > scikit-learn-owner at python.org > > > > > > When replying, please edit your Subject line so it is more specific > > > than "Re: Contents of scikit-learn digest..." > > > > > > > > > Today's Topics: > > > > > > 1. sklearn website down in my country Pakistan (Rahul Ahuja) > > > 2. Re: sklearn website down in my country Pakistan (Nelson Liu) > > > 3. 
How to get the most important features from a RF efficiently > > > (Raphael C) > > > > > > > > > ---------------------------------------------------------------------- > > > > > > Message: 1 > > > Date: Thu, 21 Jul 2016 14:50:55 +0000 > > > From: Rahul Ahuja > > > To: "scikit-learn at python.org" > > > Subject: [scikit-learn] sklearn website down in my country Pakistan > > > Message-ID: > > > > > > > > > > Content-Type: text/plain; charset="iso-8859-1" > > > > > > Hi there, > > > > > > > > > Sklearn website has been down for couple of days. Please look into it. > > > > > > > > > I reside in Pakistan, Karachi city. > > > > > > > > > > > > > > > > > > Kind regards, > > > Rahul Ahuja > > > -------------- next part -------------- > > > An HTML attachment was scrubbed... > > > URL: > > > > > > > ------------------------------ > > > > > > Message: 2 > > > Date: Thu, 21 Jul 2016 14:58:04 +0000 > > > From: Nelson Liu > > > To: Scikit-learn user and developer mailing list > > > > > > Subject: Re: [scikit-learn] sklearn website down in my country > > > Pakistan > > > Message-ID: > > > > > > Content-Type: text/plain; charset="utf-8" > > > > > > Hi, > > > If I remember correctly, scikit-learn.org is hosted on GitHub > Pages (so the > > > maintainers don't have control over downtime and issues like the > one you're > > > having). Can you connect to GitHub, or any site on GitHub Pages? > > > > > > Thanks > > > Nelson > > > > > > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja wrote: > > > > > > > Hi there, > > > > > > > > > > > > Sklearn website has been down for couple of days. Please look > into it. > > > > > > > > > > > > I reside in Pakistan, Karachi city. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Kind regards, > > > > Rahul Ahuja > > > > _______________________________________________ > > > > scikit-learn mailing list > > > > scikit-learn at python.org > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > -------------- next part -------------- > > > An HTML attachment was scrubbed... > > > URL: > > > > > > > ------------------------------ > > > > > > Message: 3 > > > Date: Thu, 21 Jul 2016 16:22:09 +0100 > > > From: Raphael C > > > To: Scikit-learn user and developer mailing list > > > > > > Subject: [scikit-learn] How to get the most important features from a > > > RF efficiently > > > Message-ID: > > > > > > Content-Type: text/plain; charset=UTF-8 > > > > > > I have a set of feature vectors associated with binary class labels, > > > each of which has about 40,000 features. I can train a random forest > > > classifier in sklearn which works well. I would however like to see > > > the most important features. > > > > > > I tried simply printing out forest.feature_importances_ but this takes > > > about 1 second per feature making about 40,000 seconds overall. This > > > is much much longer than the time needed to train the classifier in > > > the first place? > > > > > > Is there a more efficient way to find out which features are most > important? > > > > > > Raphael > > > > > > On 21 July 2016 at 15:58, Nelson Liu wrote: > > > > Hi, > > > > If I remember correctly, scikit-learn.org is hosted on GitHub > Pages (so the > > > > maintainers don't have control over downtime and issues like the > one you're > > > > having). Can you connect to GitHub, or any site on GitHub Pages? > > > > > > > > Thanks > > > > Nelson > > > > > > > > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja > wrote: > > > >> > > > >> Hi there, > > > >> > > > >> > > > >> Sklearn website has been down for couple of days. Please look > into it. > > > >> > > > >> > > > >> I reside in Pakistan, Karachi city. 
> > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> Kind regards, > > > >> Rahul Ahuja > > > >> _______________________________________________ > > > >> scikit-learn mailing list > > > >> scikit-learn at python.org > > > >> https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > _______________________________________________ > > > > scikit-learn mailing list > > > > scikit-learn at python.org > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > > ------------------------------ > > > > > > Subject: Digest Footer > > > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > ------------------------------ > > > > > > End of scikit-learn Digest, Vol 4, Issue 31 > > > ******************************************* > > > -------------- next part -------------- > > > An HTML attachment was scrubbed... > > > URL: > > > > > > > ------------------------------ > > > > > > Message: 2 > > > Date: Thu, 21 Jul 2016 12:58:48 -0400 > > > From: Sebastian Raschka > > > To: Scikit-learn user and developer mailing list > > > > > > Subject: Re: [scikit-learn] scikit-learn Digest, Vol 4, Issue 31 > > > Message-ID: > > > > > > Content-Type: text/plain; charset=utf-8 > > > > > > Hm, the website works fine for me (and I also didn?t have any > issues in the last few days). > > > Just to make sure your are using the correct address, it should be > http://scikit-learn.org/ (maybe you used > https://scikit-learn.org by accident !?) > > > > > > - Alternatively, maybe try http://scikit-learn.org/stable/ > > > - A different browser > > > - clearing the browser cache > > > > > > Hope one of these things work! > > > > > > Best, > > > Sebastian > > > > > > > > > > On Jul 21, 2016, at 12:27 PM, Rahul Ahuja > wrote: > > > > > > > > Yes I can open github pages. 
> > > > > > > > > > > > > > > > > > > > > > > > Kind regards, > > > > Rahul Ahuja > > > > > > > > > > > > From: scikit-learn > on behalf of > scikit-learn-request at python.org > > > > Sent: Thursday, July 21, 2016 9:00 PM > > > > To: scikit-learn at python.org > > > > Subject: scikit-learn Digest, Vol 4, Issue 31 > > > > > > > > Send scikit-learn mailing list submissions to > > > > scikit-learn at python.org > > > > > > > > To subscribe or unsubscribe via the World Wide Web, visit > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > scikit-learn Info Page - Python > > > > mail.python.org > > > > To see the collection of prior postings to the list, visit the > scikit-learn Archives. Using scikit-learn: To post a message to all > the list members ... > > > > > > > > > > > > or, via email, send a message with subject or body 'help' to > > > > scikit-learn-request at python.org > > > > > > > > You can reach the person managing the list at > > > > scikit-learn-owner at python.org > > > > > > > > When replying, please edit your Subject line so it is more specific > > > > than "Re: Contents of scikit-learn digest..." > > > > > > > > > > > > Today's Topics: > > > > > > > > 1. sklearn website down in my country Pakistan (Rahul Ahuja) > > > > 2. Re: sklearn website down in my country Pakistan (Nelson Liu) > > > > 3. How to get the most important features from a RF efficiently > > > > (Raphael C) > > > > > > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > Message: 1 > > > > Date: Thu, 21 Jul 2016 14:50:55 +0000 > > > > From: Rahul Ahuja > > > > To: "scikit-learn at python.org" > > > > Subject: [scikit-learn] sklearn website down in my country Pakistan > > > > Message-ID: > > > > > > > > > > > > > Content-Type: text/plain; charset="iso-8859-1" > > > > > > > > Hi there, > > > > > > > > > > > > Sklearn website has been down for couple of days. Please look > into it. 
> > > > > > > > > > > > I reside in Pakistan, Karachi city. > > > > > > > > > > > > > > > > > > > > > > > > Kind regards, > > > > Rahul Ahuja > > > > -------------- next part -------------- > > > > An HTML attachment was scrubbed... > > > > URL: > > > > > > > > > ------------------------------ > > > > > > > > Message: 2 > > > > Date: Thu, 21 Jul 2016 14:58:04 +0000 > > > > From: Nelson Liu > > > > To: Scikit-learn user and developer mailing list > > > > > > > > Subject: Re: [scikit-learn] sklearn website down in my country > > > > Pakistan > > > > Message-ID: > > > > > > > > Content-Type: text/plain; charset="utf-8" > > > > > > > > Hi, > > > > If I remember correctly, scikit-learn.org is hosted on GitHub > Pages (so the > > > > maintainers don't have control over downtime and issues like the > one you're > > > > having). Can you connect to GitHub, or any site on GitHub Pages? > > > > > > > > Thanks > > > > Nelson > > > > > > > > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja > wrote: > > > > > > > > > Hi there, > > > > > > > > > > > > > > > Sklearn website has been down for couple of days. Please look > into it. > > > > > > > > > > > > > > > I reside in Pakistan, Karachi city. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Kind regards, > > > > > Rahul Ahuja > > > > > _______________________________________________ > > > > > scikit-learn mailing list > > > > > scikit-learn at python.org > > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > -------------- next part -------------- > > > > An HTML attachment was scrubbed... 
> > > > URL: > > > > > > > > > ------------------------------ > > > > > > > > Message: 3 > > > > Date: Thu, 21 Jul 2016 16:22:09 +0100 > > > > From: Raphael C > > > > To: Scikit-learn user and developer mailing list > > > > > > > > Subject: [scikit-learn] How to get the most important features > from a > > > > RF efficiently > > > > Message-ID: > > > > > > > > Content-Type: text/plain; charset=UTF-8 > > > > > > > > I have a set of feature vectors associated with binary class labels, > > > > each of which has about 40,000 features. I can train a random forest > > > > classifier in sklearn which works well. I would however like to see > > > > the most important features. > > > > > > > > I tried simply printing out forest.feature_importances_ but this > takes > > > > about 1 second per feature making about 40,000 seconds overall. This > > > > is much much longer than the time needed to train the classifier in > > > > the first place? > > > > > > > > Is there a more efficient way to find out which features are > most important? > > > > > > > > Raphael > > > > > > > > On 21 July 2016 at 15:58, Nelson Liu wrote: > > > > > Hi, > > > > > If I remember correctly, scikit-learn.org is hosted on GitHub > Pages (so the > > > > > maintainers don't have control over downtime and issues like > the one you're > > > > > having). Can you connect to GitHub, or any site on GitHub Pages? > > > > > > > > > > Thanks > > > > > Nelson > > > > > > > > > > On Thu, Jul 21, 2016, 07:52 Rahul Ahuja > wrote: > > > > >> > > > > >> Hi there, > > > > >> > > > > >> > > > > >> Sklearn website has been down for couple of days. Please look > into it. > > > > >> > > > > >> > > > > >> I reside in Pakistan, Karachi city. 
> > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> Kind regards, > > > > >> Rahul Ahuja > > > > >> _______________________________________________ > > > > >> scikit-learn mailing list > > > > >> scikit-learn at python.org > > > > >> https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > > > > _______________________________________________ > > > > > scikit-learn mailing list > > > > > scikit-learn at python.org > > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > > > > > > ------------------------------ > > > > > > > > Subject: Digest Footer > > > > > > > > _______________________________________________ > > > > scikit-learn mailing list > > > > scikit-learn at python.org > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > ------------------------------ > > > > > > > > End of scikit-learn Digest, Vol 4, Issue 31 > > > > ******************************************* > > > > _______________________________________________ > > > > scikit-learn mailing list > > > > scikit-learn at python.org > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > > > > ------------------------------ > > > > > > Subject: Digest Footer > > > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > > ------------------------------ > > > > > > End of scikit-learn Digest, Vol 4, Issue 32 > > > ******************************************* > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > > ------------------------------ > > > > Subject: Digest Footer > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > 
From drraph at gmail.com Thu Jul 21 17:29:26 2016
From: drraph at gmail.com (Raphael C)
Date: Thu, 21 Jul 2016 22:29:26 +0100
Subject: [scikit-learn] How to get the most important features from a RF efficiently
In-Reply-To:
References:
Message-ID:

The problem was that I had a loop like

    for i in xrange(len(clf.feature_importances_)):
        print clf.feature_importances_[i]

which recomputes the feature importance array in every step. Obvious in hindsight.

Raphael
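The fix implied above is to read the attribute once and reuse the result. Below is a minimal sketch using a toy stand-in class, `ToyForest`, which is hypothetical and not scikit-learn code; it only mimics an attribute that is rebuilt on every access, as the message describes. With a real forest the same pattern would be `importances = clf.feature_importances_`, followed by indexing into `importances`.

```python
class ToyForest:
    """Toy stand-in (NOT scikit-learn code): importances are rebuilt on every access."""

    def __init__(self, per_tree_importances):
        self._per_tree = per_tree_importances
        self.n_computations = 0  # counts how often the average is recomputed

    @property
    def feature_importances_(self):
        self.n_computations += 1
        n_trees = len(self._per_tree)
        # Average the per-tree importances feature by feature.
        return [sum(column) / n_trees for column in zip(*self._per_tree)]


clf = ToyForest([[0.1, 0.6, 0.3], [0.2, 0.5, 0.3]])

# Slow pattern from the message: one full recomputation per feature,
# plus one more for the len() call.
for i in range(len(clf.feature_importances_)):
    _ = clf.feature_importances_[i]
slow_calls = clf.n_computations

# Fast pattern: read the attribute once, then work on the stored list.
clf.n_computations = 0
importances = clf.feature_importances_
top_features = sorted(
    range(len(importances)), key=importances.__getitem__, reverse=True
)[:2]
fast_calls = clf.n_computations
```

Here `top_features` holds the indices of the two largest importances, and the counter shows that the second pattern triggers a single computation regardless of the number of features.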
From kastnerkyle at gmail.com Fri Jul 22 14:20:31 2016
From: kastnerkyle at gmail.com (Kyle Kastner)
Date: Fri, 22 Jul 2016 14:20:31 -0400
Subject: [scikit-learn] [Scikit-learn-general] How to run scikit-learn test individually.
In-Reply-To:
References:
Message-ID:

You can call nosetests with a script name or even a function name, e.g.

    nosetests test_init.py:test_import_skl

will run the test_import_skl function inside test_init.py, and

    nosetests test_init.py

would run all the tests in that file.

On Fri, Jul 22, 2016 at 1:59 PM, Kyle Kastner wrote:
> you can call nosetests with a script name or even a function name e.g.
>
> nosetests test_init.py:test_import_skl
>
> will test the test_import_skl function inside test_init.py
>
> nosetests test_init.py
>
> would run all the tests in that file.
>
> On Fri, Jul 22, 2016 at 11:33 AM, Boxiang Sun wrote:
>> Hi all.
>>
>> I posted this question on IRC once, but maybe due to the time zone or some other
>> reason I didn't get an answer yet (maybe I missed it).
>>
>> I am trying to make scikit-learn work in Pyston. I have already finished
>> NumPy and SciPy support in Pyston.
>>
>> When I run the scikit-learn test suite with `python -c "import nose;
>> nose.main()" -v sklearn`, most of the tests are fine, but I encountered one
>> segfault; the backtrace says it is in `sklearn/metrics/tests/test_pairwise.py`.
>>
>> But if I try to run that script individually with `python
>> scikit-learn/sklearn/metrics/tests/test_pairwise.py`, as I did when adding
>> NumPy and SciPy support, the command outputs nothing: no segfault, no
>> output. Running other test files this way gives the same empty output.
>>
>> So how can I run the scikit-learn tests individually and reproduce errors,
>> without running the whole test suite?
>>
>> Regards,
>> Sun

From fengyanghe at gmail.com Sat Jul 23 04:38:28 2016
From: fengyanghe at gmail.com (fengyanghe)
Date: Sat, 23 Jul 2016 16:38:28 +0800
Subject: [scikit-learn] about svdd model
Message-ID:

Hi guys:

I'm trying to solve a one-class classification problem. The results do not seem very good with the one-class SVM from scikit-learn. I want to try the support vector data description (SVDD) method. I'm wondering where I can get code for SVDD.

thanks.
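For context on how SVDD relates to the one-class SVM, the two dual problems can be compared directly. This is a sketch based on the standard textbook formulations; the notation (multipliers $\alpha_i$, kernel $k$, SVDD constant $C$, one-class SVM parameter $\nu$, sample count $n$) is an assumption and does not come from this thread.

```latex
% SVDD dual (smallest enclosing ball in feature space):
\max_{\alpha}\; \sum_{i} \alpha_i\, k(x_i, x_i) - \sum_{i,j} \alpha_i \alpha_j\, k(x_i, x_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_i \alpha_i = 1

% One-class SVM dual:
\min_{\alpha}\; \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j\, k(x_i, x_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le \tfrac{1}{\nu n}, \;\; \sum_i \alpha_i = 1

% For the RBF kernel, k(x, x) = 1 for every x, so under \sum_i \alpha_i = 1
% the first SVDD term is the constant 1 and the SVDD problem becomes
\min_{\alpha}\; \sum_{i,j} \alpha_i \alpha_j\, k(x_i, x_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_i \alpha_i = 1
```

With $C = 1/(\nu n)$ the last problem has the same feasible set and the same minimizer as the one-class SVM dual, so under these assumptions both methods yield the same decision boundary for an RBF kernel.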
From albertthomas88 at gmail.com Sat Jul 23 05:54:32 2016
From: albertthomas88 at gmail.com (Albert Thomas)
Date: Sat, 23 Jul 2016 09:54:32 +0000
Subject: [scikit-learn] about svdd model
In-Reply-To:
References:
Message-ID:

Hi,

There was a pull request for SVDD:
https://github.com/scikit-learn/scikit-learn/pull/5899
But it has been closed recently...

Note that if you apply the OCSVM with the RBF kernel, it is equivalent to SVDD.

Albert

From olivier.grisel at ensta.org Mon Jul 25 07:54:38 2016
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Mon, 25 Jul 2016 13:54:38 +0200
Subject: [scikit-learn] 0.18?
In-Reply-To:
References: <577D4D41.60102@gmail.com>
Message-ID:

Sorry for the late reply,

Before working on this release I would like to automate the wheel generation process (for the release wheels) in a single repo that will generate wheels for Linux, OSX and Windows, based on https://github.com/matthew-brett/multibuild

I plan to put that repo under https://github.com/scikit-learn/scikit-learn-wheels and deprecate https://github.com/MacPython/scikit-learn-wheels, which we used for the OSX wheels.

There is also some issue triaging to do; it would be great to identify blocker bugs that we would like to get fixed before releasing 0.18.

We can aim to do a beta mid-August and the final release after EuroSciPy (first week of September).
--
Olivier

From matthew.brett at gmail.com Mon Jul 25 15:32:48 2016
From: matthew.brett at gmail.com (Matthew Brett)
Date: Mon, 25 Jul 2016 20:32:48 +0100
Subject: [scikit-learn] 0.18?
In-Reply-To:
References: <577D4D41.60102@gmail.com>
Message-ID:

Hi,

On Mon, Jul 25, 2016 at 12:54 PM, Olivier Grisel wrote:
> Sorry for the late reply,
>
> Before working on this release I would like to automate the wheel
> generation process (for the release wheels) in a single repo that will
> generate wheels for linux, osx and windows based on
> https://github.com/matthew-brett/multibuild
>
> I plan to put that repo under
> https://github.com/scikit-learn/scikit-learn-wheels and deprecate
> https://github.com/MacPython/scikit-learn-wheels that we used for the
> OSX wheels.

Actually, sorry, I already switched to multibuild for the MacPython repo, so it already builds manylinux wheels. There's a single test failure now on OSX because I added 32-bit tests to the OSX test command:

https://travis-ci.org/MacPython/scikit-learn-wheels/builds/146137997

Obviously feel totally free to move the MacPython repo to scikit-learn if you'd prefer.

Cheers,

Matthew

From t3kcit at gmail.com Mon Jul 25 15:53:18 2016
From: t3kcit at gmail.com (Andreas Mueller)
Date: Mon, 25 Jul 2016 15:53:18 -0400
Subject: [scikit-learn] 0.18?
In-Reply-To:
References: <577D4D41.60102@gmail.com>
Message-ID: <69c5d973-f798-f5a1-a4a3-1ee43e2c1a36@gmail.com>

Hi Olivier / all

Let me know if I can help with the builds. I'm gonna start reviews and triaging and tagging this week.

Mid August sounds good for a beta / RC. It would be great if we could release in September, as that is when The Book (aka my past year) is scheduled to come out (I finished it last week). The Book uses model_selection, so having the release out before the book would be good.
Andy

From t3kcit at gmail.com Mon Jul 25 16:01:24 2016
From: t3kcit at gmail.com (Andreas Mueller)
Date: Mon, 25 Jul 2016 16:01:24 -0400
Subject: [scikit-learn] Three new scikit-learn-contrib projects
In-Reply-To:
References: <20160720074843.GA609220@phare.normalesup.org>
Message-ID: <7fbb59f1-42a4-a2fb-b5f6-295d61ebd6e0@gmail.com>

On 07/20/2016 01:31 PM, Guillaume Lemaître wrote:
> Hi Gael,
>
> I was wondering if you could elaborate on the problem of
> hyper-parameter tuning and why imbalanced-learn would not benefit
> from it.
> Since we use the same pipeline as scikit-learn and only add the
> part that handles the sampler, I would have thought that we could use it.
>
> However, it is true that I did not play much with this part of the
> API, so I probably missed something.
>

The assumption is that hyper-parameter tuning uses Pipelines, I think. You want to select all steps in your processing, which is rarely just a single model. However, Pipeline currently cannot change the number of samples (see the enhancement proposal Gael linked to). So you cannot use your methods in the standard scikit-learn pipeline.
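To illustrate the limitation described above: a step that resamples has to return both X and y, whereas a Pipeline step's transform only returns X. A minimal sketch of the usual workaround follows, assuming a hypothetical helper named `undersample` (not part of scikit-learn or imbalanced-learn) that is applied to the training data before any Pipeline is fitted.

```python
import random
from collections import defaultdict


def undersample(X, y, seed=0):
    """Randomly drop rows so every class keeps as many rows as the rarest class.

    Hypothetical helper for illustration. It returns BOTH X and y, which is
    what a Pipeline step's transform(X) cannot do.
    """
    rng = random.Random(seed)
    rows_by_class = defaultdict(list)
    for index, label in enumerate(y):
        rows_by_class[label].append(index)
    n_keep = min(len(rows) for rows in rows_by_class.values())
    kept = []
    for rows in rows_by_class.values():
        kept.extend(rng.sample(rows, n_keep))
    kept.sort()  # preserve the original sample order
    return [X[i] for i in kept], [y[i] for i in kept]


X_train = [[0.0], [0.1], [0.2], [0.3], [5.0], [5.1]]
y_train = [0, 0, 0, 0, 1, 1]

# Resample the training data only; test data must stay untouched.
X_res, y_res = undersample(X_train, y_train)
```

The resampled `X_res` and `y_res` can then be passed to the `fit` method of an ordinary Pipeline. The drawback, and presumably the reason for the enhancement proposal, is that the resampling step then sits outside the object that a hyper-parameter search would tune.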
Best, Andy From yenchenlin1994 at gmail.com Tue Jul 26 04:09:34 2016 From: yenchenlin1994 at gmail.com (lin yenchen) Date: Tue, 26 Jul 2016 16:09:34 +0800 Subject: [scikit-learn] Blog post about Cython fused type pointer that can help reduce duplicated code Message-ID: Hello guys, here is a blog post - Using Function Pointer with Fused Types to Maximize Code Reusability I wrote to document my findings during GSoC. Hope this helps, especially for people who are trying to add fused types to BLAS-based and similar fused type function calls. Please feel free to leave some comments so I can improve it. And thanks Joel for the early review. Best, YenChen Lin From olivier.grisel at ensta.org Wed Jul 27 11:01:23 2016 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Wed, 27 Jul 2016 17:01:23 +0200 Subject: [scikit-learn] 0.18? In-Reply-To: References: <577D4D41.60102@gmail.com> Message-ID: Thanks Matthew, I had not realized. I will add an appveyor config there with a dedicated `sklearn-wheels` account so that we don't wait for the `sklearn-ci` jobs when we are building the release wheels. -- Olivier From dmoisset at machinalis.com Wed Jul 27 15:17:31 2016 From: dmoisset at machinalis.com (Daniel Moisset) Date: Wed, 27 Jul 2016 20:17:31 +0100 Subject: [scikit-learn] Is there any official position on PEP484/mypy? Message-ID: Hi, [If you're also on the numpy mailing list and get a similar version of the message, I apologise for that] I work at Machinalis where we use a lot of scikit-learn (and the pydata stack in general). Recently we've also been getting involved with mypy, which is a tool to type check (not at runtime, think of it as a linter) annotated python code (the way of annotating python types has been recently standardized in PEP 484). As part of that involvement we've started creating type annotations for the Python libraries we use most, which include both numpy and scikit-learn.
Mypy provides a way to specify types with annotations in separate files in case you don't have control over a library, so we have created an initial proof of concept for numpy at [1], and we are actively improving it. You can find some additional information about it and some problems we've found on the way at this blogpost [2]. We were planning to also start some work on scikit-learn (which has a much larger surface area than numpy, so probably focusing on small parts for now); we had to start with numpy anyway given that SKL depends on it. What I wanted to ask is if the people involved on the SKL project are aware of PEP484 annotations and if you have some interest in starting using them. The main benefit is that annotations serve as clear (and automatically testable) documentation for users, and secondary benefits are that users discover bugs more quickly and that some IDEs (like pycharm) are starting to use this information for smart editor features (autocompletion, online checking, refactoring tools); eventually tools like jupyter could take advantage of these annotations in the future. And the cost of writing and including these is relatively low. We're doing the work anyway, but contributing our typespecs back could make it easier for users to benefit from this, and for us to maintain it and keep it in sync with future releases. If you've never heard about PEP484 or mypy (it happens a lot) I'll be happy to clarify anything about it that might help understand this situation. Thanks! D. [1] https://github.com/machinalis/mypy-data [2] http://www.machinalis.com/blog/writing-type-stubs-for-numpy/ -- Daniel F. Moisset - UK Country Manager www.machinalis.com Skype: @dmoisset From t3kcit at gmail.com Wed Jul 27 17:08:13 2016 From: t3kcit at gmail.com (Andreas Mueller) Date: Wed, 27 Jul 2016 17:08:13 -0400 Subject: [scikit-learn] Is there any official position on PEP484/mypy?
In-Reply-To: References: Message-ID: Hi Daniel. This hasn't been brought up before so there is no "official position". I am generally in favor, though I'm not sure how doable it is. We are generally pretty generous in accepting all kinds of inputs, and many of our options can have different types: (None, int, float, string, nd-array) is relatively common as a type for an option. As we still support 2.6, we would need to do comments or external files. As a user, you are probably most interested in the outputs, right? The types returned by scikit-learn could probably be auto-generated. I'm curious to see what others think. I'd be surprised if anyone is willing to invest a large amount of time on this, though if you guys want to contribute, we might be able to work something out. Andy On 07/27/2016 03:17 PM, Daniel Moisset wrote: > Hi, > > [If you're also on the numpy mailing list and get a similar version of > the message, I apologise for that] > > I work at Machinalis were we use a lot of scikit-learn (and the pydata > stack in general). Recently we've also been getting involved with > mypy, which is a tool to type check (not on runtime, think of it as a > linter) annotated python code (the way of annotating python types has > been recently standarized in PEP 484). > > As part of that involvement we've started creating type annotations > for the Python libraries we use most, which include both numpy and > scikit-learn. Mypy provides a way to specify types with annotations in > separate files in case you don't have control over a library, so we > have created an initial proof of concept for numpy at [1], and we are > actively improving it. You can find some additional information about > it and some problems we've found on the way at this blogpost [2]. 
We > were planning to also start some work on scikit-learn (which has a > much larger surface area than numpy, so probably focusing on small > parts for now); we had to start with numpy anyway given that SKL > depends on it. > > What I wanted to ask is if the people involved on the SKL project are > aware of PEP484 annotations and if you have some interest in starting > using them. The main benefit is that annotations serve as clear (and > automatically testable) documentation for users, and secondary > benefits is that users discovers bugs more quickly and that some IDEs > (like pycharm) are starting to use this information for smart editor > features (autocompletion, online checking, refactoring tools); > eventually tools like jupyter could take advantage of these > annotations in the future. And the cost of writing and including these > are relatively low. > > We're doing the work anyway, but contributing our typespecs back could > make it easier for users to benefit from this, and for us to maintain > it and keep it in sync with future releases. > > If you've never heard about PEP484 or mypy (it happens a lot) I'll be > happy to clarify anything about it that might helpunderstand this > situation > > Thanks! > > D. > > > [1] https://github.com/machinalis/mypy-data > [2] http://www.machinalis.com/blog/writing-type-stubs-for-numpy/ > > -- > Daniel F. Moisset - UK Country Manager > www.machinalis.com > Skype: @dmoisset > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From t3kcit at gmail.com Thu Jul 28 11:25:28 2016 From: t3kcit at gmail.com (Andreas Mueller) Date: Thu, 28 Jul 2016 11:25:28 -0400 Subject: [scikit-learn] CI permissions Message-ID: Hey all. So I think I messed with the CI a bit too much yesterday. 
After looking into the organization permissions a bit more, I saw that any app that is authorized on github by any of the devs will automatically be authorized for the organization. That seems pretty bad. So I changed that default so that each app needs to be authorized explicitly for the organization. Clearly I wasn't thinking, because that broke CI. I gave permissions back to the CI services, but they are not all linked to my account. Olivier, can you check for coveralls and circleci? I think you have these accounts. Appveyor and travis are working again. Sorry for the inconvenience, Andy From dmoisset at machinalis.com Thu Jul 28 11:55:17 2016 From: dmoisset at machinalis.com (Daniel Moisset) Date: Thu, 28 Jul 2016 16:55:17 +0100 Subject: [scikit-learn] Is there any official position on PEP484/mypy? In-Reply-To: References: Message-ID: Hi Andreas, thanks for the reply. I think many arguments may end up with open specs, but even specifying "Union[None, int, float, string, np.ndarray]" might be useful. I actually expect to be able to provide stricter types using generics (you can say things like "this is a list of floats" or "this is a classifier on float features and str labels"). Not only the outputs are relevant; this is useful to detect some silly mistakes like wrong numbers of arguments or misspellings in method types. But yes, result types can also detect errors like "some_classifier.fit_transform(X).predict(X)" (which I've seen on carelessly refactored code :) ). The good thing about scikit-learn is that most methods already have information about types in docstrings so it should be easy for us to move forward and validate what we're doing (and the docstrings ;-) ) Given that there's interest (or at least no opposition) on this I feel inclined to create a fork and start adding annotations (the comment-based ones) into the code (which is better suited for this scenario than creating external type stubs).
We're already putting work into it and would be more than happy to turn it into a contribution if it works well (which is a "real" if given that this is still an experimental terrain) Best, D. On Wed, Jul 27, 2016 at 10:08 PM, Andreas Mueller wrote: > Hi Daniel. > This hasn't been brought up before so there is no "official position". > I am generally in favor, though I'm not sure how doable it is. > We are generally pretty generous in accepting all kinds of inputs, and > many of our options can have different types: (None, int, float, string, > nd-array) is relatively common as a type for an option. > As we still support 2.6, we would need to do comments or external files. > > As a user, you are probably most interested in the outputs, right? The > types returned by scikit-learn could probably be auto-generated. > > I'm curious to see what others think. > I'd be surprised if anyone is willing to invest a large amount of time on > this, though if you guys want to contribute, > we might be able to work something out. > > Andy > > > > On 07/27/2016 03:17 PM, Daniel Moisset wrote: > > Hi, > > [If you're also on the numpy mailing list and get a similar version of the > message, I apologise for that] > > I work at Machinalis were we use a lot of scikit-learn (and the pydata > stack in general). Recently we've also been getting involved with mypy, > which is a tool to type check (not on runtime, think of it as a linter) > annotated python code (the way of annotating python types has been recently > standarized in PEP 484). > > As part of that involvement we've started creating type annotations for > the Python libraries we use most, which include both numpy and > scikit-learn. Mypy provides a way to specify types with annotations in > separate files in case you don't have control over a library, so we have > created an initial proof of concept for numpy at [1], and we are actively > improving it. 
You can find some additional information about it and some > problems we've found on the way at this blogpost [2]. We were planning to > also start some work on scikit-learn (which has a much larger surface area > than numpy, so probably focusing on small parts for now); we had to start > with numpy anyway given that SKL depends on it. > > What I wanted to ask is if the people involved on the SKL project are > aware of PEP484 annotations and if you have some interest in starting using > them. The main benefit is that annotations serve as clear (and > automatically testable) documentation for users, and secondary benefits is > that users discovers bugs more quickly and that some IDEs (like pycharm) > are starting to use this information for smart editor features > (autocompletion, online checking, refactoring tools); eventually tools like > jupyter could take advantage of these annotations in the future. And the > cost of writing and including these are relatively low. > > We're doing the work anyway, but contributing our typespecs back could > make it easier for users to benefit from this, and for us to maintain it > and keep it in sync with future releases. > > If you've never heard about PEP484 or mypy (it happens a lot) I'll be > happy to clarify anything about it that might helpunderstand this situation > > Thanks! > > D. > > > [1] https://github.com/machinalis/mypy-data > [2] http://www.machinalis.com/blog/writing-type-stubs-for-numpy/ > > -- > Daniel F. Moisset - UK Country Manager > www.machinalis.com > Skype: @dmoisset > > > _______________________________________________ > scikit-learn mailing listscikit-learn at python.orghttps://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > -- Daniel F. 
Moisset - UK Country Manager www.machinalis.com Skype: @dmoisset -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Thu Jul 28 12:03:01 2016 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 28 Jul 2016 17:03:01 +0100 Subject: [scikit-learn] Is there any official position on PEP484/mypy? In-Reply-To: References: Message-ID: Hi, On Wed, Jul 27, 2016 at 10:08 PM, Andreas Mueller wrote: > Hi Daniel. > This hasn't been brought up before so there is no "official position". > I am generally in favor, though I'm not sure how doable it is. > We are generally pretty generous in accepting all kinds of inputs, and many > of our options can have different types: (None, int, float, string, > nd-array) is relatively common as a type for an option. > As we still support 2.6, we would need to do comments or external files. Given numpy has dropped support for 2.6, maybe it would be reasonable for scikit-learn to do the same, to make this process easier? Cheers, Matthew From t3kcit at gmail.com Thu Jul 28 12:04:48 2016 From: t3kcit at gmail.com (Andreas Mueller) Date: Thu, 28 Jul 2016 12:04:48 -0400 Subject: [scikit-learn] Is there any official position on PEP484/mypy? In-Reply-To: References: Message-ID: On 07/28/2016 11:55 AM, Daniel Moisset wrote: > > > Given that there's interest (or at least no opposition) on this I feel > inclined to create a fork and start adding annotations (the > comment-based ones) into the code (which is better suited for this > scenario than creating external type stubs). We're already putting > work into it and would be more than happy to turn it into a > contribution if it works well (which is a "real" if given that this is > still an experimental terrain) > That sounds good to me. But please keep in mind that I don't speak for the project, so if it works well, we would still need to achieve consensus within the project that this is something useful. 
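For readers who have not seen the comment-based syntax under discussion, a minimal sketch follows. It assumes the PEP 484 type-comment form, which keeps the source valid on Python 2 (there the `typing` module comes from a backport package). The function `scale_values` and its signature are invented for illustration and are not part of scikit-learn.

```python
from typing import List, Optional, Union


def scale_values(values, factor=None):
    # type: (List[float], Optional[Union[int, float]]) -> List[float]
    """Multiply every value by ``factor``; ``None`` leaves the values as-is."""
    if factor is None:
        return list(values)
    return [v * factor for v in values]


print(scale_values([1.0, 2.5], 2))
```

The `# type:` line is an ordinary comment at runtime; only a static checker such as mypy reads it. A call like `scale_values("abc")` is the kind of mistake such a checker is meant to report before the code runs.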
If you find some bugs with the annotations and mypy, that would probably prove its value to some degree [and if you don't, I might be inclined to argue it's not working well ;] Joel, Olivier, Gael, anyone else?: opinions? From t3kcit at gmail.com Thu Jul 28 12:10:03 2016 From: t3kcit at gmail.com (Andreas Mueller) Date: Thu, 28 Jul 2016 12:10:03 -0400 Subject: [scikit-learn] Is there any official position on PEP484/mypy? In-Reply-To: References: Message-ID: <0c877f91-08ed-5fbb-f032-2ff644e41385@gmail.com> On 07/28/2016 12:03 PM, Matthew Brett wrote: > Hi, > > On Wed, Jul 27, 2016 at 10:08 PM, Andreas Mueller wrote: >> Hi Daniel. >> This hasn't been brought up before so there is no "official position". >> I am generally in favor, though I'm not sure how doable it is. >> We are generally pretty generous in accepting all kinds of inputs, and many >> of our options can have different types: (None, int, float, string, >> nd-array) is relatively common as a type for an option. >> As we still support 2.6, we would need to do comments or external files. > Given numpy has dropped support for 2.6, maybe it would be reasonable > for scikit-learn to do the same, to make this process easier? > How would it change the process? We have been discussing this. My stance is that we should drop it as soon as it creates a major nuisance, but not just for the sake of dropping it. From dmoisset at machinalis.com Thu Jul 28 12:15:36 2016 From: dmoisset at machinalis.com (Daniel Moisset) Date: Thu, 28 Jul 2016 17:15:36 +0100 Subject: [scikit-learn] Is there any official position on PEP484/mypy? 
In-Reply-To: References: Message-ID: On Thu, Jul 28, 2016 at 5:04 PM, Andreas Mueller wrote: > > > On 07/28/2016 11:55 AM, Daniel Moisset wrote: > >> >> >> Given that there's interest (or at least no opposition) on this I feel >> inclined to create a fork and start adding annotations (the comment-based >> ones) into the code (which is better suited for this scenario than creating >> external type stubs). We're already putting work into it and would be more >> than happy to turn it into a contribution if it works well (which is a >> "real" if given that this is still an experimental terrain) >> > That sounds good to me. But please keep in mind that I don't speak for > the project, so if it works well, we would still need to achieve consensus > within > the project that this is something useful. > Of course; again I'm doing it anyway because I need/want it. So I'm fine if you end up not using it; but I'd be happier if it's useful for people outside our team. > If you find some bugs with the annotations and mypy, that would probably > prove its value to some degree [and if you don't, I might be inclined to > argue it's not working well ;] > heh, most mature software in python doesn't have many *type* bugs (which tend to be more superficial), so I don't think I'll change much. I have already played a bit with annotating some 3rd party code[1] and I did not find bugs, but I found some opportunities to make code more readable and/or simpler. @Matthew: Regarding 2.6 vs 2.7, I don't believe it changes anything with respect to the effort needed here (2.7 vs 3.x would make a bigger difference, but I know that will take some time). Best, D. -- Daniel F. Moisset - UK Country Manager www.machinalis.com Skype: @dmoisset -------------- next part -------------- An HTML attachment was scrubbed...
URL: From matthew.brett at gmail.com Thu Jul 28 12:25:21 2016 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 28 Jul 2016 17:25:21 +0100 Subject: [scikit-learn] Is there any official position on PEP484/mypy? In-Reply-To: <0c877f91-08ed-5fbb-f032-2ff644e41385@gmail.com> References: <0c877f91-08ed-5fbb-f032-2ff644e41385@gmail.com> Message-ID: On Thu, Jul 28, 2016 at 5:10 PM, Andreas Mueller wrote: > > > On 07/28/2016 12:03 PM, Matthew Brett wrote: >> >> Hi, >> >> On Wed, Jul 27, 2016 at 10:08 PM, Andreas Mueller >> wrote: >>> >>> Hi Daniel. >>> This hasn't been brought up before so there is no "official position". >>> I am generally in favor, though I'm not sure how doable it is. >>> We are generally pretty generous in accepting all kinds of inputs, and >>> many >>> of our options can have different types: (None, int, float, string, >>> nd-array) is relatively common as a type for an option. >>> As we still support 2.6, we would need to do comments or external files. >> >> Given numpy has dropped support for 2.6, maybe it would be reasonable >> for scikit-learn to do the same, to make this process easier? >> > How would it change the process? > We have been discussing this. My stance is that we should drop it > as soon as it creates a major nuisance, but not just for the sake > of dropping it. Ah - sorry - I misunderstood this: > As we still support 2.6, we would need to do comments or external files. to mean 2.6 specifically, rather than 2.x. Cheers, Matthew From gael.varoquaux at normalesup.org Thu Jul 28 12:43:25 2016 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 28 Jul 2016 18:43:25 +0200 Subject: [scikit-learn] Is there any official position on PEP484/mypy? 
In-Reply-To: <0c877f91-08ed-5fbb-f032-2ff644e41385@gmail.com> References: <0c877f91-08ed-5fbb-f032-2ff644e41385@gmail.com> Message-ID: <20160728164325.GK787902@phare.normalesup.org> On Thu, Jul 28, 2016 at 12:10:03PM -0400, Andreas Mueller wrote: > >Given numpy has dropped support for 2.6, maybe it would be reasonable > >for scikit-learn to do the same, to make this process easier? > How would it change the process? > We have been discussing this. My stance is that we should drop it > as soon as it creates a major nuisance, but not just for the sake > of dropping it. Same feeling here. From gael.varoquaux at normalesup.org Thu Jul 28 12:43:39 2016 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 28 Jul 2016 18:43:39 +0200 Subject: [scikit-learn] Is there any official position on PEP484/mypy? In-Reply-To: References: Message-ID: <20160728164339.GD2110660@phare.normalesup.org> On Thu, Jul 28, 2016 at 12:04:48PM -0400, Andreas Mueller wrote: > If you find some bugs with the annotations and mypy, that would probably > prove its value to some degree [and if you don't, I might be inclined to > argue it's not working well ;] > Joel, Olivier, Gael, anyone else?: opinions? The only reservation that I might have is with regard to the maintainability of these annotations. I am afraid that they will rot over time. Daniel, any comments on that concern? Cheers, Gaël From t3kcit at gmail.com Thu Jul 28 12:49:50 2016 From: t3kcit at gmail.com (Andreas Mueller) Date: Thu, 28 Jul 2016 12:49:50 -0400 Subject: [scikit-learn] Is there any official position on PEP484/mypy?
In-Reply-To: <20160728164339.GD2110660@phare.normalesup.org> References: <20160728164339.GD2110660@phare.normalesup.org> Message-ID: <598b3780-5b3d-2eb8-7e57-da3856026d0b@gmail.com> On 07/28/2016 12:43 PM, Gael Varoquaux wrote: > On Thu, Jul 28, 2016 at 12:04:48PM -0400, Andreas Mueller wrote: >> If you find some bugs with the annotations and mypy, that would probably >> prove its value to some degree [and if you don't, I might be inclined to >> argue it's not working well ;] >> Joel, Olivier, Gael, anyone else?: opinions? > The only reserve that I might have is with regards to the maintainability > of these annotation. I am afraid that they coderot. > > Daniel, any comments on that concern? We can put mypy in the CI, right? Shouldn't that prevent it from rotting? [I don't actually know. Daniel?] From t3kcit at gmail.com Thu Jul 28 14:44:32 2016 From: t3kcit at gmail.com (Andreas Mueller) Date: Thu, 28 Jul 2016 14:44:32 -0400 Subject: [scikit-learn] Declaring numpy and scipy dependencies? Message-ID: <195faf56-d8c6-49e0-7fd7-5bb4f1b22931@gmail.com> Hi all. I think that since the PyPI ecosystem has improved a lot, we should properly declare the scipy and numpy dependencies, so that ``pip install scikit-learn`` works properly. The argument why we did not do this previously was that this would try to download and build numpy and scipy, which was pretty much guaranteed to fail or result in a very slow installation. This is no longer true. Any opinions? Andy From mail at sebastianraschka.com Thu Jul 28 14:49:39 2016 From: mail at sebastianraschka.com (Sebastian Raschka) Date: Thu, 28 Jul 2016 14:49:39 -0400 Subject: [scikit-learn] Is there any official position on PEP484/mypy?
In-Reply-To: <598b3780-5b3d-2eb8-7e57-da3856026d0b@gmail.com> References: <20160728164339.GD2110660@phare.normalesup.org> <598b3780-5b3d-2eb8-7e57-da3856026d0b@gmail.com> Message-ID: <31D9B8F7-2652-4541-B2CC-426CA099957B@sebastianraschka.com>

I am not a core dev but just wanted to say that I like the idea of adding static type checking a lot, Daniel. Coincidentally, I just listened to the Podcast.__init__ episode on mypy a few weeks ago and was planning to use it in my personal + research projects as well. I think the "normal" scikit-learn user would not really benefit from it (since the docstrings are already pretty good and thorough), but I think that it can be immensely useful for devs and contributors (and for augmenting the unit tests).

> We can put mypy in the CI, right? Shouldn't that prevent it from rotting?

Yeah, it can be added to Travis CI checks, for example.

One question though, are you planning to apply the "whole" type checking syntax? E.g.,

    def hello(r: int, c=5) -> str:
        s = 'hello'  # type: str
        return '(%d + %d) times %s' % (r, c, s)

Does this work with Python 2.7, 3.4 etc? Or are you only thinking about the "comment" syntax? E.g.,

    def hello(r, c=5):
        s = 'hello'  # type: str
        return '(%d + %d) times %s' % (r, c, s)

which should work on all Py versions.

Best,
Sebastian

> On Jul 28, 2016, at 12:49 PM, Andreas Mueller wrote:
>
> On 07/28/2016 12:43 PM, Gael Varoquaux wrote:
>> On Thu, Jul 28, 2016 at 12:04:48PM -0400, Andreas Mueller wrote:
>>> If you find some bugs with the annotations and mypy, that would probably
>>> prove its value to some degree [and if you don't, I might be inclined to
>>> argue it's not working well ;]
>>> Joel, Olivier, Gael, anyone else?: opinions?
>> The only reserve that I might have is with regards to the maintainability
>> of these annotation. I am afraid that they coderot.
>>
>> Daniel, any comments on that concern?
> We can put mypy in the CI, right? Shouldn't that prevent it from rotting?
> [I don't actually know. Daniel?]
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

From mail at sebastianraschka.com Thu Jul 28 14:55:01 2016 From: mail at sebastianraschka.com (Sebastian Raschka) Date: Thu, 28 Jul 2016 14:55:01 -0400 Subject: [scikit-learn] Declaring numpy and scipy dependencies? In-Reply-To: <195faf56-d8c6-49e0-7fd7-5bb4f1b22931@gmail.com> References: <195faf56-d8c6-49e0-7fd7-5bb4f1b22931@gmail.com> Message-ID: <98971054-939E-416C-BA47-AE5AD515E170@sebastianraschka.com>

I think that should work fine for the `pip install scikit-learn`, however, I think the problem was with upgrading, right? E.g., if you run

    pip install scikit-learn --upgrade

it would try to upgrade numpy and scipy as well, which may not be desired. I think the only workaround would be to run

    pip install scikit-learn --upgrade --no-deps

unless they changed the behavior recently. I mean, it's not really a problem, but many users may not know about the --no-deps flag.

> On Jul 28, 2016, at 2:44 PM, Andreas Mueller wrote:
>
> Hi all.
> I think since the pipy ecosystem improved a lot, we should properly declare the scipy
> and numpy dependencies, so that ``pip install scikit-learn`` works properly.
> The argument why we did not do this previously was that this would try to download
> and build numpy and scipy, which are pretty much guaranteed to fail / result
> in a very slow installation. This is no longer true.
>
> Any opinions?
>
> Andy
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

From matthew.brett at gmail.com Thu Jul 28 15:04:09 2016 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 28 Jul 2016 20:04:09 +0100 Subject: [scikit-learn] Declaring numpy and scipy dependencies?
In-Reply-To: <98971054-939E-416C-BA47-AE5AD515E170@sebastianraschka.com> References: <195faf56-d8c6-49e0-7fd7-5bb4f1b22931@gmail.com> <98971054-939E-416C-BA47-AE5AD515E170@sebastianraschka.com> Message-ID: On Thu, Jul 28, 2016 at 7:55 PM, Sebastian Raschka wrote: > I think that should work fine for the `pip install scikit-learn`, however, I think the problem was with upgrading, right? > E.g., if you run > > pip install scikit-learn --upgrade > > it would try to upgrade numpy and scipy as well, which may not be desired. I think the only workaround would be to run > > pip install scikit-learn --upgrade --no-deps > > unless they changed the behavior recently. I mean, it's not really a problem, but many users may not know about the --no-deps flag. > Also - the install will work fine for platforms with wheels, but is still bad for platforms without - like the Raspberry Pi. Cheers, Matthew From t3kcit at gmail.com Thu Jul 28 15:03:39 2016 From: t3kcit at gmail.com (Andreas Mueller) Date: Thu, 28 Jul 2016 15:03:39 -0400 Subject: [scikit-learn] Declaring numpy and scipy dependencies? In-Reply-To: <98971054-939E-416C-BA47-AE5AD515E170@sebastianraschka.com> References: <195faf56-d8c6-49e0-7fd7-5bb4f1b22931@gmail.com> <98971054-939E-416C-BA47-AE5AD515E170@sebastianraschka.com> Message-ID: <8142bc0e-ad30-9cc4-7f1e-5cb85901a4b9@gmail.com> On 07/28/2016 02:55 PM, Sebastian Raschka wrote: > I think that should work fine for the `pip install scikit-learn`, however, I think the problem was with upgrading, right? Well so far the compiling was more the issue. > E.g., if you run > > pip install scikit-learn --upgrade > > it would try to upgrade numpy and scipy as well, which may not be desired. I think the only workaround would be to run > > pip install scikit-learn --upgrade --no-deps > > unless they changed the behavior recently. I mean, it's not really a problem, but many users may not know about the --no-deps flag. > Well, but that's a pip usability bug.
Also, with binary wheels, this shouldn't be a big deal. From t3kcit at gmail.com Thu Jul 28 15:10:58 2016 From: t3kcit at gmail.com (Andreas Mueller) Date: Thu, 28 Jul 2016 15:10:58 -0400 Subject: [scikit-learn] Declaring numpy and scipy dependencies? In-Reply-To: References: <195faf56-d8c6-49e0-7fd7-5bb4f1b22931@gmail.com> <98971054-939E-416C-BA47-AE5AD515E170@sebastianraschka.com> Message-ID: <705a27d4-3643-bc9b-11a8-80ba0f6752bf@gmail.com> On 07/28/2016 03:04 PM, Matthew Brett wrote: > On Thu, Jul 28, 2016 at 7:55 PM, Sebastian Raschka > wrote: >> I think that should work fine for the `pip install scikit-learn`, however, I think the problem was with upgrading, right? >> E.g., if you run >> >> pip install scikit-learn --upgrade >> >> it would try to upgrade numpy and scipy as well, which may not be desired. I think the only workaround would be to run >> >> pip install scikit-learn --upgrade --no-deps >> >> unless they changed the behavior recently. I mean, it's not really a problem, but many users may not know about the --no-deps flag. >> > Also - the install will work fine for platforms with wheels, but is > still bad for platforms without - like the Raspberry Pi. Hm... so these would be ARM wheels? Or Raspberry Pi specific ones? Do you know if there are plans? Not sure how I feel about this. Do all platforms need to have wheels before we can rely on them? From matthew.brett at gmail.com Thu Jul 28 15:16:41 2016 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 28 Jul 2016 20:16:41 +0100 Subject: [scikit-learn] Declaring numpy and scipy dependencies?
In-Reply-To: <705a27d4-3643-bc9b-11a8-80ba0f6752bf@gmail.com> References: <195faf56-d8c6-49e0-7fd7-5bb4f1b22931@gmail.com> <98971054-939E-416C-BA47-AE5AD515E170@sebastianraschka.com> <705a27d4-3643-bc9b-11a8-80ba0f6752bf@gmail.com> Message-ID: On Thu, Jul 28, 2016 at 8:10 PM, Andreas Mueller wrote: > > > On 07/28/2016 03:04 PM, Matthew Brett wrote: >> >> On Thu, Jul 28, 2016 at 7:55 PM, Sebastian Raschka >> wrote: >>> >>> I think that should work fine for the `pip install scikit-learn`, >>> however, I think the problem was with upgrading, right? >>> E.g., if you run >>> >>> pip install scikit-learn --upgrade >>> >>> it would try to upgrade numpy and scipy as well, which may not be >>> desired. I think the only workaround would be to run >>> >>> pip install scikit-learn --upgrade --no-deps >>> >>> unless they changed the behavior recently. I mean, it's not really a >>> problem, but many users may not know about the --no-deps flag. >>> >> Also - the install will work fine for platforms with wheels, but is >> still bad for platforms without - like the Raspberry Pi. > > Hm... so these would be ARM wheels? Or Raspberry Pi specific ones? No, they'd have to be Raspberry Pi specific ones because no-one has worked out a general ARM-wide specification, as we have for Intel Linux = manylinux1. > Do you know if there are plans? > Not sure how I feel about this. Do all platforms need to have wheels before > we can rely on them? I'm not sure either - just throwing it out there... Matthew From vaggi.federico at gmail.com Thu Jul 28 15:30:45 2016 From: vaggi.federico at gmail.com (federico vaggi) Date: Thu, 28 Jul 2016 19:30:45 +0000 Subject: [scikit-learn] Declaring numpy and scipy dependencies?
In-Reply-To:
References: <195faf56-d8c6-49e0-7fd7-5bb4f1b22931@gmail.com> <98971054-939E-416C-BA47-AE5AD515E170@sebastianraschka.com> <705a27d4-3643-bc9b-11a8-80ba0f6752bf@gmail.com>
Message-ID:

My main issue with the upgrade is that if there was a slightly newer version of numpy/scipy it would try to upgrade my numpy/scipy linked against MKL/blas to a vanilla version downloaded from the cheese shop. It was a huge pain.

From matthew.brett at gmail.com Thu Jul 28 15:33:56 2016
From: matthew.brett at gmail.com (Matthew Brett)
Date: Thu, 28 Jul 2016 20:33:56 +0100
Subject: [scikit-learn] Declaring numpy and scipy dependencies?
In-Reply-To:
References: <195faf56-d8c6-49e0-7fd7-5bb4f1b22931@gmail.com> <98971054-939E-416C-BA47-AE5AD515E170@sebastianraschka.com> <705a27d4-3643-bc9b-11a8-80ba0f6752bf@gmail.com>
Message-ID:

On Thu, Jul 28, 2016 at 8:30 PM, federico vaggi wrote:
> My main issue with the upgrade is that if there was a slightly newer version
> of numpy/scipy it would try to upgrade my numpy/scipy linked against
> MKL/blas to a vanilla version downloaded from the cheese shop. It was a
> huge pain.

The current Linux wheels are linked against recent OpenBLAS, and OSX wheels link against Accelerate, and both should be pretty fast, but I take your point.

Matthew

From t3kcit at gmail.com Thu Jul 28 15:49:40 2016
From: t3kcit at gmail.com (Andreas Mueller)
Date: Thu, 28 Jul 2016 15:49:40 -0400
Subject: [scikit-learn] Declaring numpy and scipy dependencies?
In-Reply-To:
References: <195faf56-d8c6-49e0-7fd7-5bb4f1b22931@gmail.com> <98971054-939E-416C-BA47-AE5AD515E170@sebastianraschka.com> <705a27d4-3643-bc9b-11a8-80ba0f6752bf@gmail.com>
Message-ID: <32573e5a-cc5c-b5b8-7fbe-914da9bd1200@gmail.com>

On 07/28/2016 03:30 PM, federico vaggi wrote:
> My main issue with the upgrade is that if there was a slightly newer
> version of numpy/scipy it would try to upgrade my numpy/scipy linked
> against MKL/blas to a vanilla version downloaded from the cheese
> shop. It was a huge pain.
>
You mean with binary wheels or without?
Now the cheese shop will give you OpenBlas.
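
For reference, the `--no-deps` workaround discussed in this thread can be scripted. The following is a minimal sketch, assuming pip is importable as a module of the running interpreter; the helper name `upgrade_command` is hypothetical and is not part of pip or scikit-learn. It only builds the command line and does not run it.

```python
import sys


def upgrade_command(package, upgrade_deps=False):
    """Return the argv list for upgrading `package` with pip.

    `upgrade_command` is a hypothetical helper for illustration only.
    With upgrade_deps=False, --no-deps is added so that pip leaves
    already-installed dependencies such as numpy and scipy untouched.
    """
    cmd = [sys.executable, "-m", "pip", "install", "--upgrade"]
    if not upgrade_deps:
        cmd.append("--no-deps")
    cmd.append(package)
    return cmd


# Show the pip arguments that would be used; nothing is installed here.
print(" ".join(upgrade_command("scikit-learn")[1:]))
```

Passing the returned list to `subprocess.check_call` would perform the upgrade. Note that `--no-deps` also skips installing missing dependencies, so this only suits environments where numpy and scipy are already present.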
From nicholdav at gmail.com Thu Jul 28 15:52:42 2016
From: nicholdav at gmail.com (David Nicholson)
Date: Thu, 28 Jul 2016 15:52:42 -0400
Subject: [scikit-learn] Is there any official position on PEP484/mypy?
In-Reply-To: <31D9B8F7-2652-4541-B2CC-426CA099957B@sebastianraschka.com>
References: <20160728164339.GD2110660@phare.normalesup.org> <598b3780-5b3d-2eb8-7e57-da3856026d0b@gmail.com> <31D9B8F7-2652-4541-B2CC-426CA099957B@sebastianraschka.com>
Message-ID:

Agreed that it sounded great on podcast.__init__ and I specifically thought it would be helpful for sklearn as someone who is not a developer but has been digging into the codebase.
If anyone on the list wants an overview of MyPy I highly recommend listening to that episode:
http://podcastinit.podbean.com/e/episode-65-mypy-with-david-fisher-and-greg-price/

On Thu, Jul 28, 2016 at 2:49 PM, Sebastian Raschka <mail at sebastianraschka.com> wrote:
> I am not a core dev but just wanted to say that I like the idea of adding
> static type checking a lot, Daniel. Coincidentally, I just listened to the
> Podcast.__init__ episode on mypy a few weeks ago and was planning to use it
> in my personal + research projects as well.
>
> I think the "normal" scikit-learn user would not really benefit from it
> (since the docstrings are already pretty good and thorough), but I think
> that it can be immensely useful for devs and contributors (and augmenting
> the unittest)
>
> > We can put mypy in the CI, right? Shouldn't that prevent it from rotting?
>
> Yeah, it can be added to Travis CI checks, for example.
>
> One question though, are you planning to apply the "whole" type checking
> syntax? E.g.,
>
> def hello(r: int, c=5) -> str:
>     s = 'hello' # type: str
>     return '(%d + %d) times %s' % (r, c, s)
>
> Does this work with Python 2.7, 3.4 etc?
>
> Or are you only thinking about the "comment" syntax? E.g.,
>
> def hello(r, c=5):
>     s = 'hello' # type: str
>     return '(%d + %d) times %s' % (r, c, s)
>
> Which should work on all Py versions.
>
> Best,
> Sebastian
>
> > On Jul 28, 2016, at 12:49 PM, Andreas Mueller wrote:
> >
> > On 07/28/2016 12:43 PM, Gael Varoquaux wrote:
> >> On Thu, Jul 28, 2016 at 12:04:48PM -0400, Andreas Mueller wrote:
> >>> If you find some bugs with the annotations and mypy, that would probably
> >>> prove its value to some degree [and if you don't, I might be inclined to
> >>> argue it's not working well ;]
> >>> Joel, Olivier, Gael, anyone else?: opinions?
> >> The only reserve that I might have is with regards to the maintainability
> >> of these annotation. I am afraid that they coderot.
> >>
> >> Daniel, any comments on that concern?
> > We can put mypy in the CI, right? Shouldn't that prevent it from rotting?
> > [I don't actually know. Daniel?]
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

--
David Nicholson, Ph.D. Candidate
Sober Lab, Emory Neuroscience Program.
www.nicholdav.info; https://github.com/NickleDave

From gael.varoquaux at normalesup.org Thu Jul 28 17:48:06 2016
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Thu, 28 Jul 2016 23:48:06 +0200
Subject: [scikit-learn] CI permissions
In-Reply-To:
References:
Message-ID: <20160728214806.GD1588438@phare.normalesup.org>

On Thu, Jul 28, 2016 at 11:25:28AM -0400, Andreas Mueller wrote:
> I gave permissions back to the CI services, but they are not all linked to
> my account.
> Olivier, can you check for coveralls and circleci? I think you have these
> accounts.
> Appveyor and travis are working again.

Hum, Olivier is offline till Monday (well, he might have email, but unsure), and I don't have permission on these :(.

This reminds me that we should strive for redundant permissions as much as possible.

Cheers,

Gaël

From t3kcit at gmail.com Thu Jul 28 17:54:16 2016
From: t3kcit at gmail.com (Andreas Mueller)
Date: Thu, 28 Jul 2016 17:54:16 -0400
Subject: [scikit-learn] CI permissions
In-Reply-To: <20160728214806.GD1588438@phare.normalesup.org>
References: <20160728214806.GD1588438@phare.normalesup.org>
Message-ID: <0119f3e1-4398-a0ff-ec61-2385a72155fe@gmail.com>

On 07/28/2016 05:48 PM, Gael Varoquaux wrote:
> On Thu, Jul 28, 2016 at 11:25:28AM -0400, Andreas Mueller wrote:
>> I gave permissions back to the CI services, but they are not all linked to
>> my account.
>> Olivier, can you check for coveralls and circleci? I think you have these
>> accounts.
>> Appveyor and travis are working again.
> Hum, Olivier is offline till Monday (well, he might have email, but
> unsure), and I don't have permission on these :(.
>
> This reminds me that we should strive for redundant permissions as much
> as possible.
>
I was just suggesting on an issue about our MLOSS entry that we have a private repo to store all the keys (I needed to go back to my university Bonn email account and find an email from Fabian to find the MLOSS credentials).

CircleCI is back online, I think we can live without coveralls for a weekend.

From xulifan at udel.edu Thu Jul 28 20:50:41 2016
From: xulifan at udel.edu (Lifan Xu)
Date: Thu, 28 Jul 2016 20:50:41 -0400
Subject: [scikit-learn] error from import sklearn
Message-ID:

Hi guys,

I am sorry to bother you. But I have a question about using scikit-learn in ubuntu.
I installed scikit using:

sudo apt-get install build-essential python-dev python-numpy python-numpy-dev python-scipy libatlas-dev g++ python-matplotlib ipython
sudo pip install -U scikit-learn

However, when I try to use scikit, I got this error:

Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from sklearn import datasets
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/sklearn/__init__.py", line 57, in <module>
    from .base import clone
  File "/usr/local/lib/python2.7/dist-packages/sklearn/base.py", line 11, in <module>
    from .utils.fixes import signature
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/__init__.py", line 11, in <module>
    from .validation import (as_float_array,
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 16, in <module>
    from ..utils.fixes import signature
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/fixes.py", line 322, in <module>
    from ._scipy_sparse_lsqr_backport import lsqr as sparse_lsqr
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/_scipy_sparse_lsqr_backport.py", line 58, in <module>
    from scipy.sparse.linalg.interface import aslinearoperator
  File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/__init__.py", line 108, in <module>
    from .isolve import *
  File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/isolve/__init__.py", line 6, in <module>
    from .iterative import *
  File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/isolve/iterative.py", line 11, in <module>
    from scipy.lib.decorator import decorator
  File "/usr/lib/python2.7/dist-packages/scipy/lib/decorator.py", line 39, in <module>
    import sys, re, inspect
  File "/usr/lib/python2.7/inspect.py", line 37, in <module>
    import dis
  File "/usr/lib/python2.7/dis.py", line 7, in <module>
    from opcode import __all__ as _opcodes_all
ImportError: cannot import name __all__

Does anyone know how to fix this problem?

Thanks!
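
One check that may help with a traceback like this one, where a standard-library import such as `from opcode import __all__` fails, is to look at which file Python actually loads for that module name. The snippet below is a minimal sketch; `module_file` is a hypothetical helper, and a stray local file being the cause is only an assumption to test, not a diagnosis.

```python
import importlib


def module_file(name):
    """Return the path of the file Python imports for module `name`.

    `module_file` is a hypothetical helper for illustration. A path in the
    current directory or project tree, rather than the Python installation,
    suggests a local file is shadowing the standard-library module.
    """
    module = importlib.import_module(name)
    # Built-in modules have no __file__ attribute, hence the default.
    return getattr(module, "__file__", None)


print(module_file("opcode"))
```

If the printed path points into the working directory instead of the interpreter's library directory, renaming the local file is the usual way out.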
From jaquesgrobler at gmail.com Fri Jul 29 02:44:26 2016
From: jaquesgrobler at gmail.com (jaquesgrobler at gmail.com)
Date: Fri, 29 Jul 2016 08:44:26 +0200
Subject: [scikit-learn] CI permissions
In-Reply-To:
References:
Message-ID: <29C7597C-0832-486D-8844-643922C272A1@gmail.com>

Hey guys - I just checked coveralls -- appears to be online on 94% coverage so all seems fine there - I still have access to it from setting it up back when haha

So we're 'all-covered'

Hope you guys are well!
Jaques

> On 28 Jul 2016, at 5:25 PM, Andreas Mueller wrote:
>
> Hey all.
> So I think I messed with the CI a bit too much yesterday.
> After looking into the organization permissions a bit more, I saw that any app that is authorized on github by any of the devs
> will automatically be authorized for the organization.
> That seems pretty bad.
>
> So I changed that default to each app needing to be authorized explicitly for the organization.
> Clearly I wasn't thinking, because that broke CI.
> I gave permissions back to the CI services, but they are not all linked to my account.
> Olivier, can you check for coveralls and circleci? I think you have these accounts.
> Appveyor and travis are working again.
>
> Sorry for the inconvenience,
> Andy
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

From t3kcit at gmail.com Fri Jul 29 10:32:07 2016
From: t3kcit at gmail.com (Andreas Mueller)
Date: Fri, 29 Jul 2016 10:32:07 -0400
Subject: [scikit-learn] error from import sklearn
In-Reply-To:
References:
Message-ID: <9601d379-6864-c6e3-0952-1bd4ef3eb34d@gmail.com>

Hi Lifan.
It looks like there's something wrong with your scipy installation (or python installation?)
Can you try

from scipy.sparse.linalg.interface import aslinearoperator

and

import dis

Btw, where did you get these installation instructions from? The Readme?

Best,
Andy

From t3kcit at gmail.com Fri Jul 29 10:33:06 2016
From: t3kcit at gmail.com (Andreas Mueller)
Date: Fri, 29 Jul 2016 10:33:06 -0400
Subject: [scikit-learn] CI permissions
In-Reply-To: <29C7597C-0832-486D-8844-643922C272A1@gmail.com>
References: <29C7597C-0832-486D-8844-643922C272A1@gmail.com>
Message-ID: <40c6601e-f812-8a6c-7f18-b7d6399a19b7@gmail.com>

On 07/29/2016 02:44 AM, jaquesgrobler at gmail.com wrote:
> Hey guys - I just checked coveralls -- appears to be online on 94% coverage so all seems fine there - I still have access to it from setting it up back when haha
>
> So we're 'all-covered'
>
> Hope you guys are well!
>
So you have the keys to that? And is the coveralls CI running again?
If not, can you enable it? And send me and Olivier and Gael the keys?

Thanks,
Andy

From dmoisset at machinalis.com Fri Jul 29 12:55:44 2016
From: dmoisset at machinalis.com (Daniel Moisset)
Date: Fri, 29 Jul 2016 17:55:44 +0100
Subject: [scikit-learn] Is there any official position on PEP484/mypy?
In-Reply-To: <598b3780-5b3d-2eb8-7e57-da3856026d0b@gmail.com>
References: <20160728164339.GD2110660@phare.normalesup.org> <598b3780-5b3d-2eb8-7e57-da3856026d0b@gmail.com>
Message-ID:

@Andreas, @Gael:

This indeed is something that could be included in the CI, and you could ensure that the annotations have both internal consistency (i.e., what they say matches what the implementation is doing) and external consistency (the way callers are using it matches the way it is declared).

To clarify a bit, there are 2 things involved here:

* PEP-484 provides a standard way to add annotations. They have some syntax, but it's just metadata that gets stored and has no effect whatsoever at runtime, kind of like a structured docstring. These have no protection by themselves against bitrot (in the same way that docstrings may rot)
* mypy is a tool that can be run like a linter: it parses the code and the annotations, and verifies that there's consistency. It's something that you can use on CI or while developing (comparable to a linter like flake8, but doing a deeper analysis).

The annotation scheme described by PEP484 is "gradual" (you don't have to cover the whole code, only the parts where static typing makes sense, and "unchecked" code is not modified). mypy respects that and also provides a way to silence the checker for situations where the type system is oversensitive but you know you're right (similar to flake8's "# noqa").

@Sebastian

I had heard the podcast and it makes a very strong argument for using it, thanks for recommending it (people at Dropbox are using this on their production codebase).

I do believe that end users will start getting benefits from this that are stronger than docstrings, especially when this tooling starts to get integrated in code editors (pycharm is already doing this) so they can get inline checking and detection of errors when they call the scikit-learn API, and better context-aware completion. That's not counting those users that want to use mypy in their own codebases and would get a better advantage if SKL supported it (that's the situation I am in, together with some colleagues).

Regarding syntax, if we add inline annotations (which IMO is the best path forward if they have a chance of getting integrated), the only option is using the 2.x compatible annotations (which are comments). That one is different to your 2 examples, that would be:

    def hello(r, c=5):
        # type: (int, int) -> str
        s = 'hello'
        return '(%d + %d) times %s' % (r, c, s)

(note that your " # type: str" is valid but not required, given that s can be obviously inferred to be a string)

Another possible syntax (both are valid, this one makes sense for longer signatures) is:

    def hello(r,  # type: int
              c=5):
        # type: (...) -> str
        s = 'hello'
        return '(%d + %d) times %s' % (r, c, s)

(in this case there's again no need to specify a type for c given that it can be inferred as an int)

These 2 variants work well in 2.x and 3.x

Best,
D.

P.S.: In my last email I forgot to put this link describing some of the things that I've found on real code http://www.machinalis.com/blog/a-day-with-mypy-part-1/

On Thu, Jul 28, 2016 at 5:49 PM, Andreas Mueller wrote:
>
> On 07/28/2016 12:43 PM, Gael Varoquaux wrote:
>> On Thu, Jul 28, 2016 at 12:04:48PM -0400, Andreas Mueller wrote:
>>> If you find some bugs with the annotations and mypy, that would probably
>>> prove its value to some degree [and if you don't, I might be inclined to
>>> argue it's not working well ;]
>>> Joel, Olivier, Gael, anyone else?: opinions?
>> The only reserve that I might have is with regards to the maintainability
>> of these annotation. I am afraid that they coderot.
>>
>> Daniel, any comments on that concern?
> We can put mypy in the CI, right? Shouldn't that prevent it from rotting?
> [I don't actually know. Daniel?]
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

--
Daniel F. Moisset - UK Country Manager
www.machinalis.com
Skype: @dmoisset

From mail at sebastianraschka.com Fri Jul 29 13:37:50 2016
From: mail at sebastianraschka.com (Sebastian Raschka)
Date: Fri, 29 Jul 2016 13:37:50 -0400
Subject: [scikit-learn] Is there any official position on PEP484/mypy?
In-Reply-To:
References: <20160728164339.GD2110660@phare.normalesup.org> <598b3780-5b3d-2eb8-7e57-da3856026d0b@gmail.com>
Message-ID: <376C9459-E534-4264-86D6-8AEBE7D0228B@sebastianraschka.com>

Thanks for the update, Daniel.

The Py 2.x compatible alternatives,

> def hello(r, c=5):
>     # type: (int, int) -> str

... are neat, and I didn't know about these. Although, I must say that

> def hello(r, c=5):
>     # type: (int, int) -> str

... is a tad more useful, for example, in Jupyter Notebooks/IPython regarding the shift-tab function help.
However, I'd say that your suggestion is the best bet for now to maintain Py 2.x compatibility (until 2020 maybe :P).

Cheers,
Sebastian

From t3kcit at gmail.com Fri Jul 29 13:47:56 2016
From: t3kcit at gmail.com (Andreas Mueller)
Date: Fri, 29 Jul 2016 13:47:56 -0400
Subject: [scikit-learn] Is there any official position on PEP484/mypy?
In-Reply-To:
References: <20160728164339.GD2110660@phare.normalesup.org> <598b3780-5b3d-2eb8-7e57-da3856026d0b@gmail.com>
Message-ID: <014c8cb1-8997-67a9-3d6a-f0b94c63b7ff@gmail.com>

Hi Daniel.
Thanks for your clarification, that's exactly how I understood it to work from what I saw so far.
I don't like either annotation type in terms of syntax that much. I don't understand why they didn't go with something closer to the Python 3 syntax. But I guess we have to live with it or write a new pep ;)
I think the one-line version should be preferred as it is shorter and less intrusive.

Best,
Andy

From xulifan at udel.edu Fri Jul 29 14:12:33 2016
From: xulifan at udel.edu (Lifan Xu)
Date: Fri, 29 Jul 2016 14:12:33 -0400
Subject: [scikit-learn] error from import sklearn
In-Reply-To: <9601d379-6864-c6e3-0952-1bd4ef3eb34d@gmail.com>
References: <9601d379-6864-c6e3-0952-1bd4ef3eb34d@gmail.com>
Message-ID:

Hi,

I found the problem was caused by another python source file of mine named "opcode.py", because python2.7 also has a source file named "opcode.py". If I rename my file to "opcode_extract.py", the problem is solved.

Thanks.

From gael.varoquaux at normalesup.org Fri Jul 29 15:57:18 2016
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Fri, 29 Jul 2016 21:57:18 +0200
Subject: [scikit-learn] Is there any official position on PEP484/mypy?
In-Reply-To: <014c8cb1-8997-67a9-3d6a-f0b94c63b7ff@gmail.com>
References: <20160728164339.GD2110660@phare.normalesup.org> <598b3780-5b3d-2eb8-7e57-da3856026d0b@gmail.com> <014c8cb1-8997-67a9-3d6a-f0b94c63b7ff@gmail.com>
Message-ID: <20160729195718.GO787902@phare.normalesup.org>

I am still worried that this is going to add even more complexity to contributing: people will contribute without knowing type hints, CI will break, they won't understand why it breaks, won't be able to reproduce it, and it will stall PRs.

Can you summarize once again in very simple terms what would be the big benefits?

From vaggi.federico at gmail.com Fri Jul 29 16:05:19 2016
From: vaggi.federico at gmail.com (federico vaggi)
Date: Fri, 29 Jul 2016 20:05:19 +0000
Subject: [scikit-learn] Is there any official position on PEP484/mypy?
In-Reply-To: <20160729195718.GO787902@phare.normalesup.org>
References: <20160728164339.GD2110660@phare.normalesup.org> <598b3780-5b3d-2eb8-7e57-da3856026d0b@gmail.com> <014c8cb1-8997-67a9-3d6a-f0b94c63b7ff@gmail.com> <20160729195718.GO787902@phare.normalesup.org>
Message-ID:

I've been using mypy on a much smaller codebase I've been developing. The main benefits are:

1- Much nicer IDE experience when using something like pycharm. I expect more text editors to start supporting this in the future.

2- An additional way to catch some compile time errors early on. For a codebase as mature as scikit-learn, that's probably not a huge deal.

3- Makes it nicer for other codebases using mypy to use scikit-learn.

Of those, the main benefit is by far 1. I also think that the opportunity cost is very low: annotations are easy to keep up to date, and the annotation syntax is really very simple.

From alexandre.gramfort at telecom-paristech.fr Sat Jul 30 03:20:09 2016
From: alexandre.gramfort at telecom-paristech.fr (Alexandre Gramfort)
Date: Sat, 30 Jul 2016 09:20:09 +0200
Subject: [scikit-learn] Is there any official position on PEP484/mypy?
In-Reply-To:
References: <20160728164339.GD2110660@phare.normalesup.org> <598b3780-5b3d-2eb8-7e57-da3856026d0b@gmail.com> <014c8cb1-8997-67a9-3d6a-f0b94c63b7ff@gmail.com> <20160729195718.GO787902@phare.normalesup.org>
Message-ID:

> I am still worried that this is going to add even more complexity to
> contributing: people will contribute without knowing type hint, CI will
> break, they won't understand why it breaks, won't be able to reproduce
> it, and it will stall PRs.

+1

same feeling here.

A

From t3kcit at gmail.com Sat Jul 30 09:57:20 2016
From: t3kcit at gmail.com (Andreas Mueller)
Date: Sat, 30 Jul 2016 09:57:20 -0400
Subject: [scikit-learn] Is there any official position on PEP484/mypy?
In-Reply-To:
References: <20160728164339.GD2110660@phare.normalesup.org> <598b3780-5b3d-2eb8-7e57-da3856026d0b@gmail.com> <014c8cb1-8997-67a9-3d6a-f0b94c63b7ff@gmail.com> <20160729195718.GO787902@phare.normalesup.org>
Message-ID: <2f81a41b-e959-5114-82a6-f1453aacf71b@gmail.com>

On 07/30/2016 03:20 AM, Alexandre Gramfort wrote:
>> I am still worried that this is going to add even more complexity to
>> contributing: people will contribute without knowing type hint, CI will
>> break, they won't understand why it breaks, won't be able to reproduce
>> it, and it will stall PRs.
> +1
>
> same feeling here.
>
That is "only" a concern if they change the type of something that is annotated, right? Is that something that happens often?
I guess it would also break if someone adds an argument to an annotated function, that might happen more often.
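
To make the scenario in the last message concrete, here is a small sketch of how a Python 2 compatible type comment could go stale when an argument is added. The function names are made up for illustration. Type comments are ignored at run time, so both functions execute normally; only a checker such as mypy would be expected to report the mismatch in the second one.

```python
def hello(r, c=5):
    # type: (int, int) -> str
    # The type comment matches the signature: two ints in, one str out.
    s = 'hello'
    return '(%d + %d) times %s' % (r, c, s)


def hello_with_greeting(r, c=5, greeting='hello'):
    # type: (int, int) -> str
    # Stale on purpose: `greeting` was added but the type comment above
    # still lists two arguments, which a type checker should reject.
    return '(%d + %d) times %s' % (r, c, greeting)


# Type comments are plain comments at run time, so both calls succeed.
print(hello(1, 2))
print(hello_with_greeting(1, 2, 'hi'))
```

Without a checker in CI the stale comment would go unnoticed, much like an outdated docstring; with one, the contributor who added `greeting` would see a failure and need to update the comment.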