[scikit-learn] Issues with clone for ensemble of classifiers

Andreas Mueller t3kcit at gmail.com
Wed Sep 26 12:28:53 EDT 2018


Yes, I actually mentioned that on the roadmap thread. It should 
definitely be added.

On 09/19/2018 06:17 PM, Guillaume Lemaître wrote:
> Actually I don't see anything mentioning it in the road map currently. 
> Should it be added?
>
> *From:* luiz.gh at gmail.com
> *Sent:* 19 September 2018 7:12 pm
> *To:* scikit-learn at python.org
> *Reply to:* scikit-learn at python.org
> *Subject:* Re: [scikit-learn] Issues with clone for ensemble of classifiers
>
>
> Guillaume - thank you for the comments. Indeed, an approach to
> "freeze" a fitted classifier would solve our problem. The GitHub issue
> seems to have been inactive for a while, but I will check whether
> anyone else is working on it.
>
> Luiz Gustavo
>
>
> On Wed, Sep 19, 2018 at 12:02 PM <scikit-learn-request at python.org> wrote:
>
>     Message: 1
>     Date: Wed, 19 Sep 2018 17:38:46 +0200
>     From: Guillaume Lemaître <g.lemaitre58 at gmail.com>
>     To: Scikit-learn user and developer mailing list
>             <scikit-learn at python.org>
>     Subject: Re: [scikit-learn] Issues with clone for ensemble of
>             classifiers
>     Message-ID:
>     <CACDxx9gyszjJP-5ZB_bvH4nCkdn-sb6CCb=k2j_kOOnFPBQt0g at mail.gmail.com>
>     Content-Type: text/plain; charset="UTF-8"
>
>     However, there is an open issue about freezing a fitted classifier.
>     You can refer to:
>
>     https://github.com/scikit-learn/scikit-learn/issues/8370
>
>     with the associated discussion.
>     On Wed, 19 Sep 2018 at 17:34, Guillaume Lemaître
>     <g.lemaitre58 at gmail.com> wrote:
>     >
>     > Oops, I misread your comment. I don't think we currently have a
>     > mechanism to avoid cloning a classifier internally.
>     > On Wed, 19 Sep 2018 at 17:31, Guillaume Lemaître
>     > <g.lemaitre58 at gmail.com> wrote:
>     > >
>     > > Nowhere in your class MyClassifier do you call
>     > > base_classifier.fit(...), so calling
>     > > base_classifier.predict(...) will tell you that it was never
>     > > fitted.
>     > >
>     > > On Wed, 19 Sep 2018 at 16:43, Luiz Gustavo Hafemann
>     > > <luiz.gh at gmail.com> wrote:
>     > > >
>     > > > Hello,
>     > > >
>     > > > I am one of the developers of a library for Dynamic Ensemble
>     > > > Selection (DES) methods (the library is called DESlib), and we
>     > > > are currently working to make the library fully compatible with
>     > > > scikit-learn (to submit it to scikit-learn-contrib). We have
>     > > > "check_estimator" working for most of the classes, but I am now
>     > > > having problems making the classes compatible with GridSearch
>     > > > and other CV functions.
>     > > >
>     > > > One of the main use cases of this library is to facilitate
>     > > > research in this field, and this led to a design decision that
>     > > > the base classifiers are fit by the user, and the DES methods
>     > > > receive a pool of base classifiers that were already fit (this
>     > > > allows users to compare many DES techniques with the same base
>     > > > classifiers). This is creating an issue with GridSearch, since
>     > > > the clone function (defined in sklearn.base) is not cloning the
>     > > > classes as we would like. It does a shallow (non-deep) copy of
>     > > > the parameters, but we would like the pool of base classifiers
>     > > > to be deep-copied.
>     > > >
>     > > > I analyzed this issue and I could not find a solution that
>     > > > does not require changes to the scikit-learn code. Here is the
>     > > > sequence of steps that causes the problem:
>     > > >
>     > > > 1. GridSearchCV calls "clone" on the DES estimator (link)
>     > > > 2. The clone function calls the "get_params" function of the
>     > > >    DES estimator (link, line 60). We don't re-implement this
>     > > >    function, so it gets all the parameters, including the pool
>     > > >    of classifiers (at this point, they are still "fitted")
>     > > > 3. The clone function then clones each parameter with
>     > > >    safe=False (line 62). When cloning the pool of classifiers,
>     > > >    the result is a pool that is not "fitted" anymore.
>     > > >
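
The three steps above can be reproduced directly with clone, without
GridSearchCV. This is a minimal sketch; the class name Holder is a
hypothetical stand-in for the DES estimator and is not part of DESlib or
scikit-learn.

```python
from sklearn.base import BaseEstimator, clone
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier


class Holder(BaseEstimator):
    """Hypothetical stand-in for an estimator that receives a fitted pool."""

    def __init__(self, base_classifiers=None):
        self.base_classifiers = base_classifiers


X, y = load_iris(return_X_y=True)
pool = BaggingClassifier(n_estimators=3, random_state=0).fit(X, y)

holder = Holder(base_classifiers=pool)
cloned = clone(holder)

# clone() rebuilds estimator-valued parameters from their get_params(),
# so fitted attributes such as estimators_ are not carried over.
pool_is_fitted = hasattr(pool, "estimators_")
cloned_pool_is_fitted = hasattr(cloned.base_classifiers, "estimators_")
print(pool_is_fitted, cloned_pool_is_fitted)
```

The original pool is left untouched; only the copy held by the cloned
estimator has lost its fitted state.
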
>     > > > The problem is that, to my knowledge, there is no way for my
>     > > > classifier to inform "clone" that a parameter should always be
>     > > > deep-copied. I see that other ensemble methods in sklearn
>     > > > always fit the base classifiers within the "fit" method of the
>     > > > ensemble, so this problem does not happen there. I would like
>     > > > to know if there is a solution for this problem while having
>     > > > the base classifiers fitted elsewhere.
>     > > >
>     > > > Here is a short code snippet that reproduces the issue:
>     > > >
>     > > > ---------------------------
>     > > >
>     > > > from sklearn.model_selection import GridSearchCV, train_test_split
>     > > > from sklearn.base import BaseEstimator, ClassifierMixin
>     > > > from sklearn.ensemble import BaggingClassifier
>     > > > from sklearn.datasets import load_iris
>     > > >
>     > > >
>     > > > class MyClassifier(BaseEstimator, ClassifierMixin):
>     > > >     def __init__(self, base_classifiers, k):
>     > > >         # Base classifiers that are already trained
>     > > >         self.base_classifiers = base_classifiers
>     > > >         # Simulate a parameter that we want to do a grid search on
>     > > >         self.k = k
>     > > >
>     > > >     def fit(self, X_dsel, y_dsel):
>     > > >         # Here we would fit any parameters for the Dynamic
>     > > >         # selection method, not the base classifiers
>     > > >         return self
>     > > >
>     > > >     def predict(self, X):
>     > > >         # In practice the methods would do something with the
>     > > >         # predictions of each classifier
>     > > >         return self.base_classifiers.predict(X)
>     > > >
>     > > >
>     > > > X, y = load_iris(return_X_y=True)
>     > > > X_train, X_dsel, y_train, y_dsel = train_test_split(X, y, test_size=0.5)
>     > > >
>     > > > base_classifiers = BaggingClassifier()
>     > > > base_classifiers.fit(X_train, y_train)
>     > > >
>     > > > clf = MyClassifier(base_classifiers, k=1)
>     > > >
>     > > > params = {'k': [1, 3, 5, 7]}
>     > > > grid = GridSearchCV(clf, params)
>     > > >
>     > > > # Raises error that the bagging classifiers are not fitted
>     > > > grid.fit(X_dsel, y_dsel)
>     > > >
>     > > > ---------------------------
>     > > >
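
One possible workaround for the reproduction above, offered as a sketch
rather than a recommended pattern: clone falls back to copy.deepcopy for
parameters that are not estimators (objects without get_params), so
hiding the fitted pool inside a plain Python object keeps its fitted
state. FittedPool is a hypothetical helper, not part of scikit-learn or
DESlib.

```python
from sklearn.base import BaseEstimator, clone
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split


class FittedPool:
    """Hypothetical holder; deliberately NOT a scikit-learn estimator.

    It has no get_params, so clone(..., safe=False) deep-copies it.
    """

    def __init__(self, estimator):
        self.estimator = estimator

    def predict(self, X):
        return self.estimator.predict(X)


class MyClassifier(BaseEstimator):
    def __init__(self, base_classifiers=None, k=1):
        self.base_classifiers = base_classifiers
        self.k = k

    def fit(self, X_dsel, y_dsel):
        # Nothing to fit in this sketch; the pool is already trained.
        return self

    def predict(self, X):
        return self.base_classifiers.predict(X)


X, y = load_iris(return_X_y=True)
X_train, X_dsel, y_train, y_dsel = train_test_split(
    X, y, test_size=0.5, random_state=0
)

pool = BaggingClassifier(n_estimators=3, random_state=0).fit(X_train, y_train)
clf = MyClassifier(FittedPool(pool), k=1)

# Same call that GridSearchCV makes on the estimator it receives.
cloned = clone(clf)
cloned_pool = cloned.base_classifiers
is_independent_copy = cloned_pool is not clf.base_classifiers
still_fitted = hasattr(cloned_pool.estimator, "estimators_")
same_predictions = (cloned.predict(X_dsel) == clf.predict(X_dsel)).all()
```

The cost is that every clone carries its own deep copy of the pool, so a
grid search that clones the estimator many times copies the pool each
time.
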
>     > > > Btw, here is the branch that we are using to make the library
>     > > > compatible with sklearn:
>     > > > https://github.com/Menelau/DESlib/tree/sklearn-estimators
>     > > > The failing test related to this issue is in
>     > > > https://github.com/Menelau/DESlib/blob/sklearn-estimators/deslib/tests/test_des_integration.py#L36
>     > > >
>     > > > Thanks in advance for any help on this case,
>     > > >
>     > > > Luiz Gustavo Hafemann
>     > > >
>     > > > _______________________________________________
>     > > > scikit-learn mailing list
>     > > > scikit-learn at python.org
>     > > > https://mail.python.org/mailman/listinfo/scikit-learn
>     > >
>     > >
>     > >
>     > > --
>     > > Guillaume Lemaitre
>     > > INRIA Saclay - Parietal team
>     > > Center for Data Science Paris-Saclay
>     > > https://glemaitre.github.io/
