[scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML

Liam Geron liam at chatdesk.com
Wed Apr 10 15:26:55 EDT 2019


Unfortunately I don't believe that you get that level of freedom. It's an
API call that automatically calls the model's predict method, so I don't
think that I get to specify something like model.predict(X).toarray(). I
could be wrong, however; I don't pretend to be an expert on Cloud ML by any
stretch.
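
For reference, the wrapper idea mentioned further down the thread could be
sketched roughly as below. This is only a sketch under assumptions: the
class name DensePredictWrapper, the sample texts, and the labels are made
up, and LogisticRegression stands in for XGBClassifier so that the example
needs only scikit-learn. The labels are passed as a sparse indicator
matrix on purpose, since that is one situation in which predict can hand
back a sparse result.

```python
import numpy as np
from scipy import sparse
from sklearn.base import BaseEstimator
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline


class DensePredictWrapper(BaseEstimator):
    """Hypothetical wrapper: delegates to an inner estimator and
    converts any sparse predict() result to a dense numpy array."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        pred = self.estimator.predict(X)
        # Densify only when needed; dense results pass through unchanged.
        return pred.toarray() if sparse.issparse(pred) else np.asarray(pred)


# Made-up multilabel data: 4 documents, 2 labels, labels stored sparsely.
texts = [
    "refund my order",
    "reset my password",
    "refund and password reset",
    "hello there",
]
y = sparse.csr_matrix(np.array([[1, 0], [0, 1], [1, 1], [0, 0]]))

# LogisticRegression is a stand-in for XGBClassifier (assumption).
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", OneVsRestClassifier(LogisticRegression())),
])

model = DensePredictWrapper(pipeline).fit(texts, y)
dense_pred = model.predict(texts)
```

The catch is the one already described in the thread: the pickled model
refers to the wrapper class, so the class definition would also have to be
available wherever the model is loaded.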

Thanks,
Liam

On Wed, Apr 10, 2019 at 3:23 PM Sebastian Raschka <mail at sebastianraschka.com>
wrote:

> Hm, weird that their platform seems to be so picky about it. Have you
> tried to just make the output of the pipeline dense? I.e.,
>
> (model.predict(X)).toarray()
>
> Best,
> Sebastian
>
> > On Apr 10, 2019, at 1:10 PM, Liam Geron <liam at chatdesk.com> wrote:
> >
> > Hi Sebastian,
> >
> > Thanks for the advice! Luckily the model works fine on its own in
> > Python, so I don't think that is exactly the issue. I have tried
> > rolling my own estimator that wraps the pipeline and calls the
> > predict_proba method to return a dense array. However, I then ran into
> > the problem that the custom estimator would have to be defined on the
> > Cloud ML end, which I'm unsure how to do.
> >
> > Thanks,
> > Liam
> >
> > On Wed, Apr 10, 2019 at 2:06 PM Sebastian Raschka <mail at sebastianraschka.com> wrote:
> > Hi Liam,
> >
> > Not sure what your exact error message is, but it may also be that the
> > XGBClassifier only accepts dense arrays? I think the TfidfVectorizer
> > returns sparse arrays. You could probably fix your issue by inserting a
> > "DenseTransformer" into your pipeline (a simple class that just converts
> > an array from a sparse to a dense format). I've implemented something
> > like that, which you can import or copy and paste from here:
> >
> >
> > https://github.com/rasbt/mlxtend/blob/master/mlxtend/preprocessing/dense_transformer.py
> >
> > The usage would then basically be
> >
> > model = Pipeline([('tfidf', TfidfVectorizer()),
> >                   ('to_dense', DenseTransformer()),
> >                   ('clf', OneVsRestClassifier(XGBClassifier()))])
> >
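
A transformer along these lines could be sketched as follows. This is a
simplified stand-in and not the mlxtend implementation linked above;
LogisticRegression replaces XGBClassifier so that the example needs only
scikit-learn, and the sample texts and labels are made up.

```python
import numpy as np
from scipy import sparse
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline


class DenseTransformer(TransformerMixin, BaseEstimator):
    """Simplified sketch: turns sparse input into a dense numpy array."""

    def fit(self, X, y=None):
        # Nothing to learn; present only to satisfy the transformer API.
        return self

    def transform(self, X):
        return X.toarray() if sparse.issparse(X) else np.asarray(X)


# Made-up data: 6 short documents, 3 classes.
texts = [
    "refund my order",
    "where is my refund",
    "reset my password",
    "password does not work",
    "cancel my subscription",
    "stop the subscription",
]
labels = ["billing", "billing", "account", "account", "cancel", "cancel"]

# The feature steps alone: sparse tf-idf output converted to dense.
features = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("to_dense", DenseTransformer()),
])
dense_features = features.fit_transform(texts)

# Full pipeline; LogisticRegression stands in for XGBClassifier (assumption).
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("to_dense", DenseTransformer()),
    ("clf", OneVsRestClassifier(LogisticRegression())),
])
model.fit(texts, labels)
preds = model.predict(texts)
```

Note that this only makes the classifier's input dense. It does not by
itself change the type of what predict returns.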
> > Best,
> > Sebastian
> >
> >
> >
> >
> > > On Apr 10, 2019, at 12:25 PM, Liam Geron <liam at chatdesk.com> wrote:
> > >
> > > Hi all,
> > >
> > > I was hoping to get some guidance on changing the result of the
> > > predict method of the OneVsRestClassifier to return a dense array
> > > rather than a sparse array, given that Google Cloud ML only accepts
> > > dense numpy arrays as the result of a given model's predict method.
> > > Right now my model architecture looks like:
> > >
> > > model = Pipeline([('tfidf', TfidfVectorizer()),
> > >                   ('clf', OneVsRestClassifier(XGBClassifier()))])
> > >
> > > This returns a sparse array from the predict method. I saw the Stack
> > > Overflow post here:
> > > https://stackoverflow.com/questions/52151548/google-cloud-ml-engine-scikit-learn-prediction-probability-predict-proba
> > >
> > > which recommends overriding the predict method with the predict_proba
> > > method; however, I found that I can't serialize the model after doing
> > > so. I also have a Stack Overflow post here:
> > > https://stackoverflow.com/questions/55366454/how-to-convert-scikit-learn-onevsrestclassifier-predict-method-output-to-dense-a
> > > which details the specific pickling error.
> > >
> > > Is this a known issue? Is there an accepted way to convert this into a
> > > dense array?
> > >
> > > Thanks,
> > > Liam Geron
> > > _______________________________________________
> > > scikit-learn mailing list
> > > scikit-learn at python.org
> > > https://mail.python.org/mailman/listinfo/scikit-learn