[scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML

Liam Geron liam at chatdesk.com
Thu Apr 11 13:30:56 EDT 2019


That's a great tip actually, I was unaware of the MultiOutputClassifier
option. I'll give it a try!

Thanks,
Liam
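
For anyone following along, here is a minimal sketch of what that swap might
look like. It is only an illustration under some assumptions:
LogisticRegression stands in for XGBClassifier so that it runs with
scikit-learn alone, and the toy texts and labels are made up. Note that Y is
a dense 2-D indicator array, since MultiOutputClassifier doesn't support
sparse Y.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Made-up toy data: six short texts, two binary labels per text.
texts = [
    "refund my order",
    "where is my order",
    "cancel and refund",
    "track my parcel",
    "refund the parcel",
    "cancel my order",
]
Y = np.array([[1, 0], [0, 1], [1, 0], [0, 1], [1, 1], [0, 0]])

# LogisticRegression is a stand-in for XGBClassifier (assumption).
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(LogisticRegression())),
])
model.fit(texts, Y)

# predict returns a dense numpy array, one column per label.
pred = model.predict(["refund please"])
print(type(pred), pred.shape)
```

Because the densifying happens inside a stock scikit-learn estimator, nothing
custom should need to be defined on the serving side.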


On Wed, Apr 10, 2019 at 11:03 PM Joel Nothman <joel.nothman at gmail.com>
wrote:

> I think it's a bit weird if we're returning sparse output from
> OneVsRestClassifier.predict if it wasn't fit on sparse Y.
>
> Actually, I would be in favour of deprecating multilabel support in
> OneVsRestClassifier, since it is performing "binary relevance method" for
> multilabel, not actually OvR. MultiOutputClassifier duplicates this
> functionality (more or less), outputs a dense array (indeed it doesn't
> support sparse Y and perhaps it should) and lives closer to functional
> alternatives to binary relevance, such as ClassifierChain.
>
> On Thu, 11 Apr 2019 at 05:32, Liam Geron <liam at chatdesk.com> wrote:
>
>> Unfortunately, I don't believe that you get that level of freedom: it's an
>> API call that automatically calls the model's predict method, so I don't
>> think that I get to specify something like model.predict(X).toarray(). I
>> could be wrong, however; I don't pretend to be an expert on Cloud ML by any
>> stretch.
>>
>> Thanks,
>> Liam
>>
>> On Wed, Apr 10, 2019 at 3:23 PM Sebastian Raschka <
>> mail at sebastianraschka.com> wrote:
>>
>>> Hm, weird that their platform seems to be so picky about it. Have you
>>> tried to just make the output of the pipeline dense? I.e.,
>>>
>>> (model.predict(X)).toarray()
>>>
>>> Best,
>>> Sebastian
>>>
>>> > On Apr 10, 2019, at 1:10 PM, Liam Geron <liam at chatdesk.com> wrote:
>>> >
>>> > Hi Sebastian,
>>> >
>>> > Thanks for the advice! Luckily, the model works fine on its own in
>>> > Python, so I don't think that is the issue exactly. I have tried rolling
>>> > my own estimator that wraps the pipeline and calls the predict_proba
>>> > method to return a dense array; however, I then ran into the problem
>>> > that the custom estimator would have to be defined on the Cloud ML end,
>>> > which I'm unsure how to do.
>>> >
>>> > Thanks,
>>> > Liam
>>> >
>>> > On Wed, Apr 10, 2019 at 2:06 PM Sebastian Raschka <
>>> mail at sebastianraschka.com> wrote:
>>> > Hi Liam,
>>> >
>>> > I'm not sure what your exact error message is, but it may also be that
>>> > XGBClassifier only accepts dense arrays? I think the TfidfVectorizer
>>> > returns sparse arrays. You could probably fix your issue by inserting a
>>> > "DenseTransformer" into your pipeline (a simple class that just converts
>>> > an array from a sparse to a dense format). I've implemented something
>>> > like that, which you can import or copy and paste from here:
>>> >
>>> >
>>> https://github.com/rasbt/mlxtend/blob/master/mlxtend/preprocessing/dense_transformer.py
>>> >
>>> > The usage would then basically be
>>> >
>>> > model = Pipeline([('tfidf', TfidfVectorizer()), ('to_dense',
>>> DenseTransformer()), ('clf', OneVsRestClassifier(XGBClassifier()))])
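
If pulling in mlxtend is not an option, a similar densifying step can be
sketched with scikit-learn's FunctionTransformer. This is an assumption-laden
stand-in, not the mlxtend implementation: to_dense is a hypothetical helper,
LogisticRegression replaces XGBClassifier, and the data is made up. It only
densifies the features fed to the classifier, not what predict returns.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def to_dense(X):
    """Hypothetical helper: sparse matrix in, dense numpy array out."""
    return X.toarray() if sparse.issparse(X) else np.asarray(X)


# Made-up toy data: six short texts, two binary labels per text.
texts = [
    "refund my order",
    "where is my order",
    "cancel and refund",
    "track my parcel",
    "refund the parcel",
    "cancel my order",
]
Y = np.array([[1, 0], [0, 1], [1, 0], [0, 1], [1, 1], [0, 0]])

# A named function (rather than a lambda) keeps the pipeline picklable.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("to_dense", FunctionTransformer(to_dense, accept_sparse=True)),
    ("clf", OneVsRestClassifier(LogisticRegression())),
])
model.fit(texts, Y)
pred = model.predict(["refund please"])

# The intermediate features are a dense array after the to_dense step.
features = to_dense(TfidfVectorizer().fit_transform(texts))
print(type(features), features.shape[0], pred.shape)
```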
>>> >
>>> > Best,
>>> > Sebastian
>>> >
>>> >
>>> >
>>> >
>>> > > On Apr 10, 2019, at 12:25 PM, Liam Geron <liam at chatdesk.com> wrote:
>>> > >
>>> > > Hi all,
>>> > >
>>> > > I was hoping to get some guidance on changing the predict method of
>>> > > OneVsRestClassifier to return a dense array rather than a sparse
>>> > > array, given that Google Cloud ML only accepts dense numpy arrays as
>>> > > the result of a given model's predict method. Right now my model
>>> > > architecture looks like:
>>> > >
>>> > > model = Pipeline([('tfidf', TfidfVectorizer()), ('clf',
>>> OneVsRestClassifier(XGBClassifier()))])
>>> > >
>>> > > Which returns a sparse array with the predict method. I saw the
>>> Stack Overflow post here:
>>> https://stackoverflow.com/questions/52151548/google-cloud-ml-engine-scikit-learn-prediction-probability-predict-proba
>>> > >
>>> > > which recommends overriding the predict method with the
>>> > > predict_proba method; however, I found that I can't serialize the
>>> > > model after doing so. I also have a Stack Overflow post here:
>>> https://stackoverflow.com/questions/55366454/how-to-convert-scikit-learn-onevsrestclassifier-predict-method-output-to-dense-a
>>> > > which details the specific pickling error.
>>> > >
>>> > > Is this a known issue? Is there an accepted way to convert this into
>>> a dense array?
>>> > >
>>> > > Thanks,
>>> > > Liam Geron
>>> > > _______________________________________________
>>> > > scikit-learn mailing list
>>> > > scikit-learn at python.org
>>> > > https://mail.python.org/mailman/listinfo/scikit-learn