[scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML

Wed Apr 10 13:25:56 EDT 2019

Hi all,

I was hoping to get some guidance on changing the predict method of
OneVsRestClassifier so that it returns a dense array rather than a sparse
one, given that Google Cloud ML only accepts dense numpy arrays as the
output of a model's predict method. Right now my model architecture
looks like:

model = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('clf', OneVsRestClassifier(XGBClassifier())),
])

This returns a sparse array from the predict method. I saw the Stack
Overflow post here:
https://stackoverflow.com/questions/52151548/google-cloud-ml-engine-scikit-learn-prediction-probability-predict-proba

which recommends overriding the predict method with the predict_proba
method; however, I found that I can't serialize the model after doing so. I
also have a Stack Overflow post of my own here:
https://stackoverflow.com/questions/55366454/how-to-convert-scikit-learn-onevsrestclassifier-predict-method-output-to-dense-a
which details the specific pickling error.

Is this a known issue? Is there an accepted way to convert this into a
dense array?
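
To make the question concrete, one possible shape for a workaround is a
subclass that densifies the output of predict, rather than reassigning the
method on a fitted instance. The following is only a sketch under
assumptions: the class name DenseOneVsRestClassifier and the helper
build_model are hypothetical, LogisticRegression stands in for
XGBClassifier to keep the example small, and the toy data is made up.
Whether Google Cloud ML can load a model containing a custom class like
this has not been verified here.

```python
import pickle

import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline


class DenseOneVsRestClassifier(OneVsRestClassifier):
    """Hypothetical wrapper: OneVsRestClassifier whose predict() is dense."""

    def predict(self, X):
        y_pred = super().predict(X)
        # Densify only when the parent returned a scipy sparse matrix.
        return y_pred.toarray() if sp.issparse(y_pred) else np.asarray(y_pred)


def build_model():
    # Assumption: LogisticRegression replaces XGBClassifier for brevity.
    return Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", DenseOneVsRestClassifier(LogisticRegression())),
    ])


# Made-up toy data: sparse multilabel indicator targets with two labels.
texts = ["red apple", "green apple", "red car", "green car"]
y = sp.csr_matrix(np.array([[1, 0], [1, 0], [0, 1], [0, 1]]))

if __name__ == "__main__":
    model = build_model().fit(texts, y)
    print(type(model.predict(texts)))

    # The class is defined at module level, so pickle can locate it by name.
    restored = pickle.loads(pickle.dumps(model))
    print(np.array_equal(restored.predict(texts), model.predict(texts)))
```

Because the override lives on a module-level class instead of being patched
onto an instance, pickle stores only a reference to the class by name. The
catch is that the same class definition must be importable wherever the
model is later unpickled.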

Thanks,
Liam Geron